Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.

Spark's primary abstraction is a distributed collection of items called a Dataset. Datasets can be created from Hadoop InputFormats (such as HDFS files) or by transforming other Datasets.