My Learning Space
A space to take notes, learn, and share.
How Apache Spark Works
Understanding Apache Spark Architecture. Apache Spark is a distributed computing system designed for big data processing and analytics. Here's a breakdown of how it works. Core Components: Driver Program, the central coordinator that manages the execution of...
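To make the driver's role concrete, here is a minimal sketch of a Spark application in Scala. It is an illustrative example, not taken from the post: the app name and the use of local mode are assumptions made for demonstration. The driver builds the SparkSession, records transformations lazily, and only schedules work on executors when an action runs.

```scala
import org.apache.spark.sql.SparkSession

object DriverExample {
  def main(args: Array[String]): Unit = {
    // The driver program starts here: it creates the SparkSession,
    // which coordinates the executors in the cluster.
    val spark = SparkSession.builder()
      .appName("driver-example") // illustrative app name (assumption)
      .master("local[*]")        // local mode, chosen only for the demo
      .getOrCreate()

    // Transformations are only recorded by the driver; nothing runs yet.
    val numbers = spark.sparkContext.parallelize(1 to 1000)
    val squares = numbers.map(n => n.toLong * n)

    // An action makes the driver split the job into tasks, send them
    // to executors, and gather the result back.
    println(squares.sum())

    spark.stop()
  }
}
```

The key point the sketch illustrates is the split of responsibilities: the driver plans and coordinates, while the executors do the distributed work.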
Spark Motivation
Motivation Behind Apache Spark. The limitations of MapReduce led to the development of Apache Spark, addressing key challenges in modern data processing. 1. Distributed data processing began with Hadoop and MapReduce. 2. Over time, specialized solutions were built...
Data Abstractions in Spark (RDD, Dataset, DataFrame)
Spark provides three abstractions for handling data. RDDs: distributed collections of objects that can be cached in memory across cluster nodes (e.g., if an array is large, it can be partitioned across multiple nodes in the cluster). DataFrame: DataFrames are distributed collections...
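As a rough sketch of how the three abstractions look in code, the Scala snippet below builds the same small collection as an RDD, a DataFrame, and a Dataset. The Person case class and the sample names are hypothetical, introduced only to illustrate the typed Dataset; the API calls themselves (parallelize, toDF, as) are standard Spark.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class used only to give the Dataset a typed schema.
case class Person(name: String, age: Int)

object AbstractionsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("abstractions-example") // illustrative app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // RDD: a distributed collection of plain objects, no schema attached.
    val rdd = spark.sparkContext.parallelize(Seq(("Alice", 29), ("Bob", 35)))

    // DataFrame: a distributed collection of rows with named columns,
    // which lets Spark optimize queries over the data.
    val df = rdd.toDF("name", "age")
    df.filter($"age" > 30).show()

    // Dataset: like a DataFrame, but strongly typed via the case class,
    // so filters and maps are checked at compile time.
    val ds = df.as[Person]
    ds.filter(_.age > 30).show()

    spark.stop()
  }
}
```

The progression from RDD to DataFrame to Dataset trades raw flexibility for schema information and compile-time type safety.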