My Learning Space
Space to take notes, learn and share.
Lambda Architecture
Storm Architecture
Apache Storm is a distributed real-time stream processing system designed for big data. Here's a simple way to understand and remember its architecture. Core Components: Spouts: think of these as "water sources"; entry points that emit data streams. Like a...
YARN
MapReduce 1.0 Architecture: In MapReduce 1.0, the JobTracker was the central component responsible for both resource management and job scheduling/monitoring. This created a bottleneck, as it had to handle all responsibilities: Resource Management: Tracking resource...
Hive
Hive is an alternative way to program MapReduce. In Hive, you write SQL-like queries, which are then converted to MapReduce programs. Here's a comparison between Pig and Hive: Feature | Pig | Hive; Language | Pig Latin (procedural) | HiveQL (declarative,...
Hadoop Storage (HDFS)
HDFS (Hadoop Distributed File System) is a fundamental component of the Apache Hadoop ecosystem designed specifically for handling big data processing. It provides a distributed storage system that can reliably store massive amounts of data across clusters of...
Hadoop – Overall
Hadoop Architecture Overview: Hadoop consists of two main components: A. Storage (HDFS) and B. Processing (MapReduce/YARN). Hadoop uses a master-slave architecture, with one master node for the computation layer and one master node for the storage layer. Usually the...