
My Learning Space

Space to take notes, learn and share.

Storm Architecture

Apache Storm is a distributed real-time stream processing system designed for big data. Here's a simple way to understand and remember its architecture:

Core Components

Spouts: Think of these as "water sources": entry points that emit data streams. Like a...

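To make the spout/bolt idea concrete, here is a minimal single-process sketch of a Storm-style topology: a spout emits sentences, one bolt splits them into words and a second bolt counts them. It does not use the Apache Storm API; `SentenceSpout`, `SplitBolt`, `CountBolt` and `run_topology` are hypothetical names chosen for illustration.

```python
# Toy, single-process model of a Storm-style topology: spout -> bolt -> bolt.
# This is NOT the Apache Storm API; every class and method name is hypothetical.
from collections import Counter
from typing import Iterable, Iterator


class SentenceSpout:
    """Hypothetical spout: the "water source" that emits a stream of tuples."""

    def __init__(self, sentences: Iterable[str]) -> None:
        self.sentences = sentences

    def emit(self) -> Iterator[str]:
        yield from self.sentences


class SplitBolt:
    """Hypothetical bolt: splits each incoming sentence into lowercase words."""

    def process(self, sentence: str) -> Iterator[str]:
        yield from sentence.lower().split()


class CountBolt:
    """Hypothetical bolt: keeps a running count for every word it sees."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def process(self, word: str) -> None:
        self.counts[word] += 1


def run_topology(spout: SentenceSpout, split: SplitBolt, count: CountBolt) -> Counter:
    # Each tuple is pushed through the bolts as soon as the spout emits it,
    # mimicking record-at-a-time stream processing (no batching).
    for sentence in spout.emit():
        for word in split.process(sentence):
            count.process(word)
    return count.counts


counts = run_topology(SentenceSpout(["The cow jumped", "the moon"]), SplitBolt(), CountBolt())
print(counts["the"])  # 2
```

In a real Storm cluster each spout and bolt runs as many parallel tasks spread across worker nodes; this sketch collapses that into one loop so the data flow is easy to follow.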

YARN

MapReduce 1.0 Architecture

In MapReduce 1.0, JobTracker was the central component responsible for both resource management and job scheduling/monitoring. This created a bottleneck as it had to handle all responsibilities: Resource Management: Tracking resource...

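YARN addresses this bottleneck by separating cluster-wide resource management (the ResourceManager) from per-application scheduling (an ApplicationMaster per job). The sketch below is a toy model of that split, assuming a simple first-fit memory allocator; the class names, `request_container`, `run_job` and the policy itself are hypothetical and are not the Hadoop API.

```python
# Toy model of the split YARN makes: a cluster-wide resource manager that only
# hands out containers, and per-job logic that only asks for them.
# Names and the first-fit policy are hypothetical, not the Hadoop API.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    free_mb: int  # memory still available on this worker


@dataclass
class ResourceManager:
    nodes: list[Node] = field(default_factory=list)

    def request_container(self, app_id: str, memory_mb: int) -> Optional[str]:
        """Grant a container on the first node with enough free memory."""
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return f"{app_id}@{node.name}"
        return None  # no capacity left; a real scheduler would queue the request


def run_job(rm: ResourceManager, app_id: str, num_tasks: int, task_mb: int) -> list[Optional[str]]:
    """Per-application scheduling (the ApplicationMaster's job in YARN)."""
    return [rm.request_container(app_id, task_mb) for _ in range(num_tasks)]


rm = ResourceManager([Node("n1", 2048), Node("n2", 1024)])
print(run_job(rm, "app1", 4, 1024))  # ['app1@n1', 'app1@n1', 'app1@n2', None]
```

The point of the design is that `ResourceManager` knows nothing about what a job does, and `run_job` knows nothing about the cluster beyond asking for containers, which is what lets the two responsibilities scale independently.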

Hive

Hive is an alternative way to program MapReduce. With Hive, you write SQL-like queries, which are then converted into MapReduce programs.

Here's a comparison between Pig and Hive:

Feature | Pig | Hive
Language | Pig Latin (procedural) | HiveQL (declarative,...

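To illustrate how an SQL-like query can become a MapReduce program, here is a pure-Python simulation of how a query such as `SELECT word, COUNT(*) FROM words GROUP BY word;` maps onto map, shuffle and reduce phases. It does not run Hive; the `words` table and the function names are hypothetical, and the actual plan Hive generates may include extra optimizations.

```python
# Simulates how a HiveQL aggregate such as
#   SELECT word, COUNT(*) FROM words GROUP BY word;
# maps onto map, shuffle and reduce phases. Pure Python, not Hive itself.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(rows: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    # Emit (GROUP BY key, 1) for every input row.
    for row in rows:
        yield row["word"], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # COUNT(*) per group is the sum of the 1s emitted by the mappers.
    return {key: sum(values) for key, values in groups.items()}


rows = [{"word": "hive"}, {"word": "pig"}, {"word": "hive"}]
print(reduce_phase(shuffle_phase(map_phase(rows))))  # {'hive': 2, 'pig': 1}
```

The query author only writes the `SELECT ... GROUP BY`; deciding which column becomes the map output key and what the reducer computes is the translation work Hive does on their behalf.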

Hadoop Storage (HDFS)

HDFS (Hadoop Distributed File System) is a fundamental component of the Apache Hadoop ecosystem designed specifically for handling big data processing. It provides a distributed storage system that can reliably store massive amounts of data across clusters of...

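A common way to picture how HDFS stores large files across a cluster is: cut the file into fixed-size blocks, then keep several copies of each block on different DataNodes. The sketch below shows that idea with a tiny block size so the output is readable. Real HDFS defaults to 128 MB blocks and a replication factor of 3, placed by a rack-aware policy; the round-robin placement and the function names here are simplifying assumptions.

```python
# Toy HDFS-style storage: split a file into fixed-size blocks, then store each
# block on `replication` distinct DataNodes. Real HDFS defaults to 128 MB blocks
# and 3 replicas placed by a rack-aware policy; round-robin here is a simplification.


def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the file into consecutive blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, datanodes: list, replication: int = 3) -> dict:
    """Map each block id to `replication` distinct DataNodes (round-robin)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [datanodes[(start + r) % len(datanodes)] for r in range(replication)]
    return placement


blocks = split_into_blocks(b"0123456789", block_size=4)
print(blocks)  # [b'0123', b'4567', b'89']
print(place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Because every block lives on three different nodes, losing any single DataNode still leaves at least two copies of each block, which is where the "reliably store" property comes from.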

Hadoop – Overall

Hadoop Architecture Overview

Hadoop consists of two main components: A. Storage (HDFS) and B. Processing (MapReduce/YARN). Hadoop uses a master-slave architecture, with one master node for the computation layer and another master node for the storage layer. Usually the...

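The two master layers can be pictured with a toy model: a storage master (the NameNode role) that only tracks which workers hold which blocks, and a compute master (the JobTracker role in MapReduce 1.0, the ResourceManager in YARN) that decides where tasks run. The sketch assumes the compute master prefers a worker that already stores the task's block (data locality), which Hadoop's schedulers do favour; the class names and the `blk_...` ids are hypothetical.

```python
# Toy model of Hadoop's two master-slave layers. Every worker stores blocks
# (DataNode role) and runs tasks (NodeManager/TaskTracker role). The storage
# master only tracks where blocks live; the compute master decides where tasks run.
# Class names and the block id format are hypothetical.
from dataclasses import dataclass, field


@dataclass
class StorageMaster:  # NameNode role: holds metadata, not the file data itself
    block_locations: dict = field(default_factory=dict)  # block id -> worker names


@dataclass
class ComputeMaster:  # JobTracker (MRv1) / ResourceManager (YARN) role
    workers: list = field(default_factory=list)

    def schedule(self, block_id: str, storage: StorageMaster) -> str:
        """Run the task on a worker that already holds the block, if possible."""
        if not self.workers:
            raise RuntimeError("no workers available")
        for worker in storage.block_locations.get(block_id, []):
            if worker in self.workers:
                return worker
        return self.workers[0]  # otherwise any worker; it reads the block remotely


storage = StorageMaster({"blk_1": ["w2", "w3"]})
compute = ComputeMaster(["w1", "w2", "w3"])
print(compute.schedule("blk_1", storage))  # w2
print(compute.schedule("blk_9", storage))  # w1
```

Keeping the masters separate means the storage layer can be used on its own (for example by other processing engines), while the compute master only needs to read block locations from it.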