My Learning Space
Space to take notes, learn and share.
Hadoop – Compute
Hadoop schedules MapReduce jobs across the nodes of a cluster using the JobTracker. The JobTracker takes a MapReduce job, breaks it into small map and reduce tasks, and then schedules them across the various machines in your cluster. It also ensures...
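To make the flow concrete, here is a minimal word-count driver sketched in Scala. It assumes the Hadoop MapReduce client libraries are on the classpath and reuses Hadoop's built-in TokenCounterMapper and IntSumReducer classes; the class name WordCountDriver and the argument handling are just for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer

object WordCountDriver {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "word count")
    job.setJarByClass(getClass)

    // Built-in classes: the mapper emits (word, 1) pairs,
    // the reducer (also used as a combiner) sums the counts per word.
    job.setMapperClass(classOf[TokenCounterMapper])
    job.setCombinerClass(classOf[IntSumReducer[Text]])
    job.setReducerClass(classOf[IntSumReducer[Text]])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])

    FileInputFormat.addInputPath(job, new Path(args(0)))   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args(1))) // output directory

    // waitForCompletion submits the job to the cluster scheduler
    // (the JobTracker in MRv1), which breaks it into map and reduce
    // tasks and assigns them to TaskTrackers on the worker nodes.
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```

Submitting the job via waitForCompletion is the hand-off point: from there on, task placement is the scheduler's job, not the client's.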
Hive Table operations and Storage config
Hive Data Types
Hadoop Data Types Overview

| Category | Data Type | Description |
| --- | --- | --- |
| Primitive | TINYINT | 1-byte signed integer |
| Primitive | SMALLINT | 2-byte signed integer |
| Primitive | INT | 4-byte signed integer |
| Primitive | BIGINT | 8-byte signed integer |
| Primitive | FLOAT | Single precision floating point |
| Primitive | DOUBLE | Double... |
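As a quick illustration, here is a minimal sketch that declares a Hive table using these primitive types. It assumes a Spark build with Hive support and a configured metastore; the table name type_demo and the column names are invented for illustration.

```scala
import org.apache.spark.sql.SparkSession

object HiveTypesDemo {
  def main(args: Array[String]): Unit = {
    // Assumes Hive support is available and a metastore is configured.
    val spark = SparkSession.builder()
      .appName("HiveTypesDemo")
      .enableHiveSupport()
      .getOrCreate()

    // One column per primitive type from the table above.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS type_demo (
        |  tiny_col   TINYINT,
        |  small_col  SMALLINT,
        |  int_col    INT,
        |  big_col    BIGINT,
        |  float_col  FLOAT,
        |  double_col DOUBLE
        |)""".stripMargin)

    // DESCRIBE echoes the declared column types back.
    spark.sql("DESCRIBE type_demo").show()
    spark.stop()
  }
}
```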
MapReduce Task Scheduling – Simplified Diagram
Here's a simplified explanation of the MapReduce task scheduling architecture:

graph TD
  A["Client Job Submission"] --> B["JobTracker"]
  B --> C["Resource Manager"]
  C --> D1["TaskTracker 1"]
  C --> D2["TaskTracker 2"]
  C --> D3["TaskTracker n"]
  D1 --> E1["Map Tasks"]...
Apache Spark
Graph processing with Pregel
What is Pregel? Pregel is a distributed graph processing system developed by Google that follows a vertex-centric approach for large-scale graph computations. How does it work? It works through iterative supersteps where each vertex: Receives messages from the...
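The vertex-centric loop maps naturally onto Spark GraphX's Pregel API. Below is a minimal single-source shortest-path sketch in Scala, assuming GraphX is available; the tiny three-vertex graph and the choice of source vertex are invented purely for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object PregelShortestPaths {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("PregelShortestPaths").setMaster("local[*]"))

    // A tiny weighted graph, purely for illustration.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0), Edge(1L, 3L, 4.0)))
    val graph = Graph.fromEdges(edges, Double.PositiveInfinity)

    // Start with distance 0 at the source and infinity everywhere else.
    val sourceId: VertexId = 1L
    val initialGraph = graph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    // Each superstep: a vertex merges incoming candidate distances,
    // keeps the minimum, and sends improved distances along its out-edges.
    val shortestPaths = initialGraph.pregel(Double.PositiveInfinity)(
      (id, dist, newDist) => math.min(dist, newDist),   // vertex program
      triplet =>                                        // send messages
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        else
          Iterator.empty,
      (a, b) => math.min(a, b)                          // merge messages
    )

    // Vertices go inactive once they stop receiving messages;
    // the computation halts when no messages remain.
    shortestPaths.vertices.collect.foreach(println)
    sc.stop()
  }
}
```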