
My Learning Space

Space to take notes, learn and share.

Hadoop – Compute

Hadoop schedules MapReduce jobs across the nodes of a cluster using the JobTracker.

Job Tracker

The JobTracker takes a MapReduce job, breaks it into small map and reduce tasks, and schedules them on the various machines in your cluster. It also ensures...

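The splitting and scheduling described in this excerpt can be sketched roughly as follows. This is a toy illustration, not Hadoop's actual code: the function names `plan_tasks` and `assign_round_robin`, the node names, and the round-robin policy are all assumptions made for the example. A real JobTracker weighs factors such as data locality and available task slots.

```python
from collections import defaultdict
from itertools import cycle


def plan_tasks(num_splits, num_reducers):
    """Break one job into map tasks (one per input split) and reduce tasks."""
    maps = [f"map-{i}" for i in range(num_splits)]
    reduces = [f"reduce-{i}" for i in range(num_reducers)]
    return maps + reduces


def assign_round_robin(tasks, nodes):
    """Hand tasks to nodes in turn (assumed policy, for illustration only)."""
    assignment = defaultdict(list)
    for task, node in zip(tasks, cycle(nodes)):
        assignment[node].append(task)
    return dict(assignment)


# Hypothetical job: 4 input splits, 2 reducers, 3 worker nodes.
tasks = plan_tasks(num_splits=4, num_reducers=2)
schedule = assign_round_robin(tasks, ["node-a", "node-b", "node-c"])

for node, node_tasks in schedule.items():
    print(node, node_tasks)
```

Each node ends up with two tasks here only because six tasks divide evenly over three nodes; the point is the two-step shape (split the job, then place the pieces), not the placement policy.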

Hive Data Types

Hadoop Data Types Overview

Category   Data Type  Description
Primitive  TINYINT    1-byte signed integer
Primitive  SMALLINT   2-byte signed integer
Primitive  INT        4-byte signed integer
Primitive  BIGINT     8-byte signed integer
Primitive  FLOAT      Single precision floating point
Primitive  DOUBLE     Double...

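To make the byte widths of the integer types listed above concrete, here is a small sketch that derives the value range of each one from its size. The helper names `signed_range` and `smallest_type_for` are hypothetical, and the code assumes ordinary two's-complement signed integers; it does not call Hive itself.

```python
# Byte widths of the signed integer types, as listed in the overview.
INTEGER_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of this width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type_for(value):
    """Pick the narrowest listed type whose range contains the value."""
    for name, width in INTEGER_WIDTHS.items():  # ordered from narrow to wide
        low, high = signed_range(width)
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit in any listed integer type")


for name, width in INTEGER_WIDTHS.items():
    print(name, signed_range(width))
```

For example, `smallest_type_for(200)` selects `SMALLINT`, because 200 falls outside the 1-byte range of -128 to 127.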

MapReduce Task Scheduling – Simplified Diagram

Here's a simplified explanation of the MapReduce task scheduling architecture:

graph TD
    A["Client Job Submission"] --> B["JobTracker"]
    B --> C["Resource Manager"]
    C --> D1["TaskTracker 1"]
    C --> D2["TaskTracker 2"]
    C --> D3["TaskTracker n"]
    D1 --> E1["Map Tasks"]...


Graph processing with Pregel

What is Pregel?

Pregel is a distributed graph processing system developed by Google that follows a vertex-centric approach for large-scale graph computations.

How does it work?

It works through iterative supersteps where each vertex:

Receives messages from the...

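As a rough, single-machine illustration of the vertex-centric superstep idea, the sketch below propagates the maximum value through a small graph. It is not Google's implementation or any library's API: the function `pregel_max`, the sample graph, and the simplified halting rule (a vertex only acts when an incoming message raises its value) are assumptions made for the example.

```python
def pregel_max(graph, values):
    """Spread the largest vertex value to every reachable vertex.

    graph:  vertex -> list of out-neighbours (every vertex must be a key)
    values: vertex -> initial numeric value
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])

    # Later supersteps: keep going while any message is in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            # A vertex wakes up only if a message improves its value.
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in graph[v]:
                    outbox[n].append(values[v])
            # Otherwise it stays halted and sends nothing.
        inbox = outbox  # messages are delivered in the next superstep

    return values


# Hypothetical chain a - b - c - d with edges in both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = pregel_max(graph, {"a": 3, "b": 6, "c": 2, "d": 1})
print(result)
```

The computation stops on its own once no vertex has anything new to send, which mirrors how a Pregel job terminates when all vertices are inactive and no messages remain.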