What is Pregel ?
Pregel is a distributed graph processing system developed by Google that follows a vertex-centric approach for large-scale graph computations.
How it works ?
It works through iterative supersteps where each vertex:
- Receives messages from the previous superstep
- Performs computations based on received messages
- Sends messages to neighboring vertices for the next superstep
Here's a simple example of how Pregel would compute the shortest path from a source vertex:
# Vertex Program pseudocode
def compute(vertex, messages):
if superstep == 0:
if vertex.id == source_vertex:
vertex.value = 0
send_to_neighbors(vertex.value + 1)
else:
vertex.value = INFINITY
else:
min_distance = min(messages)
if min_distance < vertex.value:
vertex.value = min_distance
send_to_neighbors(vertex.value + 1)
vote_to_halt()
The process continues until no more messages are being sent (convergence).
Key Features
Key features of Pregel include:
- Fault tolerance through checkpointing
- Synchronous superstep execution
- Message passing between vertices
- Ability to handle very large graphs through distribution
This model has inspired many modern graph processing systems like Apache Giraph and GraphX.