Gossip Protocol
The Gossip (Epidemic) Protocol facilitates data synchronization across nodes in distributed systems.
Node states are defined as:
- S (Susceptible): Unaware of updates
- I (Infected): Aware of updates and actively propagating them
- R (Removed): Completed updates, no longer participating in propagation
SI Model Implementation
# Initialization
self.status = 'susceptible'
# Main gossip loop
while True:
sleep(random_interval())
target_node = select_random_peer()
if self.push_mode and self.status == 'infected':
transmit(target_node, generate_data_message()) # Triggers handle_data_update
if self.pull_mode:
transmit(target_node, request_data_message()) # Triggers handle_request
def handle_data_update(self, message):
self.local_store(message.content)
self.status = 'infected'
def handle_request(self, message):
if self.status == 'infected':
transmit(message.originator, generate_data_message())
def generate_data_message(self):
return self.propagated_information
def request_data_message(self):
return "Data request payload"
Propagation Rate Analysis
Let $P_i$ represent the probability of remaining in susceptible state after round $i$:
- Pull mode: $P_{i+1} = P_i^2$
- Push mode: $P_{i+1} = P_i \cdot (1+\frac{1}{n})^{n \cdot (1-P_i)} \approx P_i \cdot e^{-1}$
Pure push performs worst; pull or combined approaches are superior.
SIR Model Variant
In SIR model, infected nodes transition to removed state with probability $1/k$.
Mathematical Framework
Let $s(t)$ and $i(t)$ denote proportions of susceptible and infected nodes at time $t$, with $r(t) = 1-s(t)-i(t)$ for removed nodes.
Transmission equations: $$\left{ \begin{array}{l} \frac{ds}{dt} = -si \ \frac{di}{dt} = si - \frac{1}{k}(1-s)i \end{array} \right.$$
This yields: $i(s) = -\frac{k+1}{k}s+\frac{1}{k}\log(s) + C$
For large $N$, $C \approx \frac{k+1}{k}$, leading to $s^* = \exp[-(k+1)(1-s^*)]$.
Propagation efficiency depends exponentially on $k$:
- $k=3 \Rightarrow s \leq 0.02$
- $k=4 \Rightarrow s \leq 0.007$
CS-DN Model
Implementation follows similar patterns to basic gossip protocols with enhanced state management.
P2P Lookup Systems
Peer-to-peer middleware simplifies cross-host service construction in large-scale distributed networks.
Required functionality includes location, communication, and resource management through overlay routing in the application layer.
Pastry Protocol
Pastry implements prefix-based routing to locate nodes numerically closest to keys.
Data Structures
-
Global Unique Identifier (GUID): Nodes and objects share the same identifier space
- Node ID:
node_id = hash(public_key) - Object ID:
item_id = hash(item_name)
- Node ID:
-
Leaf Set: Maintains neighbor collection
neighbors: {guid: address}- Contains numerically adjacent nodes
- Size:
2lentries (l above, l below)
-
Routing Table: Two-dimensional table
route_table[depth][digit]acccelerates lookupss
Routing Algorithm
# neighbors = {"identifier": "address"}
# route_table = table[identifier_length][15]
def find_target(self, target_key):
if target_key in self.neighbors:
self.neighbors[target_key].find_target(target_key)
else:
row_idx, col_idx = calculate_route_indices(target_key)
if self.route_table[row_idx][col_idx]:
self.route_table[row_idx][col_idx].find_target(target_key)
elif self.route_table[row_idx][random_column()]:
self.route_table[row_idx][random_column()].find_target(target_key)
else:
self.neighbors[random_neighbor()].find_target(target_key)
Network Join Process
New nodes establish routing tables and neighbor sets:
- Calculate own identifier as X
- Send join message toward X
- Message routes to numerically closest node Z
- Copy routing table rows from intermediate nodes
- Adopt neighbor set from destination node Z
Chord Protocol
Chord arranges nodes in a circular toploogy with exponential step routing.
Finger Table Structure
Entry $i$ contains: <successor(node + 2^{i-1}), address>
Current node: $n$ Entry $i$ represents node reached by advancing $2^i$ steps
Maximum: $m = \log_2 N$ entries
Chord Lookup Process
From linked list perspective:
- For given key, locate nearest node in finger table
- Jump and repeat
From interval perspective:
- Navigate to interval containing the key
- Continue until target found
Performance Metrics
With $N$ nodes and $K$ keys:
- Node operations move $O(K/N)$ objects
- Object lookup requires $O(\log N)$ messages
Kademlia Protocol
Nodes and objects exist in 160-bit consistent hash space using XOR distance metric.
XOR Distance Definition
Distance between nodes: $distance(x,y) = x \oplus y$
Satisfies mathematical distance properties:
- Non-negativity: $d(x,y) \geq 0$, equality when $x=y$
- Symmetry: $d(x,y) = d(y,x)$
- Triangle inequality: $d(x,y) \leq d(x,z) + d(z,y)$
Routing Table Maintenance
received_message = receive_packet()
distance_to_sender = xor_distance(self.identifier, received_message.source)
kbucket = self.routing_table.get_bucket(distance_to_sender)
if received_message.source in kbucket:
kbucket.remove(received_message.source)
kbucket.append(received_message.source)
return
if len(kbucket) < K_SIZE:
kbucket.append(received_message.source)
return
oldest_entry = kbucket.get_head()
if verify_connectivity(oldest_entry):
kbucket.remove(oldest_entry)
kbucket.append(oldest_entry)
else:
kbucket.remove(oldest_entry)
kbucket.append(received_message.source)
Core Kademlia Operations
verify_connectivity(node_id)find_closest_nodes(node_id): Returns K nearest nodesstore_key_value(key, value)retrieve_value(key): Performs FindNode operation
Data Persistence
During FindValue operations, replicate key-value pairs to known closest nodes without existing copies. Various scheduled operations ensure availability.