Underlying Mechanisms of Python Set Deduplication

Python sets are implemented as hash tables, and their deduplication relies on two methods of the stored objects: __hash__ and __eq__. When an element is added, the set first computes its hash value and uses it to pick a slot in the table. Entries whose stored hash differs from the new one are skipped without any equality check, so if no entry with the same hash exists, the object is inserted directly. Only when an existing entry has an identical hash does CPython look further. It first checks whether the entry is the very same object (an identity check). If it is not, it calls __eq__ to decide whether the newcomer is a duplicate (equal, so it is discarded) or a distinct object that merely shares the hash code (a collision, so both are kept).
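To make the order of checks concrete, here is a minimal sketch of an open-addressing hash set in pure Python. The class name ToyHashSet, the simple linear probing, and the resize threshold are illustrative assumptions. CPython's real set is written in C and uses a more elaborate probe sequence, but it applies the same order of checks: compare stored hashes first, then identity, then __eq__.

```python
class ToyHashSet:
    """Hypothetical, simplified open-addressing set (not CPython's implementation)."""

    def __init__(self, capacity=8):
        # capacity must be a power of two so that `hash & mask` selects a slot
        self._slots = [None] * capacity  # each slot is None or a (hash, key) pair
        self._size = 0

    def __len__(self):
        return self._size

    def add(self, key):
        """Insert key; return True if it was stored, False if it was a duplicate."""
        h = hash(key)
        mask = len(self._slots) - 1
        i = h & mask
        while self._slots[i] is not None:
            stored_hash, stored_key = self._slots[i]
            # Step 1: entries with a different hash are skipped, no __eq__ call
            if stored_hash == h:
                # Step 2: identity shortcut first, then the stored key's __eq__
                if stored_key is key or stored_key == key:
                    return False
            i = (i + 1) & mask  # linear probing: try the next slot
        self._slots[i] = (h, key)
        self._size += 1
        # Grow before the table fills up so the probe loop always finds a free slot
        if self._size * 5 >= len(self._slots) * 3:
            self._resize()
        return True

    def _resize(self):
        # Re-insert every entry into a larger table, reusing the cached hashes
        entries = [slot for slot in self._slots if slot is not None]
        self._slots = [None] * (len(self._slots) * 4)
        mask = len(self._slots) - 1
        for h, key in entries:
            i = h & mask
            while self._slots[i] is not None:
                i = (i + 1) & mask
            self._slots[i] = (h, key)
```

Calling add twice with equal values returns True and then False, and len() counts each distinct element once. Caching the hash next to the key is what lets the loop reject most non-matching slots with a cheap integer comparison before ever calling __eq__.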

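The following code demonstrates this behavior using a Packet class whose hash is derived only from the packet's source address, while equality requires both the source and the payload to match. Including the source in __eq__ keeps the hash contract intact: objects that compare equal must have equal hashes, which would not hold if equality looked at the payload alone.

class Packet:
    def __init__(self, source, payload):
        self.source = source
        self.payload = payload

    def __eq__(self, other):
        print(f"-> Verifying equality for {self.source}")
        if not isinstance(other, Packet):
            return NotImplemented
        # Equal packets must hash equally, so compare the hashed field as well
        return (self.source, self.payload) == (other.source, other.payload)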

    def __hash__(self):
        # Hash based only on source address
        return hash(self.source)

def verify_deduplication():
    # Packet A and B have same source (same hash), different payload
    pA = Packet("192.168.1.1", "Message_A")
    pB = Packet("192.168.1.1", "Message_B")

    # Packet C has different source (different hash)
    pC = Packet("10.0.0.1", "Message_C")

    network_buffer = set()
    
    print("Adding Packet A...")
    network_buffer.add(pA)
    
    print("Adding Packet B (Hash Collision)...")
    network_buffer.add(pB)
    
    print("Adding Packet C (New Hash)...")
    network_buffer.add(pC)

    print(f"\nBuffer Size: {len(network_buffer)}")
    for p in network_buffer:
        print(p.payload)

if __name__ == "__main__":
    verify_deduplication()

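The console output shows the sequence of operations. When Packet B is added, its hash matches Packet A's, which triggers the __eq__ method: the "-> Verifying equality for 192.168.1.1" line appears right after "Adding Packet B". Since the payloads differ, __eq__ returns False and both packets remain in the set. When Packet C is added, no stored entry has the same hash, so the set accepts it without calling __eq__, confirming that differing hashes bypass the equality check. The reported buffer size is 3. The order in which the payloads are printed is not guaranteed, because set iteration order depends on the hash values, and string hashes are randomized per process by default.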
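The same conclusions can be checked with assertions instead of console output. The sketch below uses a hypothetical CountingKey class that counts every __eq__ call and hashes only on an integer group field, so the result does not depend on string hash randomization. It covers four cases: re-adding the same object (caught by the identity check, no __eq__), adding an equal copy (one __eq__ call, discarded), adding an unequal key with the same hash (one __eq__ call, kept), and adding a key with a new hash (no __eq__ call).

```python
class CountingKey:
    """Hypothetical key type that records how often __eq__ runs."""

    eq_calls = 0  # shared counter across all instances

    def __init__(self, group, value):
        self.group = group  # the only field used for hashing
        self.value = value

    def __hash__(self):
        return hash(self.group)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        if not isinstance(other, CountingKey):
            return NotImplemented
        return (self.group, self.value) == (other.group, other.value)


def count_equality_checks():
    CountingKey.eq_calls = 0
    s = set()
    first = CountingKey(1, "a")
    s.add(first)                # empty set: stored without calling __eq__
    s.add(first)                # same object: identity check, no __eq__
    s.add(CountingKey(1, "a"))  # same hash, equal: one __eq__ call, discarded
    s.add(CountingKey(1, "b"))  # same hash, unequal: one __eq__ call, kept
    s.add(CountingKey(2, "a"))  # different hash: no __eq__ call, kept
    return len(s), CountingKey.eq_calls


print(count_equality_checks())  # (3, 2) on CPython
```

Only the two additions of new objects that share an existing hash reach __eq__; the identity shortcut and the differing hash account for the other insertions.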

Posted on Thu, 14 May 2026 23:00:25 +0000 by Love_Daddy