How Does Redis Handle Hash Collisions?

Time: Column:Backend & Servers views:214

For hash tables, the most common issue is hash collisions. So, how does Redis handle hash collisions? In this article, we will provide a detailed explanation of how Redis deals with hash collisions, along with a discussion of its performance and implementation details.

How Does Redis Handle Hash Collisions?

Hash Table Implementation in Redis

In Redis, hash tables are a common data structure, typically used to store the properties of objects. Hash tables are used to implement several internal data structures, including the database key space and the hash type. Redis's hash table implementation is based on a data structure called dict. This structure uses two hash tables internally to support progressive rehashing.

Hash Table Structure

The structure of Redis's hash table is defined as follows:

typedef struct dictht {
    dictEntry **table;  // Hash table array
    unsigned long size; // Size of the hash table
    unsigned long sizemask; // Hash table size mask, used for calculating the index
    unsigned long used; // Number of hash table nodes in use
} dictht;

The dictEntry represents a hash table node and is defined as follows:

typedef struct dictEntry {
    void *key; // The key
    union {
        void *val; // The value
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; // Pointer to the next hash table node, forming a linked list
} dictEntry;

Each hash table node contains a key, a value, and a pointer to the next node. This pointer is used to resolve hash collisions.

Hash Collision Resolution Strategy

In Redis, hash collisions are resolved using chaining. Specifically, when multiple keys map to the same hash bucket, these keys are stored in a linked list. The advantage of chaining is that it is simple to implement and performs well when the hash table's load factor is low.

1. Chaining Implementation

When inserting a key-value pair, Redis first computes the hash value of the key and uses it to find the corresponding hash bucket. If the bucket is empty, the key-value pair is inserted directly; if the bucket is not empty, the new node is inserted at the head of the list. Thus, Redis's hash table uses a linked list with head insertion.

The pseudo-code for the insertion operation is as follows:

function dictAdd(dict, key, value):
    index = hashFunction(key) & dict.sizemask
    if dict.table[index] == NULL:
        dict.table[index] = new dictEntry(key, value)
    else:
        newEntry = new dictEntry(key, value)
        newEntry.next = dict.table[index]
        dict.table[index] = newEntry

2. Lookup Operation

For lookup operations, Redis first computes the hash value of the key and finds the corresponding hash bucket. Then, it traverses the linked list within the bucket, searching for the key until it is found or the list ends.

The pseudo-code for the lookup operation is as follows:

function dictFind(dict, key):
    index = hashFunction(key) & dict.sizemask
    entry = dict.table[index]
    while entry != NULL:
        if entry.key == key:
            return entry.value
        entry = entry.next
    return NULL

Progressive Rehashing

To maintain the performance of the hash table, Redis expands the table when it becomes too crowded or shrinks it when it becomes too idle. Redis uses a progressive rehashing strategy to avoid blocking the service during the rehash process.

Rehashing Process

The rehashing process works as follows:

  1. A new hash table is created, with a size either twice or half the size of the current table.

  2. The data from the old hash table is gradually migrated to the new one.

  3. Once the migration is complete, the old hash table's memory is freed.

Progressive rehashing is achieved by migrating data from the old hash table to the new one in batches. Specifically, each time an insert, delete, or update operation occurs, a certain number of hash table nodes are also migrated until the process is complete.

The pseudo-code for progressive rehashing is as follows:

function rehashStep(dict):
    if dict.rehashidx == -1:
        return
    for i = 0 to REHASH_BATCH_SIZE:
        if dict.rehashidx >= dict.size:
            dict.rehashidx = -1
            break
        while dict.table[dict.rehashidx] == NULL:
            dict.rehashidx += 1
        entry = dict.table[dict.rehashidx]
        while entry != NULL:
            nextEntry = entry.next
            index = hashFunction(entry.key) & dict.new_ht.sizemask
            entry.next = dict.new_ht.table[index]
            dict.new_ht.table[index] = entry
            entry = nextEntry
        dict.table[dict.rehashidx] = NULL
        dict.rehashidx += 1

Performance Analysis

Redis's hash table performs well when the load factor is low, but as the load factor increases, the length of the linked lists also increases, leading to a decline in lookup performance. To address this issue, Redis uses progressive rehashing to keep the load factor of the hash table within a reasonable range.

Conclusion

Redis resolves hash collisions using chaining and maintains the performance of the hash table through progressive rehashing. Chaining is simple to implement and performs well when the load factor is low, but performance decreases as the load factor increases. Progressive rehashing ensures that the system remains highly performant and available by migrating data in batches, avoiding blocking during the rehash process.

Through these mechanisms, Redis effectively balances performance and complexity in handling hash collisions, ensuring efficient data storage and retrieval in various usage scenarios.