For hash tables, the most common issue is hash collisions. So how does Redis handle them? In this article, we explain in detail how Redis resolves hash collisions, along with the performance and implementation details involved.
Hash Table Implementation in Redis
In Redis, the hash table is a core data structure: it underlies the database key space, the hash type, and several other internal structures. Redis's hash table implementation is based on a structure called dict, which holds two hash tables internally in order to support progressive rehashing.
Hash Table Structure
The structure of Redis's hash table is defined as follows:
typedef struct dictht {
    dictEntry **table;        // Hash table array
    unsigned long size;       // Size of the hash table
    unsigned long sizemask;   // Hash table size mask, used for calculating the index
    unsigned long used;       // Number of hash table nodes in use
} dictht;
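Because the table size is always kept a power of two, sizemask is simply size - 1, and a bucket index can be computed with a bitwise AND instead of a modulo. A minimal standalone illustration (not Redis code; the size and hash value are made up for the example):

#include <stdio.h>

int main(void) {
    unsigned long size = 16;            // table size: a power of two
    unsigned long sizemask = size - 1;  // 15, i.e. binary 1111
    unsigned long hash = 0x9e3779b9UL;  // some arbitrary hash value

    // Masking keeps only the low bits of the hash; for power-of-two sizes
    // this selects the same bucket as hash % size, without a division.
    printf("index via mask: %lu\n", hash & sizemask);
    printf("index via mod : %lu\n", hash % size);
    return 0;
}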
The dictEntry structure represents a single hash table node and is defined as follows:
typedef struct dictEntry {
    void *key;                // The key
    union {
        void *val;            // The value
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;   // Pointer to the next hash table node, forming a linked list
} dictEntry;
Each hash table node contains a key, a value, and a pointer to the next node. This pointer is used to resolve hash collisions.
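For completeness, the two dictht tables mentioned earlier live inside an enclosing dict structure together with a rehash index. The sketch below is simplified from the pre-7.0 dict.h (the type and private-data fields are omitted, and Redis 7 later reorganized these structures):

typedef struct dict {
    dictht ht[2];    // ht[0] is the live table; ht[1] is only used while rehashing
    long rehashidx;  // index of the next bucket to migrate, or -1 if no rehash is in progress
} dict;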
Hash Collision Resolution Strategy
In Redis, hash collisions are resolved using chaining. Specifically, when multiple keys map to the same hash bucket, these keys are stored in a linked list. The advantage of chaining is that it is simple to implement and performs well when the hash table's load factor is low.
1. Chaining Implementation
When inserting a key-value pair, Redis first computes the hash of the key and masks it to find the corresponding bucket. If the bucket is empty, the new node is placed there directly; if the bucket already holds entries, the new node is inserted at the head of the list, an O(1) step that avoids traversing the chain. In other words, Redis's hash table uses separate chaining with head insertion.
The pseudo-code for the insertion operation is as follows:
function dictAdd(dict, key, value):
    index = hashFunction(key) & dict.sizemask
    if dict.table[index] == NULL:
        dict.table[index] = new dictEntry(key, value)
    else:
        newEntry = new dictEntry(key, value)
        newEntry.next = dict.table[index]
        dict.table[index] = newEntry
2. Lookup Operation
For lookup operations, Redis first computes the hash value of the key and finds the corresponding hash bucket. Then, it traverses the linked list within the bucket, searching for the key until it is found or the list ends.
The pseudo-code for the lookup operation is as follows:
function dictFind(dict, key):
    index = hashFunction(key) & dict.sizemask
    entry = dict.table[index]
    while entry != NULL:
        if entry.key == key:
            return entry.value
        entry = entry.next
    return NULL
Progressive Rehashing
To maintain performance, Redis expands the table when its load factor (used entries divided by table size) gets too high, and shrinks it when the table becomes mostly empty. Redis uses a progressive rehashing strategy so that resizing does not block the server.
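The decision of when to expand lives in _dictExpandIfNeeded() in dict.c. The sketch below is modeled on that function, using the simplified structures above; the thresholds (expand at load factor 1, or only at 5 while an RDB/AOF child process is running) are the commonly cited defaults and may differ between versions:

// Sketch of the expansion decision, modeled on _dictExpandIfNeeded().
// dictExpand() is assumed to allocate ht[1] at the requested size and
// set rehashidx to 0, which starts the progressive rehash.
static int dictExpandIfNeeded(dict *d) {
    if (d->rehashidx != -1) return 0;                 // a rehash is already running
    if (d->ht[0].size == 0) return dictExpand(d, 4);  // first insertion: create a small table

    // Load factor = used / size. Expand at 1 normally, but wait until it
    // reaches dict_force_resize_ratio (5) while a fork is active, because
    // copy-on-write makes resizing expensive at that moment.
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used / d->ht[0].size > dict_force_resize_ratio)) {
        return dictExpand(d, d->ht[0].used * 2);
    }
    return 0;
}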
Rehashing Process
The rehashing process works as follows:
A new hash table (ht[1]) is created. For an expansion, its size is the first power of two no smaller than twice the number of entries in use; for a shrink, it is the smallest power of two that can still hold all the entries.
The data from the old hash table is gradually migrated to the new one.
Once the migration is complete, the old hash table's memory is freed.
Progressive rehashing spreads the migration over time instead of doing it all at once. Each time the dictionary is touched by an insert, delete, lookup, or update, a small amount of rehashing work (typically one bucket) is performed along the way, and Redis's periodic background task also migrates batches of buckets when the server is otherwise idle, until all entries have been moved to the new table.
The pseudo-code for progressive rehashing is as follows:
function rehashStep(dict):
    if dict.rehashidx == -1:
        return                                    # no rehash in progress
    for i = 0 to REHASH_BATCH_SIZE:
        # Skip empty buckets, stopping if the old table has been exhausted.
        while dict.rehashidx < dict.size and dict.table[dict.rehashidx] == NULL:
            dict.rehashidx += 1
        if dict.rehashidx >= dict.size:
            dict.rehashidx = -1                   # migration complete
            break
        # Move every entry in this bucket into the new table.
        entry = dict.table[dict.rehashidx]
        while entry != NULL:
            nextEntry = entry.next
            index = hashFunction(entry.key) & dict.new_ht.sizemask
            entry.next = dict.new_ht.table[index]
            dict.new_ht.table[index] = entry
            entry = nextEntry
        dict.table[dict.rehashidx] = NULL
        dict.rehashidx += 1
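One detail worth making explicit: while a rehash is in progress, new entries are inserted only into the new table, and lookups must consult both tables, since a key may not have been migrated yet. Below is a hedged C sketch of that lookup path, simplified from dictFind() in dict.c and built on the structures defined earlier; hashFunction and keysEqual stand in for Redis's actual hash and key-comparison callbacks, and rehashStep is the routine sketched above.

// Lookup that is aware of an in-progress rehash: it checks both tables
// and piggy-backs one step of migration on the way (a sketch, not the
// verbatim Redis implementation).
dictEntry *dictFindSketch(dict *d, const void *key) {
    if (d->rehashidx != -1) rehashStep(d);        // advance the migration a little

    for (int table = 0; table <= 1; table++) {
        unsigned long idx = hashFunction(key) & d->ht[table].sizemask;
        for (dictEntry *he = d->ht[table].table[idx]; he != NULL; he = he->next) {
            if (keysEqual(he->key, key)) return he;
        }
        if (d->rehashidx == -1) break;            // not rehashing: ht[1] is empty
    }
    return NULL;
}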
Performance Analysis
Redis's hash table performs well when the load factor is low, but as the load factor rises, the chains grow and lookups slow down: with separate chaining, the average chain length equals the load factor, so a table of size 1024 holding 4096 entries makes a typical lookup walk about four nodes. To keep this in check, Redis rehashes to hold the load factor within a reasonable range, and does so progressively so that resizing never stalls the server.
Conclusion
Redis resolves hash collisions using chaining and maintains the performance of the hash table through progressive rehashing. Chaining is simple to implement and performs well when the load factor is low, but performance decreases as the load factor increases. Progressive rehashing ensures that the system remains highly performant and available by migrating data in batches, avoiding blocking during the rehash process.
Through these mechanisms, Redis effectively balances performance and complexity in handling hash collisions, ensuring efficient data storage and retrieval in various usage scenarios.