Quantitative Analysis of Memory Usage When Redis Experiences Sudden Memory Surges


Recently, I encountered a case where a Redis instance experienced a sudden memory surge, reaching a maximum used_memory of 78.9G, while the instance's maxmemory configuration was only 16G. This ultimately led to a significant amount of data being evicted from the instance.

Below is part of the output of the INFO MEMORY command at the time the problem occurred:

# Memory
used_memory:84716542624
used_memory_human:78.90G
used_memory_rss:104497676288
used_memory_rss_human:97.32G
used_memory_peak:84716542624
used_memory_peak_human:78.90G
used_memory_peak_perc:100.00%
used_memory_overhead:75682545624
used_memory_startup:906952
used_memory_dataset:9033997000
used_memory_dataset_perc:10.66%
allocator_allocated:84715102264
allocator_active:101370822656
allocator_resident:102303637504
total_system_memory:810745470976
total_system_memory_human:755.07G
used_memory_lua:142336
used_memory_lua_human:139.00K
used_memory_scripts:6576
used_memory_scripts_human:6.42K
number_of_cached_scripts:13
maxmemory:17179869184
maxmemory_human:16.00G
maxmemory_policy:volatile-lru
allocator_frag_ratio:1.20
allocator_frag_bytes:16655720392

Memory surges that lead to data eviction are a common issue in Redis. Many people lack a clear analytical approach when facing such problems and mistakenly attribute them to operations like replication or RDB persistence. Next, let's look at how to analyze these issues systematically.

This article mainly includes the following sections:

  1. How is used_memory derived from INFO?

  2. What is used_memory?

  3. What scenarios is used_memory typically used for?

  4. Changes in memory statistics in Redis 7.

  5. Trigger conditions for data eviction—does exceeding maxmemory always trigger eviction?

  6. Finally, I'll share a script to help analyze in real-time which specific part of memory consumption is causing the growth in used_memory.

How is used_memory Derived from INFO?

When we execute the INFO command, Redis calls the genRedisInfoString function to generate its output.

// server.c
sds genRedisInfoString(const char *section) {
    ...
    /* Memory */
    if (allsections || defsections || !strcasecmp(section,"memory")) {
        ...
        size_t zmalloc_used = zmalloc_used_memory();
        ...
        if (sections++) info = sdscat(info,"
");
        info = sdscatprintf(info,
            "# Memory
"
            "used_memory:%zu
"
            "used_memory_human:%s
"
            "used_memory_rss:%zu
"
            ...
            "lazyfreed_objects:%zu
",
            zmalloc_used,
            hmem,
            server.cron_malloc_stats.process_rss,
            ...
            lazyfreeGetFreedObjectsCount()
        );
        freeMemoryOverheadData(mh);
    }
    ...
    return info;
}

As we can see, the value of used_memory comes from zmalloc_used, which is obtained through the zmalloc_used_memory() function.

// zmalloc.c
size_t zmalloc_used_memory(void) {
    size_t um;
    atomicGet(used_memory,um);
    return um;
}

The implementation of zmalloc_used_memory() is straightforward; it atomically reads the value of used_memory.

What is used_memory?

used_memory is a static variable of type redisAtomic size_t, where redisAtomic is an alias for _Atomic. _Atomic is a keyword introduced in the C11 standard to declare atomic types, ensuring that operations on this type are atomic in a multithreaded environment, thus avoiding data races.

#define redisAtomic _Atomic
static redisAtomic size_t used_memory = 0;

The update of used_memory is mainly achieved through two macro definitions:

#define update_zmalloc_stat_alloc(__n) atomicIncr(used_memory,(__n))
#define update_zmalloc_stat_free(__n) atomicDecr(used_memory,(__n))

The update_zmalloc_stat_alloc(__n) macro is called when memory is allocated, incrementing used_memory by __n through atomic operation.

The update_zmalloc_stat_free(__n) macro is called when memory is freed, decrementing used_memory by __n through atomic operation.

These two macros ensure accurate updates to used_memory during memory allocation and deallocation, avoiding data races caused by concurrent operations.

When memory is allocated or freed through the allocator (glibc's malloc, jemalloc, or tcmalloc; Redis builds with jemalloc by default on Linux), the update_zmalloc_stat_alloc or update_zmalloc_stat_free macro is invoked to keep used_memory up to date.

In Redis, memory management is primarily implemented through the following two functions:

// zmalloc.c
void *ztrymalloc_usable(size_t size, size_t *usable) {
    ASSERT_NO_SIZE_OVERFLOW(size);
    void *ptr = malloc(MALLOC_MIN_SIZE(size)+PREFIX_SIZE);

    if (!ptr) return NULL;
#ifdef HAVE_MALLOC_SIZE
    size = zmalloc_size(ptr);
    update_zmalloc_stat_alloc(size);
    if (usable) *usable = size;
    return ptr;
#else
    ...
#endif
}

void zfree(void *ptr) {
    ...
    if (ptr == NULL) return;
#ifdef HAVE_MALLOC_SIZE
    update_zmalloc_stat_free(zmalloc_size(ptr));
    free(ptr);
#else
   ...
#endif
}

Where:

  • The ztrymalloc_usable function allocates memory: it calls malloc and, if the allocation succeeds, increments used_memory through update_zmalloc_stat_alloc.

  • The zfree function frees memory: it first decrements used_memory through update_zmalloc_stat_free, then calls free to release the memory.

This mechanism ensures that Redis can accurately track memory allocation and deallocation, effectively managing memory usage.
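
To make the pattern concrete, here is a minimal, self-contained sketch of the same accounting idea. It is not Redis source: it assumes a glibc system, where malloc_usable_size() plays the role that zmalloc_size() plays in Redis.

// tracked_alloc.c: an illustrative sketch of zmalloc-style accounting (not Redis source)
#include <malloc.h>      // malloc_usable_size() (glibc-specific)
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

static _Atomic size_t used_memory = 0;

void *tracked_malloc(size_t size) {
    void *ptr = malloc(size);
    if (!ptr) return NULL;
    // Count what the allocator actually reserved, which may exceed the requested size.
    atomic_fetch_add(&used_memory, malloc_usable_size(ptr));
    return ptr;
}

void tracked_free(void *ptr) {
    if (ptr == NULL) return;
    atomic_fetch_sub(&used_memory, malloc_usable_size(ptr));
    free(ptr);
}

int main(void) {
    void *p = tracked_malloc(100);
    printf("tracked after alloc: %zu bytes\n", atomic_load(&used_memory));
    tracked_free(p);
    printf("tracked after free:  %zu bytes\n", atomic_load(&used_memory));
    return 0;
}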


What Scenarios is used_memory Typically Used For?

used_memory consists of two main components:

  1. The data itself: Corresponding to used_memory_dataset in the INFO command.

  2. Overhead for internal management and maintenance of data structures: Corresponding to used_memory_overhead in the INFO command.

It is important to note that used_memory_dataset is not calculated based on the number of keys and the memory used by the keys, but is instead derived by subtracting used_memory_overhead from used_memory.
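
Taking the INFO output from the incident at the beginning of this article:

used_memory_dataset = used_memory - used_memory_overhead
                    = 84716542624 - 75682545624
                    = 9033997000

used_memory_dataset_perc is then the dataset's share of memory net of startup overhead:

used_memory_dataset_perc = used_memory_dataset / (used_memory - used_memory_startup) * 100
                         = 9033997000 / (84716542624 - 906952) * 100
                         ≈ 10.66%

Both values match the INFO output shown earlier.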

Next, let's focus on the source of used_memory_overhead. Redis provides a dedicated function, getMemoryOverheadData, to calculate this part of the memory overhead.

// object.c
struct redisMemOverhead *getMemoryOverheadData(void) {
    int j;
    // mem_total is used to accumulate the total memory overhead and will eventually be assigned to `used_memory_overhead`.
    size_t mem_total = 0;
    // mem is used to calculate the memory usage of each component.
    size_t mem = 0;
    // Call `zmalloc_used_memory()` to get `used_memory`.
    size_t zmalloc_used = zmalloc_used_memory();
    // Allocate memory for a `redisMemOverhead` structure using `zcalloc`.
    struct redisMemOverhead *mh = zcalloc(sizeof(*mh));
    ...
    // Add the memory usage at Redis startup `server.initial_memory_usage` to the total overhead.
    mem_total += server.initial_memory_usage;

    mem = 0;
    // Add the memory overhead of the replication backlog buffer.
    if (server.repl_backlog)
        mem += zmalloc_size(server.repl_backlog);
    mh->repl_backlog = mem;
    mem_total += mem;

    /* Computing the memory used by the clients would be O(N) if done
     * here online. We use our values computed incrementally by
     * clientsCronTrackClientsMemUsage(). */
    // Calculate the memory overhead of clients.
    mh->clients_slaves = server.stat_clients_type_memory[CLIENT_TYPE_SLAVE];
    mh->clients_normal = server.stat_clients_type_memory[CLIENT_TYPE_MASTER]+
                         server.stat_clients_type_memory[CLIENT_TYPE_PUBSUB]+
                         server.stat_clients_type_memory[CLIENT_TYPE_NORMAL];
    mem_total += mh->clients_slaves;
    mem_total += mh->clients_normal;
    // Calculate the memory overhead of AOF buffers and AOF rewrite buffers.
    mem = 0;
    if (server.aof_state != AOF_OFF) {
        mem += sdsZmallocSize(server.aof_buf);
        mem += aofRewriteBufferSize();
    }
    mh->aof_buffer = mem;
    mem_total += mem;
    // Calculate the memory overhead of Lua script caches.
    mem = server.lua_scripts_mem;
    mem += dictSize(server.lua_scripts) * sizeof(dictEntry) +
        dictSlots(server.lua_scripts) * sizeof(dictEntry*);
    mem += dictSize(server.repl_scriptcache_dict) * sizeof(dictEntry) +
        dictSlots(server.repl_scriptcache_dict) * sizeof(dictEntry*);
    if (listLength(server.repl_scriptcache_fifo) > 0) {
        mem += listLength(server.repl_scriptcache_fifo) * (sizeof(listNode) +
            sdsZmallocSize(listNodeValue(listFirst(server.repl_scriptcache_fifo))));
    }
    mh->lua_caches = mem;
    mem_total += mem;
    // Calculate the memory overhead of databases: iterate over all databases (`server.dbnum`). For each database, calculate the memory overhead of the main dictionary (`db->dict`) and the expiration dictionary (`db->expires`).
    for (j = 0; j < server.dbnum; j++) {
        redisDb *db = server.db + j;
        long long keyscount = dictSize(db->dict);
        if (keyscount == 0) continue;

        mh->total_keys += keyscount;
        mh->db = zrealloc(mh->db, sizeof(mh->db[0]) * (mh->num_dbs + 1));
        mh->db[mh->num_dbs].dbid = j;

        mem = dictSize(db->dict) * sizeof(dictEntry) +
              dictSlots(db->dict) * sizeof(dictEntry*) +
              dictSize(db->dict) * sizeof(robj);
        mh->db[mh->num_dbs].overhead_ht_main = mem;
        mem_total += mem;

        mem = dictSize(db->expires) * sizeof(dictEntry) +
              dictSlots(db->expires) * sizeof(dictEntry*);
        mh->db[mh->num_dbs].overhead_ht_expires = mem;
        mem_total += mem;

        mh->num_dbs++;
    }
    // Assign the calculated `mem_total` to `mh->overhead_total`.
    mh->overhead_total = mem_total;
    // Calculate the memory used for data (`zmalloc_used - mem_total`) and store it in `mh->dataset`.
    mh->dataset = zmalloc_used - mem_total;
    mh->peak_perc = (float)zmalloc_used * 100 / mh->peak_allocated;

    /* Metrics computed after subtracting the startup memory from
     * the total memory. */
    size_t net_usage = 1;
    if (zmalloc_used > mh->startup_allocated)
        net_usage = zmalloc_used - mh->startup_allocated;
    mh->dataset_perc = (float)mh->dataset * 100 / net_usage;
    mh->bytes_per_key = mh->total_keys ? (net_usage / mh->total_keys) : 0;

    return mh;
}

From the above code analysis, we can understand that used_memory_overhead consists of the following parts:

  • server.initial_memory_usage: The memory usage at Redis startup, corresponding to used_memory_startup in the INFO command.

  • mh->repl_backlog: The memory overhead of the replication backlog buffer, corresponding to mem_replication_backlog in the INFO command.

  • mh->clients_slaves: The memory overhead of slave clients, corresponding to mem_clients_slaves in the INFO command.

  • mh->clients_normal: The memory overhead of other clients, corresponding to mem_clients_normal in the INFO command.

  • mh->aof_buffer: The memory overhead of the AOF buffer and AOF rewrite buffer, corresponding to mem_aof_buffer in the INFO command. The AOF buffer is the buffer used before data is written to the AOF file, and the AOF rewrite buffer is used during AOF rewriting to store new data.

  • mh->lua_caches: The memory overhead of Lua script caches, corresponding to used_memory_scripts in the INFO command (introduced in Redis 5.0).

  • Dictionary memory overhead: This part of the memory is not shown in the INFO command but can be viewed via the MEMORY STATS command.

Among these memory overheads:

  • used_memory_startup is generally stable.

  • mem_replication_backlog is limited by repl-backlog-size.

  • used_memory_scripts generally has low overhead.

  • The dictionary memory overhead grows roughly in proportion to the number of keys (see the worked example after this list).
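
As a rough sanity check of how large this overhead can get, take the instance from the script output at the end of this article, which holds 77,860,000 keys. Assuming a 64-bit build, where sizeof(dictEntry) is 24 bytes, a hash slot pointer is 8 bytes, and sizeof(robj) is 16 bytes, the main dictionary needs 2^27 = 134,217,728 slots (the next power of two above the key count), so:

overhead_ht_main ≈ 77860000 * 24          (dictEntry)
                 + 134217728 * 8          (dictEntry* slots)
                 + 77860000 * 16          (robj)
                 ≈ 4188141824 bytes ≈ 3.9G

This is consistent with the mem_hashtable value of 3.9G reported by the script (the expires dictionary adds little when few keys carry TTLs).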

Therefore, the main focus should be on three items: mem_clients_slaves, mem_clients_normal, and mem_aof_buffer.

  • mem_aof_buffer: Pay special attention to the buffer size during AOF rewriting.

  • mem_clients_slaves and mem_clients_normal: These represent the memory usage of clients. Both are accounted for in the same way (see the sketch after this list). The memory overhead of a client mainly consists of three parts:

    1. Input buffer: Used to temporarily store client commands, limited by client-query-buffer-limit.

    2. Output buffer: Used to cache data sent to clients, limited by client-output-buffer-limit. If the buffer exceeds the hard limit, or stays above the soft limit for longer than the configured number of seconds, the client is closed.

    3. Memory used by the client object itself.
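
As a sketch of how these three parts add up for a single client, the following mirrors the per-replica accounting in the pre-Redis-7 code shown in the next section (illustrative only, not a function in the Redis source):

// Illustrative sketch: approximate one client's memory footprint.
size_t client_mem_usage(client *c) {
    size_t mem = 0;
    mem += sdsAllocSize(c->querybuf);           /* 1. input buffer */
    mem += getClientOutputBufferMemoryUsage(c); /* 2. output buffer */
    mem += sizeof(client);                      /* 3. the client object itself */
    return mem;
}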


Changes in Memory Statistics in Redis 7

In Redis 7, the following additional memory overhead items are tracked:

  • mh->cluster_links: The memory overhead of cluster links, corresponding to mem_cluster_links in the INFO command.

  • mh->functions_caches: The memory overhead of function caches, corresponding to used_memory_functions in the INFO command.

  • Memory overhead of the slot-to-keys mapping in cluster mode, corresponding to overhead.hashtable.slot-to-keys in the MEMORY STATS command.

Additionally, Redis 7 introduced the Multi-Part AOF feature, which removed the AOF rewrite buffer.

It is important to note that the way memory for mh->repl_backlog and mh->clients_slaves is calculated has also changed.

Before Redis 7, mh->repl_backlog accounted for the size of the replication backlog buffer, and mh->clients_slaves accounted for the memory usage of all replica clients.

if (server.repl_backlog)
    mem += zmalloc_size(server.repl_backlog);
mh->repl_backlog = mem;
mem_total += mem;

mem = 0;
// Iterate through all replica clients, accumulating the memory usage of their output buffers, input buffers, and the client object itself.
if (listLength(server.slaves)) {
    listIter li;
    listNode *ln;

    listRewind(server.slaves,&li);
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);
        mem += getClientOutputBufferMemoryUsage(c);
        mem += sdsAllocSize(c->querybuf);
        mem += sizeof(client);
    }
}
mh->clients_slaves = mem;

Since each replica is allocated its own replication buffer (i.e., the output buffer corresponding to the replica's client), this approach can lead to memory waste as the number of replicas increases. Moreover, if client-output-buffer-limit is set too high and there are too many replicas, it may cause the master to run out of memory (OOM).

To address this issue, Redis 7 introduced a global replication buffer. Both the replication backlog buffer (repl-backlog) and the replica clients' replication buffers now share this buffer.

The replBufBlock structure is used to store a block of the global replication buffer:

typedef struct replBufBlock {
    int refcount;           /* Number of replicas or repl backlog using. */
    long long id;           /* The unique incremental number. */
    long long repl_offset;  /* Start replication offset of the block. */
    size_t size, used;
    char buf[];
} replBufBlock;

Each replBufBlock contains a refcount field, which records how many replication instances (including the master's replication backlog and replicas) reference this block.

When a new replica is added, Redis does not allocate a new replication buffer block but instead increments the refcount of an existing replBufBlock.
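
The following is a minimal sketch of that reference-counting idea; the function names are illustrative, and the real logic lives in replication.c:

/* A new reader (a replica, or the backlog) bumps the block's refcount
 * instead of copying the data into a private buffer. */
void attach_to_block(replBufBlock *b) {
    b->refcount++;
}

/* When a reader moves past the block or disconnects, it drops its
 * reference; a block whose refcount reaches 0 can be trimmed. */
int detach_from_block(replBufBlock *b) {
    b->refcount--;
    return b->refcount == 0;
}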

Correspondingly, in Redis 7, the memory calculation for mh->repl_backlog and mh->clients_slaves has also changed:

if (listLength(server.slaves) &&
    (long long)server.repl_buffer_mem > server.repl_backlog_size)
{
    mh->clients_slaves = server.repl_buffer_mem - server.repl_backlog_size;
    mh->repl_backlog = server.repl_backlog_size;
} else {
    mh->clients_slaves = 0;
    mh->repl_backlog = server.repl_buffer_mem;
}
if (server.repl_backlog) {
    /* The approximate memory of rax tree for indexed blocks. */
    mh->repl_backlog +=
        server.repl_backlog->blocks_index->numnodes * sizeof(raxNode) +
        raxSize(server.repl_backlog->blocks_index) * sizeof(void*);
}
mem_total += mh->repl_backlog;
mem_total += mh->clients_slaves;

Specifically, if the size of the global replication buffer exceeds repl-backlog-size, mh->repl_backlog is reported as repl-backlog-size and the remainder is attributed to the replicas (mh->clients_slaves). If the global replication buffer is less than or equal to repl-backlog-size, the entire buffer is attributed to mh->repl_backlog and mh->clients_slaves is reported as 0.
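
For example, with repl-backlog-size set to 160M (as on the instance in the script output at the end of this article) and a global replication buffer of roughly 620M:

mh->repl_backlog   = 160M           (capped at repl-backlog-size)
mh->clients_slaves = 620M - 160M
                   ≈ 460M           (the remainder)

This is in line with the mem_replication_backlog and mem_clients_slaves values the analysis script reports below.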

Additionally, since a Rax tree is introduced to index some nodes in the global replication buffer, the replication backlog also needs to account for the memory overhead of the Rax tree.


Conditions for Triggering Data Eviction

Many people mistakenly believe that data eviction occurs as soon as used_memory exceeds maxmemory. In reality, this is not the case.

The following conditions must be met for data to be evicted:

  1. maxmemory must be greater than 0.

  2. maxmemory-policy must not be noeviction.

  3. The memory usage must meet certain conditions. It's not simply when used_memory exceeds maxmemory, but when used_memory minus mem_not_counted_for_evict exceeds maxmemory.

The value of mem_not_counted_for_evict can be obtained through the INFO command, and it is calculated in the freeMemoryGetNotCountedMemory function.

size_t freeMemoryGetNotCountedMemory(void) {
    size_t overhead = 0;
    int slaves = listLength(server.slaves);

    if (slaves) {
        listIter li;
        listNode *ln;

        listRewind(server.slaves,&li);
        while((ln = listNext(&li))) {
            client *slave = listNodeValue(ln);
            overhead += getClientOutputBufferMemoryUsage(slave);
        }
    }
    if (server.aof_state != AOF_OFF) {
        overhead += sdsalloc(server.aof_buf)+aofRewriteBufferSize();
    }
    return overhead;
}

The freeMemoryGetNotCountedMemory function sums the sizes of all replica clients' output buffers, the AOF buffer, and the AOF rewrite buffer.

Therefore, when Redis decides whether data needs to be evicted, it subtracts the memory used by the replica clients' output buffers, the AOF buffer, and the AOF rewrite buffer from used_memory.
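
Putting the conditions together, the check is equivalent to the following simplified sketch, based on getMaxmemoryState in evict.c (details and error handling omitted):

/* Simplified sketch of the eviction trigger (see getMaxmemoryState in evict.c). */
int over_maxmemory(void) {
    if (server.maxmemory == 0) return 0;                 /* condition 1 */
    size_t mem_used = zmalloc_used_memory();             /* used_memory */
    size_t overhead = freeMemoryGetNotCountedMemory();   /* mem_not_counted_for_evict */
    mem_used = (mem_used > overhead) ? mem_used - overhead : 0;
    return mem_used > server.maxmemory;                  /* condition 3 */
}

Eviction additionally requires that maxmemory-policy is not noeviction (condition 2).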


Redis Memory Analysis Script

Lastly, here is a script that helps analyze Redis memory usage quickly. By reviewing its output, you can easily see how much memory each part of Redis consumes and, when used_memory grows, identify which part is driving the growth.

Script link: https://github.com/slowtech/dba-toolkit/blob/master/redis/redis_mem_usage_analyzer.py

# python3 redis_mem_usage_analyzer.py -host 10.0.1.182 -p 6379
Metric(2024-09-12 04:52:42)    Old Value            New Value(+3s)       Change per second   
==========================================================================================
Summary
---------------------------------------------
used_memory                    16.43G               16.44G               1.1M                
used_memory_dataset            11.93G               11.93G               22.66K              
used_memory_overhead           4.51G                4.51G                1.08M               

Overhead(Total)                4.51G                4.51G                1.08M               
---------------------------------------------
mem_clients_normal             440.57K              440.52K              -18.67B             
mem_clients_slaves             458.41M              461.63M              1.08M               
mem_replication_backlog        160M                 160M                 0B                  
mem_aof_buffer                 0B                   0B                   0B                  
used_memory_startup            793.17K              793.17K              0B                  
used_memory_scripts            0B                   0B                   0B                  
mem_hashtable                  3.9G                 3.9G                 0B                  

Evict & Fragmentation
---------------------------------------------
maxmemory                      20G                  20G                  0B                  
mem_not_counted_for_evict      458.45M              461.73M              1.1M                
mem_counted_for_evict          15.99G               15.99G               2.62K               
maxmemory_policy               volatile-lru         volatile-lru                             
used_memory_peak               16.43G               16.44G               1.1M                
used_memory_rss                16.77G               16.77G               1.32M               
mem_fragmentation_bytes        345.07M              345.75M              232.88K             

Others
---------------------------------------------
keys                           77860000             77860000             0.0                 
instantaneous_ops_per_sec      8339                 8435                                     
lazyfree_pending_objects       0                    0                    0.0

The script collects Redis memory data at intervals (determined by the -i parameter, defaulting to 3 seconds). It then compares the newly collected data (New Value) with the previous data (Old Value) and calculates the change per second (Change per second).

The output is divided into four main parts:

  1. Summary: A summary where used_memory = used_memory_dataset + used_memory_overhead.

  2. Overhead (Total): Shows the memory consumption of individual items in used_memory_overhead. The Overhead (Total) equals the sum of all items and should theoretically match used_memory_overhead.

  3. Evict & Fragmentation: Displays key metrics related to eviction and memory fragmentation. Here, mem_counted_for_evict = used_memory - mem_not_counted_for_evict. Data eviction only occurs when mem_counted_for_evict exceeds maxmemory.

  4. Others: Other important metrics, including keys (the total number of keys), instantaneous_ops_per_sec (the number of operations per second), and lazyfree_pending_objects (the number of objects awaiting asynchronous deletion).
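
Using the New Value column above as an example:

mem_counted_for_evict = used_memory - mem_not_counted_for_evict
                      = 16.44G - 461.73M
                      ≈ 15.99G

Since 15.99G is still below maxmemory (20G), no eviction is triggered.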

If you find that mem_clients_normal or mem_clients_slaves is large, you can use the --client option to check the memory usage of individual clients.

# python3 redis_mem_usage_analyzer.py -host 10.0.1.182 -p 6379 --client
ID    Address            Name  Age    Command         User     Qbuf       Omem       Total Memory   
----------------------------------------------------------------------------------------------------
216   10.0.1.75:37811          721    psync           default  0B         232.83M    232.85M        
217   10.0.1.22:35057          715    psync           default  0B         232.11M    232.13M        
453   10.0.0.198:51172         0      client          default  26B        0B         60.03K         
...

  • Qbuf: Size of the input buffer.

  • Omem: Size of the output buffer.

  • Total Memory: Total memory used by the connection.

Results are sorted by Total Memory in descending order.