Analyzing a Memory Surge During a .NET Project Stress Test on Linux


In September, a stress test of a company's .NET project deployed on Linux uncovered a problem. At 50 concurrent users, one container showed an abnormal increase in threads and memory, which led to malfunctions. This article walks through the thread and memory analysis with debugging tools, relying mainly on WinDbg's graphical interface.

Story: In September, a company project deployed on Linux was being stress-tested when one container's thread count and memory usage surged abnormally. This caused functional problems and system malfunctions, so I decided to analyze its thread and memory behavior. Both lldb and WinDbg can handle this kind of debugging, but I prefer WinDbg's graphical interface, so that is what I used.

WinDbg Analysis

Where is the memory leak?

On Windows, many people are familiar with the !address -summary command for tracking down memory leaks. However, it is designed for Windows and does not work well with Linux dumps, as the example output below shows:

0:000> !address -summary
                                     
Mapping file section regions...
Mapping module regions...
Mapping heap regions...

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
<unknown>                              4062 ffffffff`f5638600 (  16.000 EB) 100.00%  100.00%
Image                                  1282        0`09fc8a00 ( 159.784 MB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
                                       2431 fffffffe`2b813000 (  16.000 EB)          100.00%
MEM_PRIVATE                            2913        1`d3dee000 (   7.310 GB)   0.00%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
                                       2431 fffffffe`2b813000 (  16.000 EB) 100.00%  100.00%
MEM_COMMIT                             2913        1`d3dee000 (   7.310 GB)   0.00%    0.00%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                         2115        1`cb683000 (   7.178 GB)   0.00%    0.00%
PAGE_EXECUTE_READ                       175        0`03d49000 (  61.285 MB)   0.00%    0.00%
PAGE_READONLY                           585        0`03ce9000 (  60.910 MB)   0.00%    0.00%
PAGE_EXECUTE_WRITECOPY                   38        0`00d39000 (  13.223 MB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unknown>                              7ffc`011fa000 ffff8003`fe406000 (  16.000 EB)
Image                                  7f45`fe4e9000        0`01b16000 (  27.086 MB)

As the output shows, this classification is of little use here: nearly everything falls under <unknown>, with an implausible total of 16 EB. So what can we do? Fortunately, the CoreCLR team anticipated this situation and added the maddress command to SOS as a cross-platform counterpart to !address. Its output looks like this:

0:000> !sos maddress
Enumerating and tagging the entire address space and caching the result...
Subsequent runs of this command should be faster.
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
 | Memory Kind            |        StartAddr |        EndAddr-1 |         Size | Type        | State       | Protect                | Image                                                   | 
 +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
 | Stack                  |     7f42d256e000 |     7f42d2d6e000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d3570000 |     7f42d3d70000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d3d71000 |     7f42d4571000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d4572000 |     7f42d4d72000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d4d73000 |     7f42d5573000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d5574000 |     7f42d5d74000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d5d75000 |     7f42d6575000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d6d77000 |     7f42d7577000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d7578000 |     7f42d7d78000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d7d79000 |     7f42d8579000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7f42d857a000 |     7f42d8d7a000 |       8.00mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 ...
 +-------------------------------------------------------------------------+ 
 | Memory Type            |          Count |         Size |   Size (bytes) | 
 +-------------------------------------------------------------------------+ 
 | Stack                  |            788 |       6.28gb |  6,743,269,376 | 
 | GCHeap                 |             48 |     688.98mb |    722,448,384 | 
 | PAGE_READWRITE         |            930 |     180.22mb |    188,977,152 | 
 | Image                  |          1,278 |     159.69mb |    167,447,040 | 
 | HighFrequencyHeap      |            327 |      20.35mb |     21,336,064 | 
 | LowFrequencyHeap       |            259 |      18.31mb |     19,202,048 | 
 | LoaderCodeHeap         |             15 |      17.53mb |     18,378,752 | 
 | HostCodeHeap           |             11 |       1.51mb |      1,581,056 | 
 | ResolveHeap            |              1 |     348.00kb |        356,352 | 
 | PAGE_READONLY          |            123 |     261.50kb |        267,776 | 
 | DispatchHeap           |              1 |     196.00kb |        200,704 | 
 | IndirectionCellHeap    |              3 |     152.00kb |        155,648 | 
 | LookupHeap             |              3 |     144.00kb |        147,456 | 
 | CacheEntryHeap         |              2 |     100.00kb |        102,400 | 
 | PAGE_EXECUTE_WRITECOPY |              5 |      96.00kb |         98,304 | 
 | StubHeap               |              2 |      76.00kb |         77,824 | 
 | PAGE_EXECUTE_READ      |              2 |       8.00kb |          8,192 | 
 +-------------------------------------------------------------------------+ 
 | [TOTAL]                |          3,798 |       7.34gb |  7,884,054,528 | 
 +-------------------------------------------------------------------------+

The output shows that the program uses 7.34gb in total, of which 6.28gb is thread stacks, so the stacks account for almost all of the memory. Even more unexpected, each thread stack is 8M, which is rather large. Moreover, Linux has no separate "reserved" state like Windows, so maddress reports the full 8M of every stack as committed (MEM_COMMIT). Dumping one of these regions shows long runs of zeros, which suggests much of that space is never used. This is unreasonable.

0:000> dp 7f42d256e000 7f42d2d6e000
...
00007f42`d2d6dfa0  00000000`00000000 00000000`00000000
00007f42`d2d6dfb0  00000000`00000000 00000000`00000000
00007f42`d2d6dfc0  00000000`00000000 00000000`00000000
00007f42`d2d6dfd0  00000000`00000000 00000000`00000000
00007f42`d2d6dfe0  00000000`00000000 00000000`00000000
00007f42`d2d6dff0  00000000`00000000 00000000`00000000
00007f42`d2d6e000  ????????`????????
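
The region sizes that maddress reports can be cross-checked with plain shell arithmetic. The following is a small sketch that assumes bash, whose $(( )) arithmetic is 64-bit and accepts 0x hex literals, and uses awk for the floating-point division. The helper name stack_bytes is illustrative only. The script computes the size of the first stack region listed above and multiplies it by the 788 stack regions in the summary.

```shell
#!/usr/bin/env bash
# Size in bytes of one region, from the StartAddr and EndAddr-1 columns of
# !sos maddress (for these rows the difference matches the 8.00mb Size column).
stack_bytes() {
  local start=$1 end=$2
  echo $(( end - start ))
}

one=$(stack_bytes 0x7f42d256e000 0x7f42d2d6e000)
echo "one stack region: $one bytes ($(( one / 1024 / 1024 )) MiB)"

# 788 stack regions of this size.
total=$(( 788 * one ))
awk -v b="$total" 'BEGIN { printf "788 regions: %.0f bytes (%.2f GiB)\n", b, b / 1024^3 }'
```

788 × 8 MiB comes to about 6.16 GiB. That already accounts for nearly all of the 6.28gb Stack row; the "gb" in maddress output is 2^30 bytes.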

How to modify the stack space size

Default stack sizes differ between Linux distributions, so it helps to first find out which distribution the process was running on. One way to do this from the dump alone is to search the whole address space for the name of a likely distribution:

0:000> s-a 0 L?0xffffffffffffffff "centos"
...
00005570`9cddbc18  63 65 6e 74 6f 73 2e 37-2d 78 36 34 00 00 00 00  centos.7-x64....
...

The output contains the string centos.7-x64, which is the .NET runtime identifier for CentOS 7 x64, so the process was running on CentOS 7. On Windows, the default stack size is stored in the executable's PE header and can be changed there. On Linux there are two ways to change it:
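
If the dump file is available on a Linux machine, grep can do a rough equivalent of the WinDbg search. This sketch makes a few assumptions: GNU grep (for -a and -o), a hypothetical function name find_os_string, and a dump path that you supply. Unlike the debugger search, it prints the matched string itself rather than a virtual address.

```shell
#!/usr/bin/env bash
# Print the first string in a (binary) dump file that starts with the given
# prefix and continues with letters, digits, '.', '_' or '-',
# e.g. "centos.7-x64" for the prefix "centos".
find_os_string() {
  local dump=$1 prefix=$2
  # -a: treat binary data as text; -o: print only the matched part.
  # LC_ALL=C avoids multibyte-encoding issues with arbitrary binary bytes.
  LC_ALL=C grep -a -o "${prefix}[[:alnum:]._-]*" "$dump" | head -n 1
}
```

For example, find_os_string core.1234 centos, where core.1234 is a placeholder file name.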

Modify the ulimit -s parameter (not recommended)

root@ubuntu:/data# ulimit -s
8192
root@ubuntu:/data# ulimit -s 2048
root@ubuntu:/data# ulimit -s
2048
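
Note that ulimit -s reports the limit in KiB, so 8192 means 8 MiB. The pthread_create(3) man page describes how glibc's NPTL sets the default stack size of new threads: if the RLIMIT_STACK soft limit is not unlimited when a program starts, that limit becomes the default. This is likely where the 8M stacks come from. A limit set with ulimit applies only to the current shell and the processes it starts. The sketch below therefore lowers the limit in a subshell just before launching a command. The helper name is illustrative, and dotnet App.dll is a placeholder.

```shell
#!/usr/bin/env bash
# Show the current soft stack limit in KiB and in bytes.
kib=$(ulimit -S -s)
if [ "$kib" = "unlimited" ]; then
  echo "stack limit: unlimited"
else
  echo "stack limit: $kib KiB = $(( kib * 1024 )) bytes"
fi

# Run a command with a lower soft stack limit (in KiB). The subshell keeps
# the change from leaking into the calling shell; exec replaces the subshell
# with the command, which inherits the new limit.
run_with_stack_kib() {
  local limit_kib=$1
  shift
  ( ulimit -S -s "$limit_kib" && exec "$@" )
}

# Hypothetical usage: run_with_stack_kib 2048 dotnet App.dll
```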

Modify the DOTNET_DefaultStackSize environment variable (recommended; set it only on the affected container). The value is read as hexadecimal, so 180000 means 0x180000 bytes, i.e. 1.5M:

DOTNET_DefaultStackSize=180000

For more information, please refer to the article: https://www.alexander-koepke.de/post/2023-10-18-til-dotnet-stack-size/
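
Because the value is hexadecimal, a quick conversion helps avoid mistaking it for a decimal byte count. The sketch below converts a DOTNET_DefaultStackSize value to bytes and shows the variable being passed to a single process. The helper names are illustrative, and App.dll is a placeholder. In a container, the variable would more typically be set in the image or in the deployment's environment settings, for example with an ENV line in a Dockerfile.

```shell
#!/usr/bin/env bash
# Convert a DOTNET_DefaultStackSize value (hex digits, no 0x prefix) to bytes.
stack_size_bytes() {
  printf '%d\n' "0x$1"
}

bytes=$(stack_size_bytes 180000)
echo "DOTNET_DefaultStackSize=180000 -> $bytes bytes ($(( bytes / 1024 )) KiB)"

# Start the app with DOTNET_DefaultStackSize set to 0x180000 (1.5 MiB).
# The assignment prefix sets the variable for this one command only.
start_app() {
  DOTNET_DefaultStackSize=180000 dotnet App.dll "$@"
}
```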

Shrinking the stack size is the first direction for solving the problem. The other direction is to ask why there are 788 threads in the first place.

Why are there so many threads?

To answer that, we need to see what each thread was doing when the dump was taken. In WinDbg, ~*e runs a command on every thread, so ~*e !clrstack prints the managed call stack of each one:

0:000> ~*e !clrstack
...
OS Thread Id: 0x1b82 (225)
        Child SP               IP Call Site
00007F441B7FD660 00007f4cdbb69ad8 [HelperMethodFrame_1OBJ: 00007f441b7fd660] System.Threading.Monitor.ObjWait(Int32, System.Object)
00007F441B7FD790 00007f4c676318cd System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ManualResetEventSlim.cs @ 570]
00007F441B7FD810 00007f4c676312e1 System.Net.Sockets.SocketAsyncContext.PerformSyncOperation[[System.__Canon, System.Private.CoreLib]](OperationQueue`1<System.__Canon> ByRef, System.__Canon, Int32, Int32) [/_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1330]
00007F441B7FD8A0 00007f4c67e26ff1 System.Net.Sockets.SocketAsyncContext.ReceiveFrom(System.Memory`1, System.Net.Sockets.SocketFlags ByRef, Byte[], Int32 ByRef, Int32, Int32 ByRef) [/_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1557]
00007F441B7FD920 00007f4c67e2ea6b System.Net.Sockets.SocketPal.Receive(System.Net.Sockets.SafeSocketHandle, Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, Int32 ByRef)
00007F441B7FD9A0 00007f4c67e26c37 System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef)
00007F441B7FDA20 00007f4c67e26929 System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32) [/_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/NetworkStream.cs @ 231]
00007F441B7FDA70 00007f4c69b85757 System.IO.BufferedStream.ReadByteSlow() [/_/src/libraries/System.Private.CoreLib/src/System/IO/BufferedStream.cs @ 771]
00007F441B7FDA90 00007f4c69b774e8 System.IO.BinaryReader.ReadByte() [/_/src/libraries/System.Private.CoreLib/src/System/IO/BinaryReader.cs @ 207]
00007F441B7FDAA0 00007f4c69b853ee RabbitMQ.Client.Impl.InboundFrame.ReadFrom(RabbitMQ.Util.NetworkBinaryReader)
00007F441B7FDAF0 00007f4c69b852c6 RabbitMQ.Client.Framing.Impl.Connection.MainLoopIteration()
00007F441B7FDB10 00007f4c69b57068 RabbitMQ.Client.Framing.Impl.Connection.MainLoop()
00007F441B7FDB50 00007f4c67590d19 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 183]
00007F441B7FDCF0 00007f4cdb1e3aa7 [DebuggerU2MCatchHandlerFrame: 00007f441b7fdcf0] 
...
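
With hundreds of threads, counting is easier than scrolling. Suppose the ~*e !clrstack output has been saved to a text file, for example via WinDbg's .logopen. The awk sketch below then treats each OS Thread Id: line as the start of a new thread. It counts how many threads contain a given frame, matched as a fixed substring. The function and file names are illustrative.

```shell
#!/usr/bin/env bash
# Count the threads in saved "~*e !clrstack" output whose call stack contains
# the given fixed string (e.g. a method name). Each thread is counted once.
count_threads_with_frame() {
  local frame=$1 file=$2
  awk -v frame="$frame" '
    /^OS Thread Id:/ { in_thread = 1; hit = 0; next }   # start of a new thread block
    in_thread && !hit && index($0, frame) > 0 { hit = 1; n++ }
    END { print n + 0 }
  ' "$file"
}

# Hypothetical usage:
# count_threads_with_frame 'RabbitMQ.Client.Framing.Impl.Connection.MainLoop()' clrstack.log
```

Comparing that count with the total number of OS Thread Id: lines shows what share of the threads sit in RabbitMQ connection loops.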

(The dump itself can be captured with the usual tools, such as dotnet-dump or procdump.) The output shows a large number of threads whose call stacks run through RabbitMQ.Client.Framing.Impl.Connection.MainLoop, each blocked in a synchronous socket read. This suggests that most of the extra threads are stuck in RabbitMQ.Client.Framing.Impl connection loops.

Based on this analysis, I made the following recommendations:

Modify the DOTNET_DefaultStackSize parameter

Use 1.5M, the default stack size .NET Core uses on Windows. With this many threads, 8M per thread is simply too large to sustain, and it is out of proportion with the small memory footprint expected on Linux. After the change, I reran the stress test, captured a new dump, and confirmed with maddress that the setting had taken effect:

0:000> !sos maddress
Enumerating and tagging the entire address space and caching the result...
Subsequent runs of this command should be faster.
*** WARNING: Unable to verify timestamp for lttng-ust-wait-8-0
*** WARNING: Unable to verify timestamp for lttng-ust-wait-8
 +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
 | Memory Kind            |        StartAddr |        EndAddr-1 |         Size | Type        | State       | Protect                | Image                                                   | 
 +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 
 .......
 | Stack                  |     7fabe4e8c000 |     7fabe500c000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe500d000 |     7fabe518d000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe518e000 |     7fabe530e000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe530f000 |     7fabe548f000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe5490000 |     7fabe5610000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe5611000 |     7fabe5791000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe5792000 |     7fabe5912000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe5913000 |     7fabe5a93000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe5a94000 |     7fabe5c14000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 | Stack                  |     7fabe5c15000 |     7fabe5d95000 |       1.50mb | MEM_PRIVATE | MEM_COMMIT  | PAGE_READWRITE         |                                                         | 
 .......
 +-------------------------------------------------------------------------+ 
 | Memory Type            |          Count |         Size |   Size (bytes) | 
 +-------------------------------------------------------------------------+ 
 | Stack                  |            766 |       1.41gb |  1,518,571,520 | 
 | GCHeap                 |             48 |     702.39mb |    736,509,952 | 
 | PAGE_READWRITE         |            931 |     186.31mb |    195,358,720 | 
 | Image                  |          1,283 |     158.77mb |    166,480,384 | 
 | HighFrequencyHeap      |            336 |      20.97mb |     21,991,424 | 
 | LowFrequencyHeap       |            256 |      18.32mb |     19,214,336 | 
 | LoaderCodeHeap         |             15 |      17.53mb |     18,378,752 | 
 | HostCodeHeap           |             11 |       1.63mb |      1,703,936 | 
 | ResolveHeap            |              1 |     348.00kb |        356,352 | 
 | PAGE_READONLY          |            123 |     261.50kb |        267,776 | 
 | DispatchHeap           |              1 |     196.00kb |        200,704 | 
 | IndirectionCellHeap    |              3 |     152.00kb |        155,648 | 
 | LookupHeap             |              3 |     144.00kb |        147,456 | 
 | PAGE_EXECUTE_WRITECOPY |              5 |     132.00kb |        135,168 | 
 | CacheEntryHeap         |              2 |     100.00kb |        102,400 | 
 | StubHeap               |              2 |      76.00kb |         77,824 | 
 | PAGE_EXECUTE_READ      |              2 |       8.00kb |          8,192 | 
 +-------------------------------------------------------------------------+ 
 | [TOTAL]                |          3,788 |       2.50gb |  2,679,660,544 | 
 +-------------------------------------------------------------------------+
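
The effect can be quantified by subtracting the byte counts in the two summaries. The small sketch below uses only the Stack and [TOTAL] figures copied from the two tables. The helper name report is illustrative.

```shell
#!/usr/bin/env bash
# Print how many bytes were saved between two maddress summaries,
# also in GiB (the "gb" in maddress output is 2^30 bytes).
report() {
  local name=$1 before=$2 after=$3
  local saved=$(( before - after ))
  local gib
  gib=$(awk -v b="$saved" 'BEGIN { printf "%.2f", b / 1024^3 }')
  echo "$name: $saved bytes saved (~$gib GiB)"
}

# Byte counts copied from the Stack and [TOTAL] rows before and after the change.
report Stack 6743269376 1518571520
report Total 7884054528 2679660544
```

The Stack row shrinks by about 4.87 GiB, while the number of stack regions only drops from 788 to 766. The saving therefore comes mainly from the smaller per-thread stack rather than from fewer threads.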

Review the RabbitMQ-related logic in the project code

It turned out that the RabbitMQ reference was an unused, leftover reference in the code. After removing it and rerunning the stress test, the thread count was back to normal.

Summary

The .NET debugging ecosystem on Linux keeps getting richer, which is exciting to see.