In September, during a 50-concurrent-user load test of a company .NET project deployed on Linux, one container showed an abnormal surge in thread count and memory usage, which led to functional failures. This article walks through the investigation of the threads and memory with debugging tools, focusing on WinDbg and its graphical interface.
Story: During the stress test in September of a company project deployed on Linux, an abnormal surge in one container's thread count and memory usage was detected, which caused functional failures. Drawing on what I know, I decided to analyze the threads and memory. Either lldb or WinDbg can be used for this kind of debugging, but I prefer WinDbg's graphical interface, so I proceeded with WinDbg.
WinDbg Analysis
Where is the memory leak?
On Windows, many people are familiar with the !address -summary command for spotting where memory is going. However, that command is Windows-specific and does not handle Linux dumps well. Here is what it produces against this dump:
0:000> !address -summary
Mapping file section regions...
Mapping module regions...
Mapping heap regions...

--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
<unknown>                              4062 ffffffff`f5638600 (  16.000 EB) 100.00%  100.00%
Image                                  1282        0`09fc8a00 ( 159.784 MB)   0.00%    0.00%

--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
                                       2431 fffffffe`2b813000 (  16.000 EB) 100.00%
MEM_PRIVATE                            2913        1`d3dee000 (   7.310 GB)   0.00%    0.00%

--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
                                       2431 fffffffe`2b813000 (  16.000 EB) 100.00%  100.00%
MEM_COMMIT                             2913        1`d3dee000 (   7.310 GB)   0.00%    0.00%

--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE                         2115        1`cb683000 (   7.178 GB)   0.00%    0.00%
PAGE_EXECUTE_READ                       175        0`03d49000 (  61.285 MB)   0.00%    0.00%
PAGE_READONLY                           585        0`03ce9000 (  60.910 MB)   0.00%    0.00%
PAGE_EXECUTE_WRITECOPY                   38        0`00d39000 (  13.223 MB)   0.00%    0.00%

--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
<unknown>                                 7ffc`011fa000 ffff8003`fe406000 (  16.000 EB)
Image                                     7f45`fe4e9000        0`01b16000 (  27.086 MB)
The memory-segment classification in this output is not very useful and offers little to go on, so what can we do? The CoreCLR team has anticipated this situation and provides a maddress command in SOS as a cross-platform counterpart to !address. Its output looks like this:
0:000> !sos maddress
Enumerating and tagging the entire address space and caching the result...
Subsequent runs of this command should be faster.
+----------------------------------------------------------------------------------------------------------+
| Memory Kind | StartAddr    | EndAddr-1    | Size   | Type        | State      | Protect        | Image    |
+----------------------------------------------------------------------------------------------------------+
| Stack       | 7f42d256e000 | 7f42d2d6e000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d3570000 | 7f42d3d70000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d3d71000 | 7f42d4571000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d4572000 | 7f42d4d72000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d4d73000 | 7f42d5573000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d5574000 | 7f42d5d74000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d5d75000 | 7f42d6575000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d6d77000 | 7f42d7577000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d7578000 | 7f42d7d78000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d7d79000 | 7f42d8579000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7f42d857a000 | 7f42d8d7a000 | 8.00mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
...

+-------------------------------------------------------------------------+
| Memory Type            |   Count |     Size |     Size (bytes) |
+-------------------------------------------------------------------------+
| Stack                  |     788 |   6.28gb |    6,743,269,376 |
| GCHeap                 |      48 | 688.98mb |      722,448,384 |
| PAGE_READWRITE         |     930 | 180.22mb |      188,977,152 |
| Image                  |   1,278 | 159.69mb |      167,447,040 |
| HighFrequencyHeap      |     327 |  20.35mb |       21,336,064 |
| LowFrequencyHeap       |     259 |  18.31mb |       19,202,048 |
| LoaderCodeHeap         |      15 |  17.53mb |       18,378,752 |
| HostCodeHeap           |      11 |   1.51mb |        1,581,056 |
| ResolveHeap            |       1 | 348.00kb |          356,352 |
| PAGE_READONLY          |     123 | 261.50kb |          267,776 |
| DispatchHeap           |       1 | 196.00kb |          200,704 |
| IndirectionCellHeap    |       3 | 152.00kb |          155,648 |
| LookupHeap             |       3 | 144.00kb |          147,456 |
| CacheEntryHeap         |       2 | 100.00kb |          102,400 |
| PAGE_EXECUTE_WRITECOPY |       5 |  96.00kb |           98,304 |
| StubHeap               |       2 |  76.00kb |           77,824 |
| PAGE_EXECUTE_READ      |       2 |   8.00kb |            8,192 |
+-------------------------------------------------------------------------+
| [TOTAL]                |   3,798 |   7.34gb |    7,884,054,528 |
+-------------------------------------------------------------------------+
From this output, the program's memory is essentially eaten up by thread stacks: they account for 6.28gb of the 7.34gb total. What is even more surprising is that each thread stack occupies 8M, which is quite large. Moreover, there is no Windows-style "reserve now, commit later" at play here: the 8M is genuinely committed up front. Dumping one of these 8M regions shows it is all zero-initialized, which is unreasonable.
0:000> dp 7f42d256e000 7f42d2d6e000
...
00007f42`d2d6dfa0  00000000`00000000 00000000`00000000
00007f42`d2d6dfb0  00000000`00000000 00000000`00000000
00007f42`d2d6dfc0  00000000`00000000 00000000`00000000
00007f42`d2d6dfd0  00000000`00000000 00000000`00000000
00007f42`d2d6dfe0  00000000`00000000 00000000`00000000
00007f42`d2d6dff0  00000000`00000000 00000000`00000000
00007f42`d2d6e000  ????????`????????
How to modify the stack space size
Generally speaking, different operating system distributions ship with different default stack-size configurations, so first work out which distribution the dump came from. One way is to search memory for the OS name as a keyword:
0:000> s-a 0 L?0xffffffffffffffff "centos"
...
00005570`9cddbc18  63 65 6e 74 6f 73 2e 37-2d 78 36 34 00 00 00 00  centos.7-x64....
...
From this output, the operating system is CentOS 7 x64. On Windows you would change the default stack size by editing the PE header; on Linux there are two options.
Modify the ulimit -s parameter (not recommended)
root@ubuntu:/data# ulimit -s
8192
root@ubuntu:/data# ulimit -s 2048
root@ubuntu:/data# ulimit -s
2048
Modify the DOTNET_DefaultStackSize environment variable (recommended; set it on the container that exhibits the problem)
DOTNET_DefaultStackSize=180000
For more information, please refer to the article: https://www.alexander-koepke.de/post/2023-10-18-til-dotnet-stack-size/
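As an illustration of the recommended option, here is one way the variable might be baked into the container image. This is only a sketch with a hypothetical base image. Note that the runtime interprets the value as hexadecimal, so 180000 means 0x180000 bytes, i.e. 1.5 MB per thread stack, which matches the 1.50mb stack regions seen in the follow-up dump below.

# Dockerfile fragment (the base image is just an example)
FROM mcr.microsoft.com/dotnet/aspnet:6.0
# Interpreted as hex by the runtime: 0x180000 bytes = 1.5 MB per managed thread stack
ENV DOTNET_DefaultStackSize=180000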
That is the first direction for fixing the problem. Now for the other direction: why are there 788 threads in the first place?
Why are there so many threads?
To answer that, you need to see what each thread is doing at this moment. This can be done with WinDbg's ~*e command, which runs a given command against every thread:
0:000> ~*e !clrstack
...
OS Thread Id: 0x1b82 (225)
        Child SP               IP Call Site
00007F441B7FD660 00007f4cdbb69ad8 [HelperMethodFrame_1OBJ: 00007f441b7fd660] System.Threading.Monitor.ObjWait(Int32, System.Object)
00007F441B7FD790 00007f4c676318cd System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ManualResetEventSlim.cs @ 570]
00007F441B7FD810 00007f4c676312e1 System.Net.Sockets.SocketAsyncContext.PerformSyncOperation[[System.__Canon, System.Private.CoreLib]](OperationQueue`1<System.__Canon> ByRef, System.__Canon, Int32, Int32) [/_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1330]
00007F441B7FD8A0 00007f4c67e26ff1 System.Net.Sockets.SocketAsyncContext.ReceiveFrom(System.Memory`1, System.Net.Sockets.SocketFlags ByRef, Byte[], Int32 ByRef, Int32, Int32 ByRef) [/_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1557]
00007F441B7FD920 00007f4c67e2ea6b System.Net.Sockets.SocketPal.Receive(System.Net.Sockets.SafeSocketHandle, Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, Int32 ByRef)
00007F441B7FD9A0 00007f4c67e26c37 System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef)
00007F441B7FDA20 00007f4c67e26929 System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32) [/_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/NetworkStream.cs @ 231]
00007F441B7FDA70 00007f4c69b85757 System.IO.BufferedStream.ReadByteSlow() [/_/src/libraries/System.Private.CoreLib/src/System/IO/BufferedStream.cs @ 771]
00007F441B7FDA90 00007f4c69b774e8 System.IO.BinaryReader.ReadByte() [/_/src/libraries/System.Private.CoreLib/src/System/IO/BinaryReader.cs @ 207]
00007F441B7FDAA0 00007f4c69b853ee RabbitMQ.Client.Impl.InboundFrame.ReadFrom(RabbitMQ.Util.NetworkBinaryReader)
00007F441B7FDAF0 00007f4c69b852c6 RabbitMQ.Client.Framing.Impl.Connection.MainLoopIteration()
00007F441B7FDB10 00007f4c69b57068 RabbitMQ.Client.Framing.Impl.Connection.MainLoop()
00007F441B7FDB50 00007f4c67590d19 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 183]
00007F441B7FDCF0 00007f4cdb1e3aa7 [DebuggerU2MCatchHandlerFrame: 00007f441b7fdcf0]
...
From the output above, a large number of thread stacks contain frames from RabbitMQ.Client.Framing.Impl, so the suspicion is that most of these threads are parked inside RabbitMQ's connection MainLoop. (The dump itself was captured with the usual dotnet-dump or procdump tooling.)
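For reference, a full dump suitable for this kind of memory and thread analysis can be collected roughly like this (the process id is a placeholder):

# Collect a full memory dump of the target .NET process (1234 is a placeholder pid)
dotnet-dump collect -p 1234 --type Full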
With this understanding, the following two suggestions were given:
Modify the DOTNET_DefaultStackSize parameter
Follow the 1.5M default stack size that .NET Core uses on Windows; 8M per thread is simply too large to sustain, and it also runs counter to Linux's usually frugal memory footprint. After the change, a dump taken during the stress test confirms that the setting has taken effect:
0:000> !sos maddress
Enumerating and tagging the entire address space and caching the result...
Subsequent runs of this command should be faster.
*** WARNING: Unable to verify timestamp for lttng-ust-wait-8-0
*** WARNING: Unable to verify timestamp for lttng-ust-wait-8
+----------------------------------------------------------------------------------------------------------+
| Memory Kind | StartAddr    | EndAddr-1    | Size   | Type        | State      | Protect        | Image    |
+----------------------------------------------------------------------------------------------------------+
.......
| Stack       | 7fabe4e8c000 | 7fabe500c000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe500d000 | 7fabe518d000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe518e000 | 7fabe530e000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe530f000 | 7fabe548f000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe5490000 | 7fabe5610000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe5611000 | 7fabe5791000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe5792000 | 7fabe5912000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe5913000 | 7fabe5a93000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe5a94000 | 7fabe5c14000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
| Stack       | 7fabe5c15000 | 7fabe5d95000 | 1.50mb | MEM_PRIVATE | MEM_COMMIT | PAGE_READWRITE |          |
.......

+-------------------------------------------------------------------------+
| Memory Type            |   Count |     Size |     Size (bytes) |
+-------------------------------------------------------------------------+
| Stack                  |     766 |   1.41gb |    1,518,571,520 |
| GCHeap                 |      48 | 702.39mb |      736,509,952 |
| PAGE_READWRITE         |     931 | 186.31mb |      195,358,720 |
| Image                  |   1,283 | 158.77mb |      166,480,384 |
| HighFrequencyHeap      |     336 |  20.97mb |       21,991,424 |
| LowFrequencyHeap       |     256 |  18.32mb |       19,214,336 |
| LoaderCodeHeap         |      15 |  17.53mb |       18,378,752 |
| HostCodeHeap           |      11 |   1.63mb |        1,703,936 |
| ResolveHeap            |       1 | 348.00kb |          356,352 |
| PAGE_READONLY          |     123 | 261.50kb |          267,776 |
| DispatchHeap           |       1 | 196.00kb |          200,704 |
| IndirectionCellHeap    |       3 | 152.00kb |          155,648 |
| LookupHeap             |       3 | 144.00kb |          147,456 |
| PAGE_EXECUTE_WRITECOPY |       5 | 132.00kb |          135,168 |
| CacheEntryHeap         |       2 | 100.00kb |          102,400 |
| StubHeap               |       2 |  76.00kb |           77,824 |
| PAGE_EXECUTE_READ      |       2 |   8.00kb |            8,192 |
+-------------------------------------------------------------------------+
| [TOTAL]                |   3,788 |   2.50gb |    2,679,660,544 |
+-------------------------------------------------------------------------+
Review the project code that drives RabbitMQ.Client.Framing.Impl
It turned out the reference was a stale one that the code no longer needed. After removing it and re-running the stress test, the thread count returned to normal.
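The article does not show the offending code, but as a general illustration of how this symptom arises: each RabbitMQ.Client connection runs its own MainLoop reader thread blocked in NetworkStream.Read, so opening a new connection per operation instead of sharing one multiplies blocked threads, and each thread pins a full stack. The sketch below (RabbitMQ.Client 6.x-style API, made-up names, hypothetical host and queue) shows that anti-pattern and the usual remedy; it is not the project's actual code.

using System.Text;
using RabbitMQ.Client;

public static class LeakyPublisher
{
    // Anti-pattern (hypothetical): a new, never-closed connection per call.
    // Every connection spawns a MainLoop reader thread, so under load the
    // thread count climbs with each call, just like the 788 threads above.
    public static void Publish(string message)
    {
        var factory = new ConnectionFactory { HostName = "localhost" }; // placeholder host
        var connection = factory.CreateConnection();                    // MainLoop thread starts here
        var channel = connection.CreateModel();
        channel.QueueDeclare(queue: "demo-queue", durable: false, exclusive: false, autoDelete: false);
        channel.BasicPublish(exchange: "", routingKey: "demo-queue",
                             basicProperties: null, body: Encoding.UTF8.GetBytes(message));
        // The connection is never closed, so its MainLoop thread lives on.
    }
}

public static class SharedPublisher
{
    // Remedy: one long-lived connection shared by the whole process;
    // channels are cheap and can be created per operation.
    private static readonly IConnection Connection =
        new ConnectionFactory { HostName = "localhost" }.CreateConnection();

    public static void Publish(string message)
    {
        using var channel = Connection.CreateModel();
        channel.QueueDeclare(queue: "demo-queue", durable: false, exclusive: false, autoDelete: false);
        channel.BasicPublish(exchange: "", routingKey: "demo-queue",
                             basicProperties: null, body: Encoding.UTF8.GetBytes(message));
    }
}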
Summary
The .NET debugging ecosystem on Linux is becoming increasingly rich, which is an exciting thing.