As 2024 draws to a close, I am curious to see how things have changed over the past year, now that new versions of many programming languages have been released.
Let’s run the benchmark again and see the results!
Benchmark
The program used for benchmarking is the same as last year:
We start N concurrent tasks, each task waits for 10 seconds, and then the program exits after all tasks are completed. The number of tasks is controlled by a command-line parameter.
This time, let's focus on coroutines instead of multithreading.
All benchmark code can be found in async-runtimes-benchmarks-2024.
What are Coroutines?
A coroutine is a component of a computer program that can pause and later resume its execution. Because a coroutine gives up control voluntarily instead of being preempted by the operating system, it is usually much cheaper than a traditional thread. This makes coroutines particularly well suited to cooperative multitasking, such as task coordination, exception handling, event loops, iterators, infinite lists, and data pipelines.
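To make "pause and resume" concrete, here is a minimal sketch in Python using plain generators. The names `worker` and `run_interleaved` are hypothetical and not part of the benchmark; the tiny round-robin loop only stands in for what a real async runtime's scheduler does.

```python
def worker(name, steps):
    """A generator-based coroutine: it pauses at every yield."""
    for i in range(steps):
        # Execution stops here and control goes back to the caller.
        # The next call to next() resumes right after this line.
        yield f"{name}:{i}"


def run_interleaved(*coroutines):
    """Resume each coroutine in turn until all of them have finished."""
    order = []
    pending = list(coroutines)
    while pending:
        # Iterate over a copy so finished coroutines can be removed safely.
        for co in list(pending):
            try:
                order.append(next(co))
            except StopIteration:
                pending.remove(co)
    return order


print(run_interleaved(worker("a", 2), worker("b", 3)))
# prints ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Both workers make progress on a single thread, and each one keeps only its own small amount of state between resumptions. That is the property the benchmark below stresses at scale.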
Rust
I created two programs in Rust. One uses tokio:

    use std::env;
    use tokio::time::{sleep, Duration};

    #[tokio::main]
    async fn main() {
        // The number of tasks is the first command-line argument.
        let args: Vec<String> = env::args().collect();
        let num_tasks = args[1].parse::<i32>().unwrap();

        // Create the sleep futures; join_all drives them concurrently.
        let mut tasks = Vec::new();
        for _ in 0..num_tasks {
            tasks.push(sleep(Duration::from_secs(10)));
        }
        futures::future::join_all(tasks).await;
    }
And the other uses async_std:

    use std::env;
    use async_std::task;
    use futures::future::join_all;
    use std::time::Duration;

    #[async_std::main]
    async fn main() {
        // The number of tasks is the first command-line argument.
        let args: Vec<String> = env::args().collect();
        let num_tasks = args[1].parse::<usize>().unwrap();

        // Create the sleep futures; join_all drives them concurrently.
        let mut tasks = Vec::new();
        for _ in 0..num_tasks {
            tasks.push(task::sleep(Duration::from_secs(10)));
        }
        join_all(tasks).await;
    }
Both are commonly used asynchronous runtimes in Rust.
C#
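Like Rust, C# provides first-class support for async/await:

    // Top-level statements; List<T> and Task come from the implicit usings
    // that new .NET console projects enable by default.
    int numTasks = int.Parse(args[0]);
    List<Task> tasks = new List<Task>();

    for (int i = 0; i < numTasks; i++)
    {
        tasks.Add(Task.Delay(TimeSpan.FromSeconds(10)));
    }

    // Wait until every delay has completed.
    await Task.WhenAll(tasks);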
Since .NET 7, .NET has also offered NativeAOT compilation, which compiles the code ahead of time into a native binary, so no VM is needed to run the managed code. We therefore also benchmarked a NativeAOT build.
Node.js
Node.js is similar:

    const util = require('util');
    // Promise-based version of setTimeout.
    const delay = util.promisify(setTimeout);

    async function runTasks(numTasks) {
      const tasks = [];
      for (let i = 0; i < numTasks; i++) {
        tasks.push(delay(10000));
      }
      // Wait until every timer has fired.
      await Promise.all(tasks);
    }

    const numTasks = parseInt(process.argv[2], 10);
    runTasks(numTasks);
Python
And Python:

    import asyncio
    import sys

    async def main(num_tasks):
        tasks = []
        for _ in range(num_tasks):
            tasks.append(asyncio.sleep(10))
        # gather() schedules the coroutines as tasks and waits for all of them.
        await asyncio.gather(*tasks)

    if __name__ == "__main__":
        num_tasks = int(sys.argv[1])
        asyncio.run(main(num_tasks))
Go
In Go, goroutines are the key to concurrency. We don't need to wait for each goroutine individually; instead, we use a WaitGroup to manage them:

    package main

    import (
        "os"
        "strconv"
        "sync"
        "time"
    )

    func main() {
        // The number of goroutines is the first command-line argument.
        numRoutines, _ := strconv.Atoi(os.Args[1])

        var wg sync.WaitGroup
        for i := 0; i < numRoutines; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                time.Sleep(10 * time.Second)
            }()
        }
        // Block until every goroutine has called Done.
        wg.Wait()
    }
Java
Starting from JDK 21, Java provides virtual threads, which are similar to coroutines:

    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.List;

    public class VirtualThreads {

        public static void main(String[] args) throws InterruptedException {
            int numTasks = Integer.parseInt(args[0]);
            List<Thread> threads = new ArrayList<>();

            for (int i = 0; i < numTasks; i++) {
                Thread thread = Thread.startVirtualThread(() -> {
                    try {
                        Thread.sleep(Duration.ofSeconds(10));
                    } catch (InterruptedException e) {
                        // Handle exception
                    }
                });
                threads.add(thread);
            }

            for (Thread thread : threads) {
                thread.join();
            }
        }
    }
Java can also run on GraalVM, an alternative JDK distribution that can compile an application ahead of time into a native image, similar to .NET's NativeAOT. Therefore, we also added benchmarks for GraalVM.
Test Environment
Hardware: Intel® Core™ i7-13700K 13th Gen
Operating System: Debian GNU/Linux 12 (bookworm)
Rust: 1.82.0
.NET: 9.0.100
Go: 1.23.3
Java: openjdk 23.0.1 build 23.0.1+11-39
Java (GraalVM): java 23.0.1 build 23.0.1+11-jvmci-b01
Node.js: v23.2.0
Python: 3.13.0
Where available, all programs were built and run in release mode. Internationalization and globalization support were disabled because libicu was not available in our test environment.
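How the memory figures were collected is not described here, so the following is only a sketch of one possible approach on Linux, not necessarily the method behind the charts. It runs a command to completion and reads the peak resident set size that the kernel recorded for child processes. The helper name `peak_rss_kib` is hypothetical.

```python
import resource
import subprocess
import sys


def peak_rss_kib(command):
    """Run `command` to completion and return the peak RSS of children in KiB.

    Assumes Linux, where ru_maxrss is reported in kilobytes. RUSAGE_CHILDREN
    reports the largest child waited for so far, so call this once per
    measuring process to avoid mixing results from different runs.
    """
    subprocess.run(command, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Demonstration: measure a Python interpreter that does nothing.
demo_kib = peak_rss_kib([sys.executable, "-c", "pass"])
print(f"peak RSS: {demo_kib} KiB")
```

To measure one of the benchmark programs, pass its command line instead, for example the path to a compiled binary followed by the task count.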
Results
Minimum Memory Usage
Let's start small, as some runtimes need some memory by default. We start by launching just one task.
We can see that Rust, C# (NativeAOT), and Go all have similar results, as they are statically compiled into native binaries and require very little memory. Java (GraalVM native-image) also performs well, but uses slightly more memory than the other statically compiled programs. Programs that run on managed platforms or via interpreters consume more memory.
In this case, Go seems to have the least memory usage.
The result for Java on GraalVM was a bit surprising: it consumed more memory than Java on OpenJDK, though I suspect this can be tuned with some settings.
10,000 Tasks
Here are some surprises! Both Rust programs performed exceptionally well. Even with 10,000 tasks running in the background, they used very little memory, only slightly more than in the single-task run! C# (NativeAOT) was close behind, using only around 10 MB of memory. We need more tasks to really stress them!
Go's memory consumption increased significantly. Goroutines are supposed to be lightweight, but in practice they consumed much more RAM than Rust's tasks. In this case, virtual threads in Java (GraalVM native image) seem to be more lightweight than Go's goroutines. What surprised me was that both Go and Java (GraalVM native image), which are compiled into native binaries, consumed more RAM than C# running on the VM!
100,000 Tasks
Once we increased the number of tasks to 100,000, memory consumption for all languages started to rise significantly.
Rust and C# both performed well in this case. A big surprise is that C# (NativeAOT) even consumed less RAM than Rust, outperforming all other languages. Very impressive!
At this point, Go was beaten not only by Rust but also by Java (except for the one running on GraalVM), C#, and Node.js.
1 Million Tasks
Now, let's go to the extreme.
In the end, C# clearly beat all the other languages, and by an impressive margin. As expected, Rust continued to perform excellently in memory efficiency.
Go's gap with the other languages widened. Now, Go consumes more than 13 times the memory of the champion. It also consumes more than twice the memory of Java, which contradicts the general perception that JVMs are memory-heavy while Go is lightweight.
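To get a rough feel for the per-task cost behind totals like these, one can divide the extra memory by the number of tasks. The sketch below does this for Python's asyncio only, using `tracemalloc`. It counts allocations made through Python's allocator rather than the process-level memory reported in this benchmark, so its numbers are not comparable to the charts. The function name `estimate_bytes_per_task` and the short delay are assumptions chosen for illustration.

```python
import asyncio
import tracemalloc


async def estimate_bytes_per_task(num_tasks, delay=0.01):
    """Return the approximate Python-level bytes allocated per sleeping task."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()

    # Wrap each sleep in a Task so it is scheduled on the event loop.
    tasks = [asyncio.create_task(asyncio.sleep(delay)) for _ in range(num_tasks)]
    # Yield to the event loop once to give the tasks a chance to start.
    await asyncio.sleep(0)
    during, _ = tracemalloc.get_traced_memory()

    await asyncio.gather(*tasks)
    tracemalloc.stop()
    return (during - before) / num_tasks


bytes_per_task = asyncio.run(estimate_bytes_per_task(10_000))
print(f"~{bytes_per_task:.0f} bytes per task (Python allocations only)")
```

Multiplying even a small per-task figure by a million shows why the totals grow so quickly once the task count reaches the upper end of the benchmark.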
Summary
As we observed, running large numbers of concurrent tasks, even without performing complex operations, consumes a lot of memory. Different language runtimes have different trade-offs: some are lightweight and efficient for small numbers of tasks but struggle with scalability when handling hundreds of thousands of tasks.
A lot has changed since last year. From the latest compiler and runtime benchmark results, we can see that .NET has made huge improvements, and .NET with NativeAOT can really compete with Rust. Java native images built with GraalVM also perform excellently in terms of memory efficiency. However, Go's goroutines continue to underperform in terms of resource consumption.