The Rust compiler's front-end can now use parallel execution to significantly
reduce compile times. To try it, run the nightly compiler with the
-Z threads=8 option. This feature is currently experimental, and we aim to ship
it in the stable compiler in 2024.
Keep reading to learn why a parallel front-end is needed and how it works, or just skip ahead to the How to use it section.
Rust compile times are a perennial concern. The Compiler Performance Working Group has continually improved compiler performance for several years. For example, in the first 10 months of 2023, there were mean reductions in compile time of 13%, in peak memory use of 15%, and in binary size of 7%, as measured by our performance suite.
However, at this point the compiler has been heavily optimized and new improvements are hard to find. There is no low-hanging fruit remaining.
But there is one piece of large but high-hanging fruit: parallelism. Current Rust compiler users benefit from two kinds of parallelism, and the newly parallel front-end adds a third kind.
When you compile a Rust program, Cargo launches multiple rustc processes,
compiling multiple crates in parallel. This works well. Try compiling a large
Rust program with the
-j1 flag to disable this parallelization and it will
take a lot longer than normal.
You can visualise this parallelism if you build with Cargo's --timings flag,
which produces a chart showing how the crates are compiled. The following image
shows the timeline when building ripgrep on
a machine with 28 virtual cores.
There are 60 horizontal lines, each one representing a distinct process. Their durations range from a fraction of a second to multiple seconds. Most of them are rustc, and the few orange ones are build scripts. The first twenty processes all start at the same time. This is possible because there are no dependencies between the relevant crates. But further down the graph, parallelism reduces as crate dependencies increase. Although the compiler can overlap compilation of dependent crates somewhat thanks to a feature called pipelined compilation, there is much less parallel execution happening towards the end of compilation, and this is typical for large Rust programs.

Interprocess parallelism is not enough to take full advantage of many cores. For more speed, we need parallelism within each process.
The compiler is split into two halves: the front-end and the back-end.
The front-end does many things, including parsing, type checking, and borrow checking. Until this week, it could not use parallel execution.
The back-end performs code generation. It generates code in chunks called "codegen units" and then LLVM processes these in parallel. This is a form of coarse-grained parallelism.
We can visualize the difference between the serial front-end and the parallel back-end. The following image shows the output of a profiler called Samply measuring rustc as it does a release build of the final crate in Cargo. The image is superimposed with markers that indicate front-end and back-end execution.
Each horizontal line represents a thread. The main thread is labelled "rustc" and is shown at the bottom. It is busy for most of the execution. The other 16 threads are LLVM threads, labelled "opt cgu.00" through to "opt cgu.15". There are 16 threads because 16 is the default number of codegen units for a release build.
There are several things worth noting.
The front-end is now capable of parallel execution. It uses Rayon to perform compilation tasks using fine-grained parallelism. Many data structures are synchronized by mutexes and read-write locks, atomic types are used where appropriate, and many front-end operations are made parallel. The addition of parallelism was done by modifying a relatively small number of key points in the code. The vast majority of the front-end code did not need to be changed.
When the parallel front-end is enabled and configured to use eight threads, we get the following Samply profile when compiling the same example as before.
Again, there are several things worth noting.
Rust compilation has long benefited from interprocess parallelism, via Cargo, and from intraprocess parallelism in the back-end. It can now also benefit from intraprocess parallelism in the front-end.
You might wonder how interprocess parallelism and intraprocess parallelism interact. If we have 20 parallel rustc invocations and each one can have up to 16 threads running, could we end up with hundreds of threads on a machine with only tens of cores, resulting in inefficient execution as the OS tries its best to schedule them?
Fortunately no. The compiler uses the jobserver protocol to limit the number of threads it creates. If a lot of interprocess parallelism is occurring, intraprocess parallelism will be limited appropriately, and the number of threads will not exceed the number of cores.
The nightly compiler is now shipping with the parallel front-end enabled. However, by default it runs in single-threaded mode and won't reduce compile times.
Keen users can opt into multi-threaded mode with the
-Z threads option. For example:
$ RUSTFLAGS="-Z threads=8" cargo build --release
Alternatively, to opt in from a config.toml file (for one or more projects), add these lines:
[build]
rustflags = ["-Z", "threads=8"]
It may be surprising that single-threaded mode is the default. Why parallelize the front-end and then run it in single-threaded mode? The answer is simple: caution. This is a big change! The parallel front-end has a lot of new code. Single-threaded mode exercises most of the new code, but excludes the possibility of threading bugs such as deadlocks that can affect multi-threaded mode. Even in Rust, parallel programs are harder to write correctly than serial programs. For this reason the parallel front-end also won't be shipped in beta or stable releases for some time.
When the parallel front-end is run in single-threaded mode, compilation times are typically 0% to 2% slower than with the serial front-end. This should be barely noticeable.
When the parallel front-end is run in multi-threaded mode with -Z threads=8,
our measurements on real-world
code show that compile
times can be reduced by up to 50%, though the effects vary widely and depend on
the characteristics of the code and its build configuration. For example, dev
builds are likely to see bigger improvements than release builds because
release builds usually spend more time doing optimizations in the back-end. A
small number of cases compile more slowly in multi-threaded mode than
single-threaded mode. These are mostly tiny programs that already compile quickly.
We recommend eight threads because this is the configuration we have tested the most and it is known to give good results. Values lower than eight will see smaller benefits. Values greater than eight will give diminishing returns and may even give worse performance.
If a 50% improvement seems low when going from one to eight threads, recall from the explanation above that the front-end only accounts for part of compile times, and the back-end is already parallel. You can't beat Amdahl's Law.
Memory usage can increase significantly in multi-threaded mode. We have seen increases of up to 35%. This is unsurprising given that various parts of compilation, each of which requires a certain amount of memory, are now executing in parallel.
Reliability in single-threaded mode should be high.
In multi-threaded mode there are some known bugs, including deadlocks. If compilation hangs, you have probably hit one of them.
If you have any problems with the parallel front-end, please check the issues marked with the "WG-compiler-parallel" label. If your problem does not match any of the existing issues, please file a new issue.
For more general feedback, please start a discussion on the wg-parallel-rustc Zulip channel. We are particularly interested to hear the performance effects on the code you care about.
We are working to improve the performance of the parallel front-end. As the graphs above showed, there is room to improve the utilization of the threads in the front-end. We are also ironing out the remaining bugs in multi-threaded mode.
We aim to stabilize the
-Z threads option and ship the parallel front-end
running by default in multi-threaded mode on stable releases in 2024.
The parallel front-end has been under development for a long time. It was started by @Zoxc, who also did most of the work for several years. After a period of inactivity, the project was revived this year by @SparrowLii, who led the effort to get it shipped. Other members of the Parallel Rustc Working Group have also been involved with reviews and other activities. Many thanks to everyone involved.