Parallel builds of clang on Windows

David Tarditi edited this page Apr 24, 2017 · 13 revisions

Summary

We have found that it is easy to end up with too much build parallelism on Windows when using msbuild or Visual Studio to build clang or LLVM (see issue #268). This can lead to virtual memory paging, which can slow down your clang or LLVM build drastically.

Here are the recommended settings for parallel builds of clang/LLVM using msbuild. These seem to provide a good trade-off between parallelism and memory usage.

msbuild /p:CL_MPCount=6 /m

CL_MPCount is a build variable that controls the number of parallel compilations launched by the cl C++ compiler driver. We recommend setting it to 4 or 6 if you have at least 1 GByte of memory per CPU core on your machine. If you have less than 1 GByte of memory per CPU core (or a lot more memory than that), you should set it to total amount of memory on your machine / (# cores * 256 MBytes). We also recommend using the /m switch (short for /maxcpucount), which causes msbuild to try to create one build task per core.
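The formula above can be sketched as a small Python helper. This is only an illustration: the function name `recommended_cl_mpcount` is hypothetical, and rounding down and never going below 1 are assumptions that the text above does not spell out.

```python
def recommended_cl_mpcount(total_memory_mb, cores, per_process_mb=256):
    """Return total memory / (cores * 256 MB), rounded down, never below 1.

    Rounding down and the floor of 1 are assumptions made for this sketch.
    """
    if total_memory_mb <= 0 or cores <= 0 or per_process_mb <= 0:
        raise ValueError("all arguments must be positive")
    return max(1, total_memory_mb // (cores * per_process_mb))


# 8 cores with 4 GB of RAM is 512 MB per core, below the 1 GB guideline.
print(recommended_cl_mpcount(4096, 8))    # 4096 // (8 * 256) = 2
# 16 cores with 16 GB of RAM is exactly 1 GB per core.
print(recommended_cl_mpcount(16384, 16))  # 16384 // (16 * 256) = 4
```

The result would then be passed to msbuild as the value of /p:CL_MPCount.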

If you find that these settings are not working well, read the details below to figure out what to do next. Note that parallel builds of debug versions of LLVM bottleneck on LLVM table generation, which is very slow in debug builds (this is Amdahl's law in action).

Details

CMake generates MSBuild files that set the /MP flag for the Microsoft Visual C++ compiler. When the C++ compiler is applied to a long list of files, it will launch as many compiler processes as there are CPU cores on your machine. For example,

cl /MP A.c B.c C.c D.c

causes the Microsoft Visual C++ compiler to launch 4 processes (one per file), if your machine has at least 4 processors available (see this blog article). You can limit this in msbuild by setting the CL_MPCount property using the /p option. For example,

msbuild /p:CL_MPCount=nnn

where nnn is the maximum number of processes to launch.

When you use the /m:nnn option to msbuild, it launches as many build processes as are specified by nnn. If you omit nnn and use only /m, it launches as many build processes as there are CPU cores on your machine. In our automated scripts, we were setting nnn to be 1/4 the number of CPU cores. The end result was that at times a quadratic number of compiler processes were being launched. If p is the number of processors, up to p^2/4 compilations were being launched (p/4 build processes, each launching up to p compiler processes). This caused build machines with ample amounts of memory to page. It is known that you can accidentally misuse build parallelism with Visual Studio when using cmake: see the comments section for this Kitware CMake blog post.
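To make the quadratic blow-up concrete, here is a rough Python sketch of the worst case. The function name is hypothetical, and it assumes every build process has a long enough file list to launch its full complement of compiler processes, so the result is an upper bound rather than a measurement.

```python
def worst_case_compiler_processes(cores, msbuild_nodes, cl_mpcount=None):
    """Upper bound on concurrently running compiler processes.

    Assumes each msbuild node launches `cl_mpcount` compilers, or one per
    core when no limit is set (the behavior described in the text above).
    """
    per_node = cores if cl_mpcount is None else cl_mpcount
    return msbuild_nodes * per_node


cores = 32
# Old scripts: /m:nnn with nnn = cores / 4 and no CL_MPCount limit.
print(worst_case_compiler_processes(cores, cores // 4))  # 8 * 32 = 256 = p^2/4
# Suggested settings: /m (one node per core) with CL_MPCount=6.
print(worst_case_compiler_processes(cores, cores, 6))    # 32 * 6 = 192
```

With a fixed CL_MPCount, the bound grows linearly with the number of cores instead of quadratically.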

The clang build does other things besides invoke the C++ compiler. It invokes tools like tblgen and builds libraries. There is lots of parallelism available in a clang/LLVM build, and the build system is better situated to recognize it and take advantage of it than invocations of the compiler driver. At the same time, the generated build system is invoking the C++ compiler with long lists of file arguments, so ratcheting individual build nodes down to no parallelism seems like a bad idea.

We need to make a trade-off: it seems better to set the build system parallelism to be high, and limit the individual compiler node parallelism to a constant. Modern OSes are very good at time-slicing CPU time across processes. They are not so good at time-slicing physical memory across processes. When there is more demand for physical memory than is actually available, this leads to virtual memory thrashing. Our goal is to create high CPU utilization while limiting memory usage to physical memory.

We have found that limiting the number of parallel C compiler processes spawned by the C++ compiler driver to 4 to 6 and setting the build system parallelism to the number of cores seems to achieve this. This assumes 1 GB of physical memory per core (166 MBytes to 256 MBytes of memory per C++ compiler process). For 6 processes, you can do this by adding the following options to MSBuild:

msbuild /p:CL_MPCount=6 /m
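The per-process memory figures quoted above follow from simple division. Here is a minimal Python sketch, assuming /m yields one build process per core and each one launches CL_MPCount compilers at once. The function name is hypothetical.

```python
def memory_per_compiler_process_mb(total_memory_mb, cores, cl_mpcount):
    """Physical memory available per compiler process in the worst case.

    Assumes `cores` build processes, each running `cl_mpcount` compilers.
    """
    return total_memory_mb / (cores * cl_mpcount)


# 16 cores with 16 GB of RAM (1 GB per core, taking 1 GB = 1024 MB).
print(memory_per_compiler_process_mb(16384, 16, 4))         # 256.0
print(round(memory_per_compiler_process_mb(16384, 16, 6)))  # 171
```

The 166 MByte figure in the text corresponds to treating 1 GB as 1000 MB.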

If you have less memory per core, you'll need to reduce the amount of build parallelism accordingly. We would suggest reducing the number of parallel C compiler launches.