5.7. Releasing the GIL to take advantage of multi-core processors with Cython and OpenMP
This is one of the 100+ free recipes of the IPython Cookbook, Second Edition, by Cyrille Rossant, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at Packt Publishing.
▶ Text on GitHub with a CC-BY-NC-ND license
▶ Code on GitHub with a MIT license
▶ Go to Chapter 5 : High-Performance Computing
▶ Get the Jupyter notebook
As we have seen in this chapter's introduction, CPython's GIL prevents pure Python code from taking advantage of multi-core processors. With Cython, we have a way to release the GIL temporarily in a portion of the code in order to enable multi-core computing. This is done with OpenMP, a multiprocessing API that is supported by most C compilers.
In this recipe, we will see how to parallelize the previous recipe's code on multiple cores.
Getting ready
To enable OpenMP in Cython, you just need to specify some options to the compiler. There is nothing special to install on your computer besides a good C compiler. See the instructions in this chapter's introduction for more details.
The code of this recipe has been written for gcc on Ubuntu. It can be adapted to other systems with minor changes to the %%cython
options.
How to do it...
Our simple ray tracing engine implementation is "embarrassingly parallel" (see https://en.wikipedia.org/wiki/Embarrassingly_parallel); there is a main loop over all pixels, within which the exact same function is called repetitively. There is no crosstalk between loop iterations. Therefore, it would be theoretically possible to execute all iterations in parallel.
Here, we will execute one loop (over all columns in the image) in parallel with OpenMP.
You will find the entire code on the book's website (ray7
example). We will only show the most important steps here:
1. We use the following magic command:
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force
2. We import the prange()
function:
from cython.parallel import prange
3. We add nogil
after each function definition in order to remove the GIL. We cannot use any Python variable or function inside a function annotated with nogil
. For example:
cdef Vec3 add(Vec3 x, Vec3 y) nogil:
return vec3(x.x + y.x, x.y + y.y, x.z + y.z)
4. To run a loop in parallel over the cores with OpenMP, we use prange()
:
with nogil:
for i in prange(w):
# ...
The GIL needs to be released before using any parallel computing feature such as prange()
.
5. With these changes, we reach a 3x speedup on a quad-core processor compared to the fastest version of the previous recipe.
How it works...
The GIL has been described in the introduction of this chapter. The nogil
keyword tells Cython that a particular function or code section should be executed without the GIL. When the GIL is released, it is not possible to make any Python API calls, meaning that only C variables and C functions (declared with cdef
) can be used.
See also
- Accelerating Python code with Cython
- Optimizing Cython code by writing less Python and more C
- Distributing Python code across multiple cores with IPython