This is one of the 100+ free recipes of the IPython Cookbook, Second Edition, by Cyrille Rossant, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at Packt Publishing.
As we have seen in this chapter's introduction, CPython's GIL prevents pure Python code from taking advantage of multi-core processors. With Cython, we have a way to release the GIL temporarily in a portion of the code in order to enable multi-core computing. This is done with OpenMP, a multiprocessing API that is supported by most C compilers.
In this recipe, we will see how to parallelize the previous recipe's code on multiple cores.
To enable OpenMP in Cython, you just need to specify some options to the compiler. There is nothing special to install on your computer besides a good C compiler. See the instructions in this chapter's introduction for more details.
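As a rough illustration of what "options to the compiler" means in practice, the sketch below builds the first line of a %%cython cell from a list of flags. The helper names openmp_flags and cython_magic_line are hypothetical and are not part of Cython or IPython. The gcc flag is the one used in this recipe; the /openmp switch for Microsoft's compiler is an assumption about a Windows setup and has not been tested here. Other toolchains (for example Apple's clang) may need additional setup that this sketch does not cover.

```python
import sys


def openmp_flags(platform=sys.platform):
    """Return (compile_args, link_args) that enable OpenMP (hypothetical helper)."""
    if platform.startswith("win"):
        # Assumption: the Microsoft compiler (MSVC) is in use; it enables OpenMP
        # with a single compiler switch and needs no extra linker flag.
        return ["/openmp"], []
    # gcc, as used in this recipe, needs the flag when compiling and when linking.
    return ["-fopenmp"], ["-fopenmp"]


def cython_magic_line(compile_args, link_args):
    """Build the first line of a %%cython cell from the given flags."""
    parts = ["%%cython"]
    parts += [f"--compile-args={flag}" for flag in compile_args]
    parts += [f"--link-args={flag}" for flag in link_args]
    parts.append("--force")
    return " ".join(parts)


gcc_line = cython_magic_line(*openmp_flags("linux"))
msvc_line = cython_magic_line(*openmp_flags("win32"))
print(gcc_line)
print(msvc_line)
```

For the gcc case, the generated line is the same magic command used in the first step of this recipe.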
The code of this recipe has been written for gcc on Ubuntu. It can be adapted to other systems with minor changes to the
How to do it...
Our simple ray tracing engine implementation is "embarrassingly parallel" (see https://en.wikipedia.org/wiki/Embarrassingly_parallel): there is a main loop over all pixels, within which the exact same function is called repeatedly. There is no crosstalk between loop iterations, so it is theoretically possible to execute all iterations in parallel.
Here, we will execute one loop (over all columns in the image) in parallel with OpenMP.
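To make the "no crosstalk" property concrete, here is a minimal pure-Python sketch. It is not the recipe's ray tracer: shade is a hypothetical stand-in for the per-pixel function, and the image size is arbitrary. Because each column depends only on its own index, the columns can be computed in any order, or handed to a pool of workers, and the result is identical. In pure Python the threads below still take turns holding the GIL, so this shows only that the iterations are independent, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

W, H = 8, 6  # arbitrary image size for the illustration


def shade(i, j):
    """Hypothetical stand-in for the per-pixel computation."""
    return (i * 31 + j * 17) % 256


def render_column(i):
    """Compute one column; it reads nothing written by other columns."""
    return [shade(i, j) for j in range(H)]


# Serial reference: loop over the columns in order.
serial = [render_column(i) for i in range(W)]

# Same columns computed in reverse order, each stored at its own index.
reordered = [None] * W
for i in reversed(range(W)):
    reordered[i] = render_column(i)

# Same columns distributed over worker threads; map() preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(render_column, range(W)))

print(serial == reordered == threaded)
```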
You will find the entire code on the book's website (ray7 example). We will only show the most important steps here:
1. We use the following magic command:
%%cython --compile-args=-fopenmp --link-args=-fopenmp --force
2. We import the prange function:
from cython.parallel import prange
3. We add nogil at the end of each function signature to declare that the function can run without the GIL. We cannot use any Python variable or function inside a function annotated with nogil. For example:
cdef Vec3 add(Vec3 x, Vec3 y) nogil:
    return vec3(x.x + y.x, x.y + y.y, x.z + y.z)
4. To run a loop in parallel over the cores with OpenMP, we use prange:
with nogil:
    for i in prange(w):
        # ...
The GIL needs to be released before using any parallel computing feature such as prange.
5. With these changes, we reach a 3x speedup on a quad-core processor compared to the fastest version of the previous recipe.
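One way to read the 3x figure is through Amdahl's law, which bounds the speedup on n cores by 1 / ((1 - p) + p / n) when a fraction p of the run time is parallelized. This model is an assumption brought in for illustration, not an analysis from the recipe; it ignores thread start-up costs, memory bandwidth, and load imbalance. The function names below are hypothetical.

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's law for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(speedup, n):
    """Invert Amdahl's law: the fraction p that yields `speedup` on n cores."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


# The recipe reports a 3x speedup on a quad-core processor.
p = parallel_fraction(3.0, 4)
print(round(p, 3))
print(round(amdahl_speedup(p, 4), 3))
```

Under this simplified model, a 3x speedup on four cores corresponds to about 8/9 (roughly 89%) of the original run time being spent in the parallelized loop.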
How it works...
The GIL has been described in the introduction of this chapter. The nogil keyword tells Cython that a particular function or code section should be executed without the GIL. When the GIL is released, it is not possible to make any Python API calls, meaning that only C variables and C functions (declared with cdef) can be used.
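The effect of releasing the GIL can also be observed from pure Python, as a loose analogy to nogil. The standard library's hashlib module is documented to release the GIL while hashing buffers larger than 2047 bytes, so several threads can hash at the same time. The sketch below, with arbitrary buffer sizes and a hypothetical digest helper, checks only that the threaded results match the serial ones; any speedup depends on the machine and is not asserted.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four 4 MB buffers with different contents (sizes chosen arbitrarily).
buffers = [bytes([k]) * (4 * 1024 * 1024) for k in range(4)]


def digest(buf):
    """Hash one buffer; the C implementation releases the GIL for large inputs."""
    return hashlib.sha256(buf).hexdigest()


# Serial reference.
serial = [digest(b) for b in buffers]

# Same work spread over threads; map() returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, buffers))

print(serial == threaded)
```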
See also
- Accelerating Python code with Cython
- Optimizing Cython code by writing less Python and more C
- Distributing Python code across multiple cores with IPython