Engineers, scientists, and financial analysts frequently use optimization methods to solve computationally expensive problems such as simulation of complex systems, high-dimensional data analysis or data compression. Optimisation of algorithms can dramatically improve the performance of computation – even order or two orders of magnitude. One of the methods of optimisation is parallel computing. To derive the full gains of parallelization, it is important to choose an approach that is appropriate for the computation problem.

One of the popular educational problems is finding the primes numbers. Usually, the task is approached using traditional sequential programming. In this tutorial, it will be shown that the time needed for computation can be reduced if for the solution to the problem we use multiple processes working simultaneously. The solution is implemented in Python.

There are some important considerations when using parallelization. The most import issue is to translate the sequential problem into a parallel. Finding prime numbers at the first glance seems to be an iterative task which requires pulling a number, checking if it is a prime or not (isprime function), then taking another number etc. However, we may notice that the numbers don’t have to be pulled one after another. Actually, they can be drawn in any order which means that the computations can be done independently for every pulled number. This observation can be translated into an algorithm which divides the initial sequence of the numbers into separate sets. Then, each set can be processed independently by a worker. But, how many sets should we have? A number of workers or processes which can handle tasks is dependent on the physical cores of the computer processor that is being used. Generally, we will see no improvements after the number of processes in the algorithm exceed the number of physical cores in the CPU (unless the computer has many CPUs). In the code, we assumed that we have 4 processes which are fixed by a CORE constant. The assumption is quite conservative as most of the contemporary CPUs have more than 4 cores, however 4 processes for demonstrative purposes is sufficient

The code comprises the main block and 2 supportive functions: isprime which check if the number if prime and sum_primes which counts the prime numbers. The intention of the latter function is to have a simple validation for two scenarios implemented in the main code: single- and multiprocess. Additionally, sum_primes is also a function which is executed independently by spawned processes which is the core functionality of the presented code. For this purpose, I employed ‘multiprocessing’ package which offers both local and remote concurrency by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. After spawning the processes, each of them performs its computation on a given sets of numbers. The main process calling the process.wait() waits for each worker to finish its computations. The results are shared in an array accessible for every process as we assume there is shared memory.