Considerations when using multiprocessing

Python’s multiprocessing module enables parallel processing by spawning new processes in which your code runs. Once the work is complete, each process is joined back into the main process. This approach introduces a number of potential problems, discussed below.
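
For reference, this pattern can be reproduced with nothing but the standard library (the function name and pool size below are illustrative):

import multiprocessing as mp

def square(x: int) -> int:
    return x * x

if __name__ == "__main__":
    # Pool spawns worker processes, distributes the work across them,
    # and joins the workers back into the main process on exit.
    with mp.Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]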

Potential problems

Functions must support pickling

When Python spawns a new process, the function and its arguments are serialized using the pickle module before being sent to the worker. Certain classes and objects cannot be pickled, in which case using this module (or multiprocessing directly) will fail.
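
For example, lambdas cannot be pickled, which is why they cannot be dispatched to worker processes. A minimal illustration using pickle directly:

import pickle

def module_level(x):
    return x + 1

# Module-level functions pickle by qualified name, so this works:
pickle.dumps(module_level)

# A lambda has no importable qualified name, so pickling it fails:
try:
    pickle.dumps(lambda x: x + 1)
except pickle.PicklingError as err:
    print(f"cannot pickle a lambda: {err}")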

Large outputs add significant overhead

Since parallel processing is achieved by spawning new processes, each with its own memory, the function's output must also be transferred back to the main process. If that output is particularly large, the transfer can become very slow: despite using multiple cores to speed up the computation, moving the data can take as long as, or longer than, the computation itself. The end result is no benefit from parallel processing, or worse, an even longer wait.
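
The effect is easy to reproduce: the list below is cheap to build inside a worker, but the entire result must still be pickled, piped back to the main process, and unpickled there (the sizes and pool settings are illustrative):

import multiprocessing as mp
import time

def make_big_list(n: int) -> list:
    # Cheap to compute, but every element must travel back to the
    # main process through pickling and inter-process pipes.
    return list(range(n))

if __name__ == "__main__":
    start = time.perf_counter()
    with mp.Pool(processes=2) as pool:
        results = pool.map(make_big_list, [5_000_000, 5_000_000])
    print(f"transferred {sum(len(r) for r in results)} items "
          f"in {time.perf_counter() - start:.2f}s")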

To circumvent this issue, parallelize offers the option of pickling the output to a temporary file using the pickle module and its fastest pickling protocol. The files are then read back by the main process and merged into a single result. This can sometimes offer a speed-up; however, pickling an object is still relatively slow.
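
In outline, the technique looks something like the following (a sketch only, not parallelize's actual internals; dump_result and load_result are hypothetical names):

import pickle
import tempfile

def dump_result(obj) -> str:
    # Each worker writes its output with the highest (fastest) protocol
    # and hands only the small file path back to the main process.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pkl") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
        return f.name

def load_result(path: str):
    # The main process reads each file back and merges the results.
    with open(path, "rb") as f:
        return pickle.load(f)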

This option is enabled with the write_to_file argument of parallelize.parallel():

>>> from parallelize import parallelize

>>> def foo(iterable: list) -> int:
...     output = 0
...     for i in iterable:
...         output += i**4  # accumulate each term rather than overwrite it
...     return output

>>> numbers = list(range(1_000_000))  # example input; any list of numbers works
>>> parallelize.parallel(foo, numbers, n_jobs=6, write_to_file=True)