API

parallelize.parallelize.capture_output(func: Callable, iterable: list, index: int, queue: multiprocessing.context.BaseContext.Queue, write_to_file: bool = False, *args, **kwargs) → None[source]

Captures the output of the function being parallelised. If write_to_file is set, the output is saved to a temporary file, which is later read back by the master process and ultimately merged into a single object.

Parameters:
  • func (Callable) – Function which is to be parallelised.
  • iterable (list) – List which func will operate over.
  • index (int) – Index of the current process. Used to restore the order of results, since some processes may finish earlier than others.
  • queue (Queue) – Queue object to pass filepaths back to master process.
  • write_to_file (bool) – (Optional) Default False. Set to True if the output from each process is a large object, in which case writing to disk can speed up recovering the data in the main process.
Returns:

No output returned.

Return type:

None
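
The worker-side pattern is roughly as follows. This is an illustrative sketch only, not the library's exact implementation; apart from write_output_to_temp_file (documented below), the names are hypothetical.

    from parallelize.parallelize import write_output_to_temp_file

    def _worker_sketch(func, division, index, queue):
        # Illustrative only: run func on this process's share of the
        # iterable, pickle the result to a temporary file, and put the
        # (index, path) pair on the queue so the master process can
        # reassemble results in the original order.
        result = func(division)
        path = write_output_to_temp_file(result)
        queue.put((index, path))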

parallelize.parallelize.make_divisions(iterable: list, n_splits: int) → list[source]

Generates indices to divide iterable into equal portions.

Parameters:
  • iterable (list) – Iterable to divide.
  • n_splits (int) – Number of splits to make.
Raises:

ValueError – Raised if the number of splits exceeds the number of items in the iterable.

Returns:

The list of division indices.

Return type:

list
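
As a rough sketch of what such a list of boundary indices looks like, here is an illustrative equivalent in plain Python (the exact boundary values the library returns may differ):

    def make_divisions_sketch(iterable, n_splits):
        # Boundary indices for n_splits roughly equal slices of the iterable.
        if n_splits > len(iterable):
            raise ValueError("More splits than items in the iterable")
        size, remainder = divmod(len(iterable), n_splits)
        indices = [0]
        for i in range(n_splits):
            indices.append(indices[-1] + size + (1 if i < remainder else 0))
        return indices

    make_divisions_sketch(list(range(10)), 3)  # [0, 4, 7, 10]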

parallelize.parallelize.merge_results(results: List[list]) → list[source]

Merges a list of lists into a single list.

Parameters:
  • results (List[list]) – List of lists to be merged.
Returns:

Flattened list of objects.

Return type:

list

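The behaviour is equivalent to flattening one level of nesting, e.g.:

    from parallelize.parallelize import merge_results

    merge_results([[1, 2], [3], [4, 5, 6]])  # [1, 2, 3, 4, 5, 6]
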
parallelize.parallelize.parallel(func: Callable, iterable: list, n_jobs: int = 2, write_to_file: bool = False, args: tuple = (), kwargs: dict = {}) → Any[source]

Parallelises a function that operates on an iterable by splitting the iterable into suitable divisions and using the multiprocessing module to spawn a new process for each division.

Parameters:
  • func (Callable) – Function to be parallelised
  • iterable (list) – List which func will operate over
  • n_jobs (int) – Number of splits to make, or number of CPU cores to use
  • write_to_file (bool) – (Optional) Default False. Set to True if the output from each process is a large object, in which case writing to disk can speed up recovering the data in the main process.
  • args (tuple) – Arguments to pass to func
  • kwargs (dict) – Keyword arguments to pass to func
Raises:

TypeError – Raised if iterable is not an iterable type.

Returns:

The output of the original function or no output.

Return type:

Any
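
A minimal usage sketch, assuming func receives one division of the iterable (a sub-list) in each worker process, as implied by the parameter descriptions above:

    from parallelize.parallelize import parallel

    def square_all(numbers):
        # Runs once per worker process on its share of the iterable.
        return [n * n for n in numbers]

    if __name__ == "__main__":
        results = parallel(square_all, list(range(1000)), n_jobs=4)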

parallelize.parallelize.retrieve_output(file_paths: List[Tuple[int, pathlib.Path]]) → list[source]

Retrieves the outputs of the parallelised function that were saved to temporary pickle files.

Parameters:
  • file_paths (List[Tuple[int, Path]]) – List of file paths corresponding to the temporary files written by pickle.
Returns:

List of outputs from the parallelised function.

Return type:

list

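Conceptually this is one pickle load per file, in process-index order; an illustrative equivalent (names are hypothetical):

    import pickle

    def retrieve_output_sketch(file_paths):
        # Read each temporary pickle back in, ordered by process index,
        # so the combined output follows the order of the original iterable.
        outputs = []
        for _index, path in sorted(file_paths):
            with open(path, "rb") as fh:
                outputs.append(pickle.load(fh))
        return outputs
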
parallelize.parallelize.time_function(func: Callable) → Callable[source]

Use as a decorator to time a function.

Parameters:
  • func (Callable) – Function to be timed.
Returns:

Wrapped function.

Return type:

Callable

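Typical decorator usage; how the elapsed time is reported (printed or logged) depends on the implementation:

    from parallelize.parallelize import time_function

    @time_function
    def slow_sum(numbers):
        return sum(numbers)

    slow_sum(range(1_000_000))
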
parallelize.parallelize.write_output_to_temp_file(output: list) → pathlib.Path[source]

Saves the output from the function being parallelised to a temporary file using pickle.

Parameters:
  • output (list) – The output from the parallelised function.
Returns:

File path to the temporary file.

Return type:

Path

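An illustrative equivalent using the standard library; the actual file naming and location are up to the implementation:

    import pickle
    import tempfile
    from pathlib import Path

    def write_output_sketch(output):
        # Pickle the worker's output to a named temporary file and return
        # its path so the master process can load it back later.
        with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as fh:
            pickle.dump(output, fh)
        return Path(fh.name)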