API

parallelize.parallelize.capture_output(func: Callable, iterable: list, index: int, queue: multiprocessing.context.BaseContext.Queue, write_to_file: bool = False, *args, **kwargs) → None[source]

Captures the output of the function being parallelised. If write_to_file is set, the output is saved to a temporary file, which is later read back by the master process and ultimately merged into a single object.

Parameters:
  • func (Callable) – Function which is to be parallelised.
  • iterable (list) – List which func will operate over.
  • index (int) – Index of the current process. Used to restore the order of results, since some processes may finish earlier than others.
  • queue (Queue) – Queue object to pass filepaths back to master process.
  • write_to_file (bool) – (Optional) Default False. Set to True if the output from each process is a large object, in which case writing to disk can speed up recovering the data in the main process.
Returns:

No output returned.

Return type:

None
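
The worker-side pattern is roughly as follows. This is an illustrative sketch only, not the library's exact implementation; apart from write_output_to_temp_file (documented below), the names are hypothetical.

    from parallelize.parallelize import write_output_to_temp_file

    def _worker_sketch(func, division, index, queue):
        # Illustrative only: run func on this process's share of the
        # iterable, pickle the result to a temporary file, and put the
        # (index, path) pair on the queue so the master process can
        # reassemble results in the original order.
        result = func(division)
        path = write_output_to_temp_file(result)
        queue.put((index, path))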

parallelize.parallelize.make_divisions(iterable: list, n_splits: int) → list[source]

Generates indices to divide iterable into equal portions.

Parameters:
  • iterable (list) – Iterable to divide.
  • n_splits (int) – Number of splits to make.
Raises:

ValueError – Raised if the number of splits exceeds the number of items in the iterable.

Returns:

The list of division indices.

Return type:

list
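
As a rough sketch of what such a list of boundary indices looks like, here is an illustrative equivalent in plain Python (the exact boundary values the library returns may differ):

    def make_divisions_sketch(iterable, n_splits):
        # Boundary indices for n_splits roughly equal slices of the iterable.
        if n_splits > len(iterable):
            raise ValueError("More splits than items in the iterable")
        size, remainder = divmod(len(iterable), n_splits)
        indices = [0]
        for i in range(n_splits):
            indices.append(indices[-1] + size + (1 if i < remainder else 0))
        return indices

    make_divisions_sketch(list(range(10)), 3)  # [0, 4, 7, 10]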

parallelize.parallelize.merge_results(results: List[list]) → list[source]

Merges a list of lists into a single list.

Parameters:
  • results (List[list]) – List of lists to be merged.
Returns:

Flattened list of objects.

Return type:

list

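The behaviour is equivalent to flattening one level of nesting, e.g.:

    from parallelize.parallelize import merge_results

    merge_results([[1, 2], [3], [4, 5, 6]])  # [1, 2, 3, 4, 5, 6]
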
parallelize.parallelize.parallel(func: Callable, iterable: list, n_jobs: int = 2, write_to_file: bool = False, args: tuple = (), kwargs: dict = {}) → Any[source]

Parallelises a function that operates on an iterable by splitting the iterable into suitable divisions and using the multiprocessing module to spawn a new process for each division.

Parameters:
  • func (Callable) – Function to be parallelised
  • iterable (list) – List which func will operate over
  • n_jobs (int) – Number of splits to make, or number of CPU cores to use
  • write_to_file (bool) – (Optional) Default False. Set to True if the output from each process is a large object, in which case writing to disk can speed up recovering the data in the main process.
  • args (tuple) – Arguments to pass to func
  • kwargs (dict) – Keyword arguments to pass to func
Raises:

TypeError – Raised if iterable is not an iterable type.

Returns:

The output of the original function or no output.

Return type:

Any
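
A minimal usage sketch, assuming func receives one division of the iterable (a sub-list) in each worker process, as implied by the parameter descriptions above:

    from parallelize.parallelize import parallel

    def square_all(numbers):
        # Runs once per worker process on its share of the iterable.
        return [n * n for n in numbers]

    if __name__ == "__main__":
        results = parallel(square_all, list(range(1000)), n_jobs=4)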

parallelize.parallelize.retrieve_output(file_paths: List[Tuple[int, pathlib.Path]]) → list[source]

Retrieves the outputs of the parallelised function that were saved to temporary pickle files.

Parameters:
  • file_paths (List[Tuple[int, Path]]) – List of file paths corresponding to the temporary files written by pickle.
Returns:

List of outputs from the parallelised function.

Return type:

list

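Conceptually this is one pickle load per file, in process-index order; an illustrative equivalent (names are hypothetical):

    import pickle

    def retrieve_output_sketch(file_paths):
        # Read each temporary pickle back in, ordered by process index,
        # so the combined output follows the order of the original iterable.
        outputs = []
        for _index, path in sorted(file_paths):
            with open(path, "rb") as fh:
                outputs.append(pickle.load(fh))
        return outputs
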
parallelize.parallelize.time_function(func: Callable) → Callable[source]

Use as a decorator to time a function.

Parameters:
  • func (Callable) – Function to be timed.
Returns:

Wrapped function.

Return type:

Callable

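Typical decorator usage; how the elapsed time is reported (printed or logged) depends on the implementation:

    from parallelize.parallelize import time_function

    @time_function
    def slow_sum(numbers):
        return sum(numbers)

    slow_sum(range(1_000_000))
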
parallelize.parallelize.write_output_to_temp_file(output: list) → pathlib.Path[source]

Saves the output from the function being parallelised to a temporary file using pickle.

Parameters:
  • output (list) – The output from the parallelised function.
Returns:

File path to the temporary file.

Return type:

Path

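An illustrative equivalent using the standard library; the actual file naming and location are up to the implementation:

    import pickle
    import tempfile
    from pathlib import Path

    def write_output_sketch(output):
        # Pickle the worker's output to a named temporary file and return
        # its path so the master process can load it back later.
        with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as fh:
            pickle.dump(output, fh)
        return Path(fh.name)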