API¶
-
parallelize.parallelize.
capture_output
(func: Callable, iterable: list, index: int, queue: multiprocessing.context.BaseContext.Queue, write_to_file: bool = False, *args, **kwargs) → None[source]¶ Captures the output of the function to be parallelised. If specified, output is saved to a temporary file, which is later read in by the master process, to be ultimately merged into one object.
Parameters: - func (Callable) – Function which is to be parallelised.
- iterable (list) – List which func will operate over.
- index (int) – Index number of current process. Used to sort order of results, since some processes may finish earlier than others.
- queue (Queue) – Queue object to pass filepaths back to master process.
- write_to_file (bool) – (Optional) Default False. Set to True if the output from the each file is a large object, in which case writing to disk can help speed up process in recovering data in main process.
Returns: No output returned.
Return type: None
-
parallelize.parallelize.
make_divisions
(iterable: list, n_splits: int) → list[source]¶ Generates indices to divide iterable into equal portions.
Parameters: - iterable (list) – Iterable to divide.
- n_splits (int) – Number of splits to make.
Raises: ValueError
– Raised if number of splits is more than the number of items in the iterable.Returns: The list of indices
Return type: list
-
parallelize.parallelize.
merge_results
(results: List[list]) → list[source]¶ Merges a list of lists into a single list.
Parameters: results (List[list]) – List of lists to be merged. Returns: Flattened list of objects. Return type: list
-
parallelize.parallelize.
parallel
(func: Callable, iterable: list, n_jobs: int = 2, write_to_file: bool = False, args: tuple = (), kwargs: dict = {}) → Any[source]¶ Parallelises a function that operators on an iterable by splitting the iterable into a suitable divisions, and using multiprocessing module to spawn a new process for each division.
Parameters: - func (Callable) – Function to be parallelised
- iterable (list) – List which func will operate over
- n_jobs (int) – Number of splits to make, or number of CPU cores to use
- write_to_file (bool) – (Optional) Default False. Set to True if the output from the each file is a large object, in which case writing to disk can help speed up process in recovering data in main process.
- args (tuple) – Arguments to pass to func
- kwargs (dict) – Keyword arguments to pass to func
Raises: TypeError
– Raised if iterable is not a real iterableReturns: The output of the original function or no output.
Return type: Any
-
parallelize.parallelize.
retrieve_output
(file_paths: List[Tuple[int, pathlib.Path]]) → list[source]¶ Retrieves the outputs from the parallelised function which was saved to a temporary pickle file.
Parameters: file_paths (List[Tuple[int, Path]]) – List of file paths corresponding to the temporary files written by pickle. Returns: List of outputs from the parallelised function. Return type: list
-
parallelize.parallelize.
time_function
(func: Callable) → Callable[source]¶ Use as a decorator to time a function.
Parameters: func (Callable) – Function to be timed Returns: Wrapped function Return type: Callable
-
parallelize.parallelize.
write_output_to_temp_file
(output: list) → pathlib.Path[source]¶ Writes the output from the function being parallelised, and saves it to a temporary file using pickle.
Parameters: output (list) – The output from the parallelised function. Returns: File path to the temporary file. Return type: Path