Pathos solves PicklingError for multiprocessing
Python's multiprocessing module is useful for running tasks concurrently across multiple processes. However, it requires that every object sent to a worker process be picklable, which is not always the case for callables such as instance methods and static methods. Pathos provides a multiprocessing implementation that uses dill as its serialization backend, and dill can serialize and deserialize almost all Python types.
Example using the built-in multiprocessing module that raises PicklingError
import os
from multiprocessing import Pool


class Tasks:
    @staticmethod
    def process_some_task(item):
        print("Processing...", item, "by pid:", os.getpid())


if __name__ == "__main__":
    with Pool(4) as pool:
        pool.map(Tasks.process_some_task, range(10))
Error raised when running the script above
(venv) vagrant@vagrant-ubuntu-trusty-64:~/test$ python test.py
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    pool.map(Tasks.process_some_task, range(10))
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 599, in get
    raise self._value
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
    put(task)
  File "/usr/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/usr/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function Tasks.process_some_task at 0x7f45b3e626a8>: attribute lookup process_some_task on __main__ failed
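The failure above comes from the standard pickle module, not from the pool itself, so it can be checked without starting any worker processes. The following is a minimal sketch of such a check. The helper name `is_picklable` is hypothetical and not part of any library. Because the traceback above is from Python 3.4 and newer Python versions may be able to pickle some callables that 3.4 could not, the sketch uses a lambda, which the standard pickle module still rejects.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # pickle may signal an unsupported object with any of these exceptions
        return False


if __name__ == "__main__":
    print("len:", is_picklable(len))                   # built-in function
    print("lambda:", is_picklable(lambda item: item))  # anonymous function
```

Running a check like this on the callable before passing it to `pool.map` shows whether the built-in Pool would be able to send it to a worker.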
Solution using pathos
- Install pathos
$ pip install pathos
- Replace the multiprocessing Pool import with the pathos ProcessingPool
import os
from pathos.multiprocessing import ProcessingPool as Pool


class Tasks:
    @staticmethod
    def process_some_task(item):
        print("Processing...", item, "by pid:", os.getpid())


if __name__ == "__main__":
    with Pool(4) as pool:
        pool.map(Tasks.process_some_task, range(10))
Successful output with pathos
(venv) vagrant@vagrant-ubuntu-trusty-64:~/test$ python test.py
Processing... 0 by pid: 3827
Processing... 1 by pid: 3828
Processing... 2 by pid: 3826
Processing... 3 by pid: 3829
Processing... 4 by pid: 3827
Processing... 5 by pid: 3828
Processing... 6 by pid: 3826
Processing... 7 by pid: 3829
Processing... 8 by pid: 3827
Processing... 9 by pid: 3828
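The pathos version succeeds because dill, rather than pickle, serializes the callable. The sketch below illustrates that difference in isolation by round-tripping a lambda through dill directly. It is an illustration under stated assumptions, not a description of how pathos calls dill internally: the helper names `stdlib_can_pickle` and `dill_roundtrip` are hypothetical, the `recurse=True` setting is a choice made for this sketch, and the dill step is skipped if the package is not installed.

```python
import pickle

try:
    import dill  # third-party package, installed as a dependency of pathos
except ImportError:
    dill = None


def stdlib_can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def dill_roundtrip(obj):
    """Serialize obj with dill and load it back as a new object."""
    # recurse=True (an assumption for this sketch) asks dill to trace only
    # the globals the function refers to, not the whole global dictionary.
    return dill.loads(dill.dumps(obj, recurse=True))


# A callable that the standard pickle module cannot serialize.
add_one = lambda x: x + 1

if __name__ == "__main__":
    print("pickle can serialize the lambda:", stdlib_can_pickle(add_one))
    if dill is not None:
        restored = dill_roundtrip(add_one)
        print("restored lambda applied to 41:", restored(41))
```

The restored object is a new function rebuilt from the serialized bytes, which is the property a worker process needs in order to run a callable it did not define itself.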
References
- What can multiprocessing and dill do together?
- pathos: a framework for parallel graph management and execution in heterogeneous computing
- dill: serialize all of python