Solving PicklingError in multiprocessing with Pathos
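Python's multiprocessing module is useful for running concurrent tasks across several processes. However, it requires everything it sends to a worker process, including the function being executed, to be picklable, and that is not always the case for objects such as instance methods and static methods. Pathos provides its own multiprocessing implementation that uses dill as its backend, which can serialize and deserialize almost any Python type.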
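To see the pickling requirement in isolation, here is a minimal sketch that uses only the standard library's pickle module, which multiprocessing relies on to ship work to its workers. The helper `is_picklable` and the `square` lambda are hypothetical names introduced for illustration; they are not part of multiprocessing or pathos.

```python
import pickle


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Different Python versions report unpicklable objects
        # with different exception types, so catch the common ones.
        return False


# A lambda has no importable name, so pickle cannot refer to it.
square = lambda x: x * x

print(is_picklable(len))     # True: a built-in that pickle can find by name
print(is_picklable(square))  # False: nothing importable is called "<lambda>"
```

A check like this can be handy before handing a callable to a process pool: if it returns False, the standard Pool is likely to fail in the same way as the example that follows.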
Example using the built-in multiprocessing module that raises PicklingError
import os
from multiprocessing import Pool


class Tasks:
    @staticmethod
    def process_some_task(item):
        print("Processing...", item, "by pid:", os.getpid())


if __name__ == "__main__":
    with Pool(4) as pool:
        pool.map(Tasks.process_some_task, range(10))
Error raised when running the above script
(venv) vagrant@vagrant-ubuntu-trusty-64:~/test$ python test.py
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    pool.map(Tasks.process_some_task, range(10))
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 599, in get
    raise self._value
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
    put(task)
  File "/usr/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/usr/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function Tasks.process_some_task at 0x7f45b3e626a8>: attribute lookup process_some_task on __main__ failed
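The "attribute lookup ... failed" part of the message is easier to read once you know that pickle stores a function by reference, as a module name plus an attribute name, rather than copying its code. A minimal sketch of that behaviour, using `json.dumps` purely as a convenient importable function (an illustrative choice, not something taken from the script above):

```python
import json
import pickle

# Pickling an importable function records only where to find it.
payload = pickle.dumps(json.dumps)

# The payload is a short byte string holding the module and function names,
# not the function body.
print(payload)
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and looks the name up again,
# so the result is the very same function object.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

In the failing script, the lookup of `process_some_task` on `__main__` fails because the function is defined on the `Tasks` class rather than at module level, so the name recorded by pickle cannot be resolved there.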
Solution using pathos
- install pathos
# install pathos
$ pip install pathos
- replace multiprocessing's Pool with pathos's ProcessingPool
import os
from pathos.multiprocessing import ProcessingPool as Pool


class Tasks:
    @staticmethod
    def process_some_task(item):
        print("Processing...", item, "by pid:", os.getpid())


if __name__ == "__main__":
    with Pool(4) as pool:
        pool.map(Tasks.process_some_task, range(10))
Successful output with pathos
(venv) vagrant@vagrant-ubuntu-trusty-64:~/test$ python test.py
Processing... 0 by pid: 3827
Processing... 1 by pid: 3828
Processing... 2 by pid: 3826
Processing... 3 by pid: 3829
Processing... 4 by pid: 3827
Processing... 5 by pid: 3828
Processing... 6 by pid: 3826
Processing... 7 by pid: 3829
Processing... 8 by pid: 3827
Processing... 9 by pid: 3828
References
- What can multiprocessing and dill do together?
- pathos: a framework for parallel graph management and execution in heterogeneous computing
- dill: serialize all of python