Solving PicklingError in multiprocessing with Pathos
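Python's multiprocessing module is useful for running concurrent tasks in multiple processes. However, it requires the callables and arguments sent to the worker processes to be picklable, which is not always the case for objects such as class instance methods and static methods. Pathos provides a multiprocessing implementation that uses dill as its serialization backend, and dill can serialize and deserialize almost every Python type.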
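The limitation can be reproduced with the standard pickle module alone, since pickle stores a function as a reference to its module and name rather than by value. This is a minimal stdlib-only sketch; the names `top_level_task`, `make_local_task` and `can_pickle` are hypothetical helpers, and it assumes the file is run directly as a script.

```python
import pickle


def top_level_task(item):
    """Defined at module level, so pickle can store it by reference."""
    return item * 2


def make_local_task():
    """Return a function defined inside another function (a 'local object')."""
    def local_task(item):
        return item * 2
    return local_task


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # Depending on the Python version, failures surface as
        # PicklingError or AttributeError ("Can't pickle local object").
        return False
    return True


print(can_pickle(top_level_task))     # True
print(can_pickle(make_local_task()))  # False
print(can_pickle(lambda item: item))  # False
```

Objects like the local function and the lambda are the kind of cases dill is designed to handle, which is why swapping in a dill-based pool helps.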
Example using the built-in multiprocessing module that raises PicklingError
import os
from multiprocessing import Pool

class Tasks:
    @staticmethod
    def process_some_task(item):
        print("Processing...", item, "by pid:", os.getpid())

if __name__ == "__main__":
    with Pool(4) as pool:
        # Pool.map pickles the function to send it to the worker processes
        pool.map(Tasks.process_some_task, range(10))
Error raised by running the above script (Python 3.4)
(venv) vagrant@vagrant-ubuntu-trusty-64:~/test$ python test.py
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    pool.map(Tasks.process_some_task, range(10))
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 599, in get
    raise self._value
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 383, in _handle_tasks
    put(task)
  File "/usr/lib/python3.4/multiprocessing/connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/usr/lib/python3.4/multiprocessing/reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function Tasks.process_some_task at 0x7f45b3e626a8>: attribute lookup process_some_task on __main__ failed
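The traceback reports that looking up `process_some_task` on `__main__` failed. The sketch below illustrates why, using only the standard library: the static method's plain `__name__` does not exist at module level, while its dotted `__qualname__` does. The helper `lookup_by_name` is hypothetical and only mimics a name lookup; it is not pickle's actual implementation, and the sketch assumes it is run directly as a script.

```python
import sys


class Tasks:
    @staticmethod
    def process_some_task(item):
        return item


def lookup_by_name(module, dotted_name):
    """Resolve a (possibly dotted) name on a module; return None if a step is missing."""
    obj = module
    for part in dotted_name.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return None
    return obj


func = Tasks.process_some_task  # accessing a staticmethod yields a plain function
module = sys.modules[func.__module__]

print(func.__name__)      # process_some_task
print(func.__qualname__)  # Tasks.process_some_task

# Looking up only the plain name on the module fails, as in the traceback
print(lookup_by_name(module, func.__name__))              # None
# Following the dotted qualified name reaches the same function
print(lookup_by_name(module, func.__qualname__) is func)  # True
```

Python 3.5 and later follow `__qualname__` for older pickle protocols too, so this particular static-method example may not fail on newer interpreters.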
Solution with pathos
- Install pathos:
# install pathos
$ pip install pathos
- Replace the multiprocessing Pool with pathos' ProcessingPool:
import os
# pathos' ProcessingPool serializes with dill instead of the standard pickle module
from pathos.multiprocessing import ProcessingPool as Pool

class Tasks:
    @staticmethod
    def process_some_task(item):
        print("Processing...", item, "by pid:", os.getpid())

if __name__ == "__main__":
    with Pool(4) as pool:
        pool.map(Tasks.process_some_task, range(10))
Successful output with pathos
(venv) vagrant@vagrant-ubuntu-trusty-64:~/test$ python test.py
Processing... 0 by pid: 3827
Processing... 1 by pid: 3828
Processing... 2 by pid: 3826
Processing... 3 by pid: 3829
Processing... 4 by pid: 3827
Processing... 5 by pid: 3828
Processing... 6 by pid: 3826
Processing... 7 by pid: 3829
Processing... 8 by pid: 3827
Processing... 9 by pid: 3828
References
- What can multiprocessing and dill do together?
- pathos: a framework for parallel graph management and execution in heterogeneous computing
- dill: serialize all of python