numpy - Parallelism in (I)Python with large blocks of data
I've been toiling with threads and processes for a while, trying to speed up a parallel job in IPython. I'm not sure how much detail about the function I'm calling is useful, so here's a bash; please ask if you need more.
My function's call signature looks like

    def intersplit_array(ob, er, nl, m, mi, t, ti, dmax, n0=6, steps=50):
Basically, ob, er, and nl are parameters for the observed values, and m, mi, t, ti, and dmax are parameters representing the models against which the observations are compared. (n0 and steps are fixed numerical parameters of the function.) The function loops through the models in m and, using the associated information in mi, t, ti, and dmax, calculates the probability that each model matches. Note that m is quite big: it's a list of 700,000 22x3 NumPy arrays. mi and dmax are of similar sizes. If it's relevant, my normal IPython instance uses about 25% of system memory in top: 4 GB of my 16 GB of RAM.
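(If you want to experiment with the parallel machinery below without the real data, a minimal stand-in might look like the following; the shapes of mi and dmax are assumptions, since the text above only says they're of similar size to m:)

    import numpy as np

    n_models = 1000  # scaled down from the ~700,000 models described above
    m    = [np.random.rand(22, 3) for _ in range(n_models)]
    mi   = [np.random.rand(22, 3) for _ in range(n_models)]  # shape assumed
    dmax = [np.random.rand(22, 3) for _ in range(n_models)]  # shape assumed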
I've tried to parallelize this in two ways. First, I tried to use the parallel_map function given in the SciPy Cookbook. I made the call

    p = parallel_map(lambda i: intersplit_array(ob, er, nl, m[i+1], mi[i:i+2], t[i+1], ti[i:i+2], dmax[i+1]), range(1, len(m)-1))

which runs, and provides the correct answer. Without the parallel_ part, this is just the result of applying the function one by one to each element. But it is slower than using a single core. I guess this is related to the Global Interpreter Lock?
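(If parallel_map is thread-based, which I haven't verified against the cookbook recipe's internals, that would explain the slowdown: threads running pure-Python code serialize on the GIL, and NumPy releases it only inside some of its C routines. For comparison, a thread-backed map can be sketched with the standard library; run_model here is a hypothetical named wrapper around the call above:)

    from multiprocessing.dummy import Pool as ThreadPool  # Pool API backed by threads

    def run_model(i):
        # Same per-model call as in the lambda above.
        return intersplit_array(ob, er, nl, m[i+1], mi[i:i+2],
                                t[i+1], ti[i:i+2], dmax[i+1])

    tp = ThreadPool(6)
    try:
        p = tp.map(run_model, range(1, len(m) - 1))  # GIL-bound: no speedup for CPU work
    finally:
        tp.close()
        tp.join()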
Second, I tried to use a Pool from multiprocessing. I initialized a pool with

    p = multiprocessing.Pool(6)

and then tried to call my function with

    p = p.map(lambda i: intersplit_array(ob, er, nl, m[i+1], mi[i:i+2], t[i+1], ti[i:i+2], dmax[i+1]), range(1, len(m)-1))
First, I get an error:

    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
        self.run()
      File "/usr/lib64/python2.7/threading.py", line 504, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/usr/lib64/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
        put(task)
    PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
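(The error comes from Pool.map itself: it pickles the callable in order to send it to the worker processes, and on Python 2 a lambda has no importable name, so it can't be pickled. A two-line reproduction, independent of the code above:)

    import pickle
    pickle.dumps(lambda i: i)  # PicklingError: Can't pickle <type 'function'>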
Then, having a look in top, I see all the extra ipython processes, each of which is apparently taking 25% of RAM (which can't be so, because I've still got 4 GB free) and using 0% CPU. I presume they aren't doing anything. I can't use IPython, either. I tried Ctrl-C for a while, but gave up once I got past the 300th pool worker.
Does this just not work interactively?
multiprocessing doesn't play well interactively, because of the way it splits processes. This is also why you had trouble killing it: it spawned so many processes. You would have to keep track of the master process to cancel it.
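A sketch of what keeping track of it can look like: hold on to the Pool object, use map_async so the call doesn't block, and terminate the whole pool in one go rather than hunting down hundreds of workers (run_model being a named, module-level wrapper as sketched earlier):

    import multiprocessing

    pool = multiprocessing.Pool(6)
    result = pool.map_async(run_model, range(1, len(m) - 1))
    # ...
    # If it misbehaves, this kills every worker at once instead of
    # needing a Ctrl-C per process:
    pool.terminate()
    pool.join()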
From the documentation:

    Note: Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines, however it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples, will not work in the interactive interpreter.
...

    If you try this it will actually output full tracebacks interleaved in a semi-random fashion, and then you may have to stop the master process somehow.
The best solution is probably to run it as a script from the command line. Alternatively, IPython has its own system for parallel computing, but I've never used it.
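A minimal sketch of the script route, under the same assumptions as above (the import and the data-loading comment are stand-ins for wherever the real code lives; on Linux the forked workers inherit the large arrays as globals, so only the index travels through the pickle machinery):

    import multiprocessing
    from mymodule import intersplit_array  # hypothetical: wherever the function lives

    # ... load or compute ob, er, nl, m, mi, t, ti, dmax at module level ...

    def run_model(i):
        # Module-level function: picklable, unlike the lambda.
        return intersplit_array(ob, er, nl, m[i+1], mi[i:i+2],
                                t[i+1], ti[i:i+2], dmax[i+1])

    if __name__ == '__main__':
        # The guard is exactly what the documentation note above is about.
        pool = multiprocessing.Pool(6)
        try:
            results = pool.map(run_model, range(1, len(m) - 1))
        finally:
            pool.close()
            pool.join()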