numpy - Parallelism in (I)Python with large blocks of data


I've been toiling with threads and processes for a while now, trying to speed up my very parallel job in IPython. I'm not sure how much detail about the function I'm calling is useful, so here's a first bash at it, but ask if you need more.

My function's call signature looks like

def intersplit_array(ob, er, nl, m, mi, t, ti, dmax, n0=6, steps=50):

Basically, ob, er and nl are parameters for the observed values, and m, mi, t, ti and dmax are parameters that represent the models against which the observations are compared. (n0 and steps are fixed numerical parameters of the function.) The function loops through all the models in m and, using the associated information in mi, t, ti and dmax, calculates a probability that the model matches. Note that m is quite big: it's a list of about 700,000 22x3 NumPy arrays. mi and dmax are of similar sizes. If it's relevant, my normal IPython instance uses about 25% of system memory in top: 4 GB of my 16 GB of RAM.
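For reference, the serial version presumably looks something like this sketch (serial_scan is an illustrative name; the index pattern mirrors the parallel calls below):

def serial_scan(ob, er, nl, m, mi, t, ti, dmax):
    # Serial baseline: apply the model-matching function to each model
    # in turn, with the same slicing as the parallel_map call below.
    return [intersplit_array(ob, er, nl, m[i+1], mi[i:i+2], t[i+1],
                             ti[i:i+2], dmax[i+1])
            for i in range(1, len(m) - 1)]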

I've tried to parallelize this in two ways. First, I tried to use the parallel_map function given in the SciPy Cookbook. I made the call

p = parallel_map(lambda i: intersplit_array(ob, er, nl, m[i+1], mi[i:i+2],
                                            t[i+1], ti[i:i+2], dmax[i+1]),
                 range(1, len(m) - 1))

which runs, and provides the correct answer. Without the parallel_ part, this is just the result of applying the function one by one to each element. But this is slower than using a single core. I guess this is related to the Global Interpreter Lock?
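That guess is plausible. The Cookbook's parallel_map is thread-based, and CPython's Global Interpreter Lock lets only one thread execute Python bytecode at a time, so threads only pay off when the work releases the GIL (as many NumPy operations do internally). A minimal self-contained demonstration of the effect (an illustration, not the Cookbook code):

import time
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool

def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL for its whole duration.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    jobs = [10 ** 6] * 8

    t0 = time.time()
    serial = [cpu_bound(n) for n in jobs]
    print('serial:  %.2fs' % (time.time() - t0))

    t0 = time.time()
    threaded = ThreadPool(4).map(cpu_bound, jobs)
    print('threads: %.2fs' % (time.time() - t0))

On a typical machine the threaded timing is no better, and often slightly worse from lock contention, because the Python-level work never actually runs concurrently.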

Second, I tried to use a Pool from multiprocessing. I initialized a pool with

p = multiprocessing.Pool(6)

and then tried to call my function with

p = p.map(lambda i: intersplit_array(ob, er, nl, m[i+1], mi[i:i+2],
                                     t[i+1], ti[i:i+2], dmax[i+1]),
          range(1, len(m) - 1))

First, I got an error:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

Having a look in top, I then see all the extra IPython processes, each of which apparently takes up 25% of RAM (which can't be so, because I've still got 4 GB free) and uses 0% CPU. I presume they aren't doing anything. I can't use IPython, either. I tried Ctrl-C for a while, but gave up once I got past the 300th pool worker.

Does this just not work interactively?

multiprocessing doesn't play nicely with interactive use, because of the way it splits off child processes. This is also why you had trouble killing it: it spawned so many processes, and you would have to keep track of the master process to cancel it.
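For example, keeping a handle on the pool object gives you a single point from which to shut every worker down, instead of interrupting them one at a time (a minimal sketch; work is a placeholder, not the question's function):

import multiprocessing

def work(x):
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(6)
    try:
        results = pool.map(work, range(100))
        pool.close()        # normal shutdown: no more tasks coming
    except KeyboardInterrupt:
        pool.terminate()    # kill all workers at once via the master
    pool.join()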

From the documentation:

Note

Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines but it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples, will not work in the interactive interpreter.

...

If you try this it will actually output full tracebacks interleaved in a semi-random fashion, and then you may have to stop the master process somehow.
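Concretely, the guard the documentation refers to looks like this (a minimal sketch):

import multiprocessing

def work(x):
    # Must live at module level so the children can import it.
    return x * x

if __name__ == '__main__':
    # Only the master runs this block; child processes re-import the
    # module and must not try to create their own pools.
    pool = multiprocessing.Pool(4)
    print(pool.map(work, range(10)))
    pool.close()
    pool.join()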

The best solution is probably to just run it as a script from the command line. Alternatively, IPython has its own system for parallel computing, but I've never used it.
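As an aside, the PicklingError in the question comes from the lambda itself: Pool.map pickles the callable in order to send it to the worker processes, and lambdas can't be pickled. Moving the work into a module-level function inside a script solves both problems at once. A sketch, reusing the slicing from the question (load_data is a hypothetical stand-in for however the arrays are actually built; intersplit_array is assumed to be defined in, or imported into, this module):

import multiprocessing

def worker(args):
    # Module-level functions can be pickled; lambdas and closures cannot.
    ob, er, nl, m_i, mi_i, t_i, ti_i, dmax_i = args
    return intersplit_array(ob, er, nl, m_i, mi_i, t_i, ti_i, dmax_i)

if __name__ == '__main__':
    ob, er, nl, m, mi, t, ti, dmax = load_data()   # hypothetical loader
    tasks = [(ob, er, nl, m[i+1], mi[i:i+2], t[i+1], ti[i:i+2], dmax[i+1])
             for i in range(1, len(m) - 1)]
    pool = multiprocessing.Pool(6)
    try:
        # chunksize batches tasks to cut down on pickling round-trips,
        # which matters with ~700,000 items.
        results = pool.map(worker, tasks, chunksize=1000)
        pool.close()
    except KeyboardInterrupt:
        pool.terminate()
    pool.join()

Note that every tuple in tasks is pickled and shipped to a worker, so the large arrays get serialized repeatedly; on Linux it can be cheaper to rely on forked children inheriting module-level globals and pass only the index i.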

