multithreading - Python: Can't start new thread. Can I stagger or delay some threads?


I'm not sure how to ask this question since I'm just beginning to learn Python, but here goes:

I have a web scraper that uses threading to grab info. I am looking up pricing and stock for 900 products. When I test the script with half of that, there is no problem. When I try to scrape all 900 products, I get a "can't start new thread" error.

I imagine this is due to a memory constraint, or because I am sending the server too many requests.

I would like to know if there is a way to slow down the threads or stagger the requests.

Error output:

    Traceback (most recent call last):
      File "c:\python27\tests\dxpriceupdates.py", line 78, in <module>
        t.start()
    error: can't start new thread
    >>>
    Traceback (most recent call last):Exception in thread Thread-554:
    Traceback (most recent call last):
      File "c:\python27\lib\urllib.py", line 346, in open_http
        errcode, errmsg, headers = h.getreply()
      File "c:\python27\lib\httplib.py", line 1117, in getreply
        response = self._conn.getresponse()
      File "c:\python27\lib\httplib.py", line 1045, in getresponse
        response.begin()
      File "c:\python27\lib\httplib.py", line 441, in begin
        self.msg = HTTPMessage(self.fp, 0)
      File "c:\python27\lib\mimetools.py", line 25, in __init__
        rfc822.Message.__init__(self, fp, seekable)
      File "c:\python27\lib\rfc822.py", line 108, in __init__
        self.readheaders()
      File "c:\python27\lib\httplib.py", line 308, in readheaders
        self.addheader(headerseen, line[len(headerseen)+1:].strip())
    MemoryError

    <bound method Thread.__bootstrap of <Thread(Thread-221, stopped 9512)>>Traceback (most recent call last):
    Traceback (most recent call last):
    Traceback (most recent call last):

    Traceback (most recent call last):
    Unhandled exception in thread started
    Unhandled exception in thread started
    ...

Here is the Python (skulist.txt is a text file like 12345, 23445, 5551,...):

    from threading import Thread
    import urllib
    import re
    import json
    import math


    def th(ur):
        site = "http://dx.com/p/getproductinforealtime?skus="+ur
        htmltext = urllib.urlopen(site)
        data = json.load(htmltext)
        htmlrates = urllib.urlopen("http://rate-exchange.appspot.com/currency?from=usd&to=aud")
        datarates = json.load(htmlrates)
        if data['success'] == True:
            # the comparison operator is missing in the post; "== 0" is assumed
            if data['data'][0]['discount'] == 0:
                price = float(data['data'][0]['price'])
                rate = float(datarates['rate']) + 0.12
                cost = price*rate
                if cost <= 5:
                    saleprice = math.ceil(cost*1.7) - .05
                elif (cost >5) and (cost <= 10):
                    saleprice = math.ceil(cost*1.6) - .05
                elif (cost >10) and (cost <= 15):
                    saleprice = math.ceil(cost*1.55) - .05
                else:
                    saleprice = math.ceil(cost*1.5) - .05
                if data['data'][0]['issoldout']:
                    soldout = "out of stock"
                    enabled = "disable"
                    qty = "0"
                else:
                    soldout = "in stock"
                    enabled = "enabled"
                    qty = "9999"

                #print model, saleprice, soldout, qty, enabled
                myfile.write(str(ur)+","+str(saleprice)+","+str(soldout)+","+str(qty)+","+str(enabled)+"\n")
            else:
                price = float(data['data'][0]['listprice'])
                rate = float(datarates['rate']) + 0.12
                cost = price*rate
                if cost <= 5:
                    saleprice = math.ceil(cost*1.7) - .05
                elif (cost >5) and (cost <= 10):
                    saleprice = math.ceil(cost*1.6) - .05
                elif (cost >10) and (cost <= 15):
                    saleprice = math.ceil(cost*1.55) - .05
                else:
                    saleprice = math.ceil(cost*1.5) - .05
                if data['data'][0]['issoldout']:
                    soldout = "out of stock"
                    enabled = "disable"
                    qty = "0"
                else:
                    soldout = "in stock"
                    enabled = "enabled"
                    qty = "9999"

                #print model, saleprice, soldout, qty, enabled
                myfile.write(str(ur)+","+str(saleprice)+","+str(soldout)+","+str(qty)+","+str(enabled)+"\n")
        else:
            qty = "0"
            print ur, "error \n"
            myfile.write(str(ur)+","+"0.00"+","+"out of stock"+","+str(qty)+","+"disable\n")


    skulist = open("skulist.txt").read()
    skulist = skulist.replace(" ", "").split(",")

    # create (or truncate) the output file, then reopen it for appending
    myfile = open("prices/price_update.txt", "w+")
    myfile.close()

    myfile = open("prices/price_update.txt", "a")
    threadlist = []

    # one thread per SKU, all started at once
    for u in skulist:
        t = Thread(target=th,args=(u,))
        t.start()
        threadlist.append(t)

    for b in threadlist:
        b.join()

    myfile.close()

Don't fire 900 threads at once, your PC will literally choke! Instead, use a pool and distribute the activity over a number of workers. Use multiprocessing like this:

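    from multiprocessing import Pool

    workers = 10

    if __name__ == '__main__':  # guard needed on Windows, where each worker re-imports the script
        p = Pool(workers)
        p.map(th, skulist)  # th is the worker function from the question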
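Since each task here spends most of its time waiting on HTTP responses, a pool of threads is one possible alternative to a pool of processes. The sketch below is not from the original answer: it uses `multiprocessing.pool.ThreadPool`, and `fetch_row` is a hypothetical stand-in for the question's `th` function that returns a CSV row instead of writing to the shared file, so only the main thread would touch the output.

```python
from multiprocessing.pool import ThreadPool
import time


def fetch_row(sku):
    """Hypothetical stand-in for th(): build one CSV row and return it."""
    time.sleep(0.01)  # simulates waiting on the network
    return "%s,0.00,out of stock,0,disable" % sku


def run(skus, workers=10):
    # Only `workers` threads exist, no matter how many SKUs are queued.
    pool = ThreadPool(workers)
    try:
        return pool.map(fetch_row, skus)  # results keep the input order
    finally:
        pool.close()
        pool.join()


rows = run(["12345", "23445", "5551"], workers=2)
for row in rows:
    print(row)
```

In the real script the returned rows could be written to prices/price_update.txt in a single loop after `run` finishes, which avoids having several workers write to one file handle.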

Find the right value for workers by experimenting a bit.
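One rough way to experiment is to time the same batch with a few different pool sizes. This is only a sketch under assumptions: `fake_request` is a hypothetical placeholder that sleeps instead of calling the real server, a thread pool is used so the example runs anywhere, and the candidate sizes are arbitrary. Real timings depend on the network and on how many requests the server tolerates.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_request(sku):
    """Hypothetical placeholder for one HTTP lookup."""
    time.sleep(0.02)
    return sku


def time_pool(workers, skus):
    """Return the seconds needed to process skus with the given pool size."""
    pool = ThreadPool(workers)
    start = time.time()
    try:
        pool.map(fake_request, skus)
    finally:
        pool.close()
        pool.join()
    return time.time() - start


skus = [str(n) for n in range(40)]
timings = {w: time_pool(w, skus) for w in (1, 5, 10, 20)}
for w in sorted(timings):
    print("%2d workers: %.2f s" % (w, timings[w]))
```

Past some pool size the gain usually flattens out, and a smaller value is kinder to both the PC and the remote server.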

