multithreading - Python: Can't start new thread. Can I stagger or delay some threads?
I'm not sure how to ask this question since I'm just beginning to learn Python, but here goes:
I have a web scraper that uses threading to grab info. I'm looking up pricing and stock for 900 products. When I test the script with half of that, there is no problem. When I try to scrape all 900 products, I get a "can't start new thread" error.
I imagine this is a memory constraint, or it happens because I'm asking the server for too many requests.
I'd like to know if there is a way to slow down the threads or stagger the requests.
Error code:
Traceback (most recent call last):
  File "C:\Python27\tests\dxpriceupdates.py", line 78, in <module>
    t.start()
error: can't start new thread
>>> Traceback (most recent call last):Exception in thread Thread-554:
Traceback (most recent call last):
  File "C:\Python27\lib\urllib.py", line 346, in open_http
    errcode, errmsg, headers = h.getreply()
  File "C:\Python27\lib\httplib.py", line 1117, in getreply
    response = self._conn.getresponse()
  File "C:\Python27\lib\httplib.py", line 1045, in getresponse
    response.begin()
  File "C:\Python27\lib\httplib.py", line 441, in begin
    self.msg = HTTPMessage(self.fp, 0)
  File "C:\Python27\lib\mimetools.py", line 25, in __init__
    rfc822.Message.__init__(self, fp, seekable)
  File "C:\Python27\lib\rfc822.py", line 108, in __init__
    self.readheaders()
  File "C:\Python27\lib\httplib.py", line 308, in readheaders
    self.addheader(headerseen, line[len(headerseen)+1:].strip())
MemoryError
<bound method Thread.__bootstrap of <Thread(Thread-221, stopped 9512)>>Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Unhandled exception in thread started by
Unhandled exception in thread started by
...
Here is my Python (skulist.txt is a text file like 12345, 23445, 5551,...):
from threading import Thread
import urllib
import re
import json
import math

def th(ur):
    # Fetch product info and the USD-to-AUD rate for one SKU, then append a CSV row.
    site = "http://dx.com/p/getproductinforealtime?skus="+ur
    htmltext = urllib.urlopen(site)
    data = json.load(htmltext)
    htmlrates = urllib.urlopen("http://rate-exchange.appspot.com/currency?from=usd&to=aud")
    datarates = json.load(htmlrates)
    if data['success'] == True:
        if data['data'][0]['discount'] > 0:
            # Discounted product: base the cost on the discounted price.
            price = float(data['data'][0]['price'])
            rate = float(datarates['rate']) + 0.12
            cost = price*rate
            if cost <= 5:
                saleprice = math.ceil(cost*1.7) - .05
            elif (cost > 5) and (cost <= 10):
                saleprice = math.ceil(cost*1.6) - .05
            elif (cost > 10) and (cost <= 15):
                saleprice = math.ceil(cost*1.55) - .05
            else:
                saleprice = math.ceil(cost*1.5) - .05
            if data['data'][0]['issoldout']:
                soldout = "out of stock"
                enabled = "disable"
                qty = "0"
            else:
                soldout = "in stock"
                enabled = "enabled"
                qty = "9999"
            #print model, saleprice, soldout, qty, enabled
            myfile.write(str(ur)+","+str(saleprice)+","+str(soldout)+","+str(qty)+","+str(enabled)+"\n")
        else:
            # No discount: base the cost on the list price.
            price = float(data['data'][0]['listprice'])
            rate = float(datarates['rate']) + 0.12
            cost = price*rate
            if cost <= 5:
                saleprice = math.ceil(cost*1.7) - .05
            elif (cost > 5) and (cost <= 10):
                saleprice = math.ceil(cost*1.6) - .05
            elif (cost > 10) and (cost <= 15):
                saleprice = math.ceil(cost*1.55) - .05
            else:
                saleprice = math.ceil(cost*1.5) - .05
            if data['data'][0]['issoldout']:
                soldout = "out of stock"
                enabled = "disable"
                qty = "0"
            else:
                soldout = "in stock"
                enabled = "enabled"
                qty = "9999"
            #print model, saleprice, soldout, qty, enabled
            myfile.write(str(ur)+","+str(saleprice)+","+str(soldout)+","+str(qty)+","+str(enabled)+"\n")
    else:
        qty = "0"
        print ur, "error \n"
        myfile.write(str(ur)+","+"0.00"+","+"out of stock"+","+str(qty)+","+"disable\n")

skulist = open("skulist.txt").read()
skulist = skulist.replace(" ", "").split(",")
myfile = open("prices/price_update.txt", "w+")
myfile.close()
myfile = open("prices/price_update.txt", "a")
threadlist = []
# One thread per SKU: with 900 SKUs this starts 900 threads at once.
for u in skulist:
    t = Thread(target=th,args=(u,))
    t.start()
    threadlist.append(t)
for b in threadlist:
    b.join()
myfile.close()
Don't fire 900 threads at once, your PC will literally choke! Instead, use a pool and distribute the activity over a number of workers. Use multiprocessing
like this:
from multiprocessing import Pool
workers = 10
p = Pool(workers)
p.map(th, skulist)
Find the right value for workers by experimenting a bit.
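Since the work here is mostly waiting on HTTP responses, the same pool idea can also be tried with threads, and a short pause per task gives the staggering the question asks about. The following is only a minimal sketch under some assumptions: `fetch_row` and `run_staggered` are hypothetical names that do not appear in the original script, `fetch_row` is a stand-in for `th` that returns its CSV row instead of writing to the shared `myfile`, and the `delay` value is something you would have to tune yourself.

```python
import time
from multiprocessing.pool import ThreadPool


def fetch_row(sku):
    # Hypothetical stand-in for th(): the real version would make the HTTP
    # requests, compute the sale price and return the CSV row as a string.
    return "%s,0.00,out of stock,0,disable" % sku


def run_staggered(skus, worker, workers=10, delay=0.0):
    """Call worker(sku) for every SKU on at most `workers` threads.

    Each task sleeps `delay` seconds before calling worker, so a single
    thread issues at most one request per `delay` seconds.
    """
    def task(sku):
        time.sleep(delay)
        return worker(sku)

    pool = ThreadPool(workers)
    try:
        # map() blocks until all tasks finish and keeps the input order.
        return pool.map(task, skus)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    rows = run_staggered(["12345", "23445", "5551"], fetch_row,
                         workers=2, delay=0.1)
    for row in rows:
        print(row)
```

Because the workers return their rows, the output file can be written once in the main thread after `run_staggered` returns, so no two workers ever write to the same file handle at the same time.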