Python: Multithread or Background Process?
I want to know how to optimize the link-checker that I've implemented as a webservice in Python. I cache responses in a database, and the cache expires every 24 hours. Every night a cron job refreshes the cache, so the cache is never out of date.
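For reference, a minimal sketch of the cache lookup described above, assuming a SQLite-style table of (url, status, fetched_at) rows; the link_cache table and the get_cached helper are hypothetical names, not the actual schema:

    import time

    TTL = 24 * 60 * 60  # cache entries expire after 24 hours

    def get_cached(db, url):
        # db is assumed to be a sqlite3.Connection holding a link_cache table
        row = db.execute(
            "SELECT status, fetched_at FROM link_cache WHERE url = ?", (url,)
        ).fetchone()
        if row and time.time() - row[1] < TTL:
            return row[0]  # fresh hit: reuse the cached status
        return None  # missing or expired: the link must be re-checked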
Things are slow, of course, if I truncate the cache, or if the page being checked has a lot of links that aren't in the cache. I'm no computer scientist, so I'd like some advice, and something concrete, on how to optimize this using either threads or processes.
I thought of optimizing by requesting each URL in a background process (pseudocode-ish):
    import subprocess

    # This part of the code gets response codes for URLs not in the cache...
    responses = {}

    # First, start one curl per URL; Popen returns immediately, so all
    # of the requests run in the background concurrently. The curl flags
    # discard the body and print only the HTTP status code.
    processes = {}
    for url in urls:
        processes[url] = subprocess.Popen(
            ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}", url],
            stdout=subprocess.PIPE,
        )

    # Then loop through again and collect the responses; communicate()
    # waits for each process to finish and reads its output.
    for url, proc in processes.items():
        responses[url], _ = proc.communicate()

    # Now responses is a dict of response codes keyed by URL
This cuts the time of the script down to at least 1/6 in most of my use cases, as opposed to looping through each URL and waiting for each response, but I'm concerned about overloading the server the script is running on. I've considered using a queue and having batches of maybe 25 or so at a time.
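A minimal sketch of that batching idea, reusing the same curl approach as above; the BATCH_SIZE constant and the check_batch helper are hypothetical names of mine, and 25 just echoes the number mentioned above:

    import subprocess

    BATCH_SIZE = 25  # from the "batches of maybe 25" idea above

    def check_batch(batch):
        # Start one curl per URL in the batch, then wait for them all,
        # so no more than BATCH_SIZE processes ever run at once.
        procs = {
            url: subprocess.Popen(
                ["curl", "-s", "-o", "/dev/null", "-w", "%{http_code}", url],
                stdout=subprocess.PIPE,
            )
            for url in batch
        }
        return {url: p.communicate()[0].decode() for url, p in procs.items()}

    responses = {}
    for i in range(0, len(urls), BATCH_SIZE):
        responses.update(check_batch(urls[i:i + BATCH_SIZE]))

Because check_batch waits for a whole batch to finish before the next one starts, the batch size caps the load on the server the script runs on.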
Would multithreading be a better overall solution? If so, how would I do it using the threading module?
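For comparison, here is a minimal sketch of what a threaded version might look like, using the standard threading and queue modules with a fixed pool of worker threads; the worker count and the use of urllib.request in place of curl are my assumptions, not part of the question:

    import queue
    import threading
    import urllib.request

    NUM_WORKERS = 25  # assumption: mirrors the "batches of 25" idea

    def worker(tasks, responses):
        # Each worker pulls URLs off the queue until it is empty.
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    responses[url] = resp.status
            except Exception as exc:
                responses[url] = exc  # record the failure for this URL

    tasks = queue.Queue()
    for url in urls:
        tasks.put(url)

    responses = {}
    threads = [threading.Thread(target=worker, args=(tasks, responses))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Since each thread spends nearly all of its time blocked on network I/O, the GIL is not a bottleneck here, and the fixed pool size plays the same throttling role as the batches above.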