Bounty: 50
I’m trying to scrape information from different sites about some products. Here is the structure of my program:
```python
product_list = ['iPad', 'iPhone', 'AirPods', ...]

def spider_tmall(self):
    self.driver.find_element_by_id('searchKeywords').send_keys(product_list[a])
    # ...

def spider_jd(self):
    self.driver.find_element_by_id('searchKeywords').send_keys(product_list[a])
    # ...

if __name__ == '__main__':
    for a in range(len(product_list)):
        process = CrawlerProcess(settings={
            "FEEDS": {
                "itemtmall.csv": {
                    "format": "csv",
                    "fields": ['product_name_tmall', 'product_price_tmall', 'product_discount_tmall'],
                },
                "itemjd.csv": {
                    "format": "csv",
                    "fields": ['product_name_jd', 'product_price_jd', 'product_discount_jd'],
                },
            },
        })
        process.crawl(tmallSpider)
        process.crawl(jdSpider)
        process.start()
```
Basically, I want to run all the spiders for every input in product_list. Right now, my program only runs through all the spiders once (in this case, it does the job for iPad), then a ReactorNotRestartable error is raised and the program terminates. Does anybody know how to fix this?
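From what I understand, the error comes from Twisted: CrawlerProcess.start() runs the Twisted reactor, and a reactor cannot be restarted within the same OS process, so calling process.start() inside the for loop fails on the second iteration. One common fix is to call start() exactly once, e.g. by passing the whole keyword list into the spiders and looping inside them; another workaround is to run each iteration in a brand-new interpreter so each gets a fresh reactor. A minimal stdlib sketch of the fresh-process idea, where crawl_keyword is a hypothetical stand-in for building and starting your real CrawlerProcess (in practice people often use multiprocessing.Process for this):

```python
import subprocess
import sys

def crawl_keyword(keyword):
    # Stand-in for the real work: the child script would build a fresh
    # CrawlerProcess, register process.crawl(tmallSpider) and
    # process.crawl(jdSpider), then call process.start(). Because each
    # keyword runs in a brand-new interpreter, each gets a brand-new
    # Twisted reactor, so ReactorNotRestartable cannot occur.
    code = f"print('crawled', {keyword!r})"
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True)
    return result.stdout.strip()

product_list = ['iPad', 'iPhone', 'AirPods']
results = [crawl_keyword(kw) for kw in product_list]  # crawls run one at a time
```

This sketch only simulates the crawl with a print; the point is the structure (one interpreter, and hence one reactor, per keyword), not the crawling code itself.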
Also, my overall goal is to run the spiders multiple times, and the input doesn't necessarily have to be a list; it could be a CSV file or something else. Any suggestion would be appreciated!
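On the input side: if the keywords live in a CSV file instead of a hard-coded list, the standard library's csv module can load them before the crawl starts. A small sketch, assuming a hypothetical products.csv with one keyword per row (the file is created here only to make the example self-contained):

```python
import csv

def load_keywords(path):
    # Read one keyword per row from the first column, skipping blank rows.
    with open(path, newline='', encoding='utf-8') as f:
        return [row[0] for row in csv.reader(f) if row]

# Hypothetical input file, written only for illustration.
with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows([['iPad'], ['iPhone'], ['AirPods']])

product_list = load_keywords('products.csv')
```

The loaded product_list can then be passed to the spiders exactly like the hard-coded list.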