#StackBounty: #python #python-3.x #web-scraping #locking #multiprocessing Implementing lock within a python script

Bounty: 50

I’ve written a script in Python using multiprocessing to handle several processes at the same time and make the scraping faster. I’ve used a lock within it to prevent two processes from changing their internal state. As I’m new to implementing locks with multiprocessing, I suppose there is room for improvement.
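As an aside on the locking itself: `multiprocessing.Lock` also works as a context manager, which is the more idiomatic and equally exception-safe form of the explicit `acquire()`/`try`/`finally`/`release()` pattern used in the script below. A minimal sketch:

```python
from multiprocessing import Lock

def printer(lock, data):
    # "with lock" acquires the lock and guarantees it is released,
    # even if print() raises -- same effect as acquire()/try/finally/release().
    with lock:
        print(data)

if __name__ == '__main__':
    lock = Lock()
    printer(lock, ["name", "street", "phone"])
```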

What the scraper does is collect the name, address and phone number of every coffee shop, traversing multiple pages on yellowpages.com.

This is my script:

import requests 
from lxml.html import fromstring
from multiprocessing import Process, Lock

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"
itemstorage = []

def get_info(url,lock,itemstorage):
    response = requests.get(url).text
    tree = fromstring(response)
    for title in tree.cssselect("div.info"):
        name = title.cssselect("a.business-name span")[0].text
        try:
            street = title.cssselect("span.street-address")[0].text
        except IndexError: street = ""
        try:
            phone = title.cssselect("div[class^=phones]")[0].text
        except IndexError: phone = ""
        itemstorage.extend([name, street, phone])
    return printer(lock,itemstorage)

def printer(lock,data): 
    lock.acquire()
    try:
        print(data)
    finally:
        lock.release()

if __name__ == '__main__':
    lock = Lock()
    for i in [link.format(page) for page in range(1,15)]:
        p = Process(target=get_info, args=(i,lock,itemstorage))
        p.start()

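One pitfall worth flagging: `itemstorage` lives in the parent process, and each `Process` gets its own copy, so the parent's list is never actually filled. A hedged alternative (the name `scrape_page` is illustrative, not from the original) is to have each worker *return* its rows and let `multiprocessing.Pool` collect them in the parent, which also removes the need for a lock around printing:

```python
from multiprocessing import Pool

link = "https://www.yellowpages.com/search?search_terms=coffee&geo_location_terms=Los%20Angeles%2C%20CA&page={}"

def scrape_page(url):
    # In the real script this would fetch and parse `url` with requests/lxml,
    # exactly as get_info() does; the key change is returning the parsed rows
    # instead of mutating a module-level list that children cannot share.
    rows = []
    # ... requests.get(url), tree.cssselect("div.info"), etc. ...
    return rows

if __name__ == '__main__':
    urls = [link.format(page) for page in range(1, 15)]
    itemstorage = []
    with Pool(processes=4) as pool:
        # pool.map returns each worker's result to the parent, in order
        for rows in pool.map(scrape_page, urls):
            itemstorage.extend(rows)
    print(itemstorage)
```

Because only the parent touches `itemstorage` (and only the parent prints), no explicit `Lock` is required with this design; the pool also caps the number of concurrent workers instead of spawning fourteen processes at once.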
