#StackBounty: #python #asp.net #web-scraping #python-requests Python Scraping .aspx multiple pages, multiple __VIEWSTATES

Bounty: 50

I’m trying to scrape this site:

http://www.occeweb.com/MOEAsearch/index.aspx

If I search for “A”, I get multiple pages.

I can get the results of the 1st page fine, using:

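import requests
from bs4 import BeautifulSoup

url = 'http://www.occeweb.com/MOEAsearch/index.aspx'

# Fetch the search form and read the hidden state fields that must be posted back
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
vs = soup.find('input', {'id': '__VIEWSTATE'}).attrs['value']
ev = soup.find('input', {'id': '__EVENTVALIDATION'}).attrs['value']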

cookies = {
    'ASP.NET_SessionId': 'f1vztt45bdcvzr45jkrbcoru',
}

headers = {
    'Proxy-Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Origin': 'http://www.occeweb.com',
    'Upgrade-Insecure-Requests': '1',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.143 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Referer': url,
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
}

data = {
    '__EVENTTARGET': 'gvResults',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': vs,
    '__VIEWSTATEGENERATOR': '2E193097',
    '__EVENTVALIDATION': ev,
    'txtSearch': 'A',
    'StartsEnds': 'rbBeginswith',
    'TxtSearchFirst': '',
    'btnSearch': 'Search',
}

# Submit the search; the response holds the first page of results
r = requests.post(url, headers=headers, cookies=cookies, data=data)
soup = BeautifulSoup(r.text, 'html.parser')

However, when I try to use the same __VIEWSTATE and __EVENTVALIDATION for the 2nd page, it doesn’t work.

I have also tried pulling the __VIEWSTATE from the response of the POST request and using that in the subsequent call, but that did not work either.
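For reference, carrying the state forward from one response to the next request could look like the sketch below. It assumes the grid uses the usual ASP.NET GridView paging argument `Page$N`, which is not confirmed for this site, and it uses the standard-library HTML parser so it runs without extra packages. `hidden_fields` and `page_payload` are hypothetical helper names.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def hidden_fields(html):
    """Return all hidden form fields (__VIEWSTATE, __EVENTVALIDATION, ...)."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def page_payload(html, page, search_term="A"):
    """Build the form data asking for results page `page`, using the state in `html`."""
    data = hidden_fields(html)
    data.update({
        "__EVENTTARGET": "gvResults",        # grid name taken from the question
        "__EVENTARGUMENT": f"Page${page}",   # assumption: standard GridView paging
        "txtSearch": search_term,
        "StartsEnds": "rbBeginswith",
        "TxtSearchFirst": "",
        # 'btnSearch' is left out on purpose for pages after the first
    })
    return data
```

Each payload would be posted through a single `requests.Session`, and the payload for the following page built from the HTML of that response rather than from the first page.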

Note that I am able to get this to work for the first 11 pages of results by simply copying the __VIEWSTATE and __EVENTVALIDATION from Chrome DevTools on page 1 and holding them static. For some reason, I have to remove 'btnSearch':'Search' for pages after the first.

However, these static __VIEWSTATE and __EVENTVALIDATION values fail on page 12. When I copy the cURL command for page 12, its values work until page 22, and the same happens at pages 32, 42 and so on. So it seems the __VIEWSTATE needs to be updated once every 10 pages or so.
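The pattern above (failures at pages 12, 22, 32, 42) suggests that a given state only unlocks the next ten pages. If that reading is right, which nothing here confirms, the pages whose responses would have to supply a fresh __VIEWSTATE on the way to a target page can be listed as below. `route_to` is a hypothetical helper, and the block size of 10 is inferred from the observation, not from the site.

```python
def route_to(target, block=10):
    """List the pages to request, in order, to reach `target` from page 1.

    Assumption: the state from page 1 reaches pages up to block + 1, the state
    from page block + 1 reaches up to 2 * block + 1, and so on.
    """
    # Intermediate pages whose state must be captured: 11, 21, 31, ... below target
    hops = list(range(block + 1, target, block))
    return hops + [target]


print(route_to(25))
```

For page 25 this gives 11, 21, 25: request page 11 with the page 1 state, page 21 with the state returned for page 11, and page 25 with the state returned for page 21.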

The problem is that the __VIEWSTATE I pull from the response to the POST request does not work, and I can’t find a way to GET the updated __VIEWSTATE I need.

Thanks for your help!

