#StackBounty: #python #scrapy #recaptcha #incapsula How to bypass Incapsula with Python

Bounty: 500

I use Scrapy and I try to scrape this site that uses Incapsula

<meta name="robots" content="noindex,nofollow">


I had already asked a Question about this issue 2 years ago, but this method (Incapsula-Cracker) does not work anymore.

I tried to understand How Incapsula works and I tried this for bypass it

def start_requests(self):
    yield Request('https://courses-en-ligne.carrefour.fr',  cookies={'store': 92}, dont_filter=True, callback = self.init_shop)
def init_shop(self,response) :
    result_content      = response.body
    RE_ENCODED_FUNCTION = re.compile('var b="(.*?)"', re.DOTALL)
    RE_INCAPSULA        = re.compile('(_Incapsula_Resource?SWHANEDL=.*?)"')
    INCAPSULA_URL       = 'https://courses-en-ligne.carrefour.fr/%s'
    encoded_func        = RE_ENCODED_FUNCTION.search(result_content).group(1)
    decoded_func        = ''.join([chr(int(encoded_func[i:i+2], 16)) for i in xrange(0, len(encoded_func), 2)])
    incapsula_params    = RE_INCAPSULA.search(decoded_func).group(1)
    incap_url           = INCAPSULA_URL % incapsula_params
    yield Request(incap_url)
def parse(self):
    print response.body 

But i’m redirected to RE-Captcha Page

<html style="height:100%">
<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta name="format-detection" content="telephone=no">
<meta name="viewport" content="initial-scale=1.0">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
</head>
<body style="margin:0px;height:100%">
Request unsuccessful. Incapsula incident ID: 459000960022408474-41333502566401539

</body>
</html>


Get this bounty!!!

Leave a Reply