CAPTCHA and Challenge in AWS WAF | How to Solve it When Web Scraping

Web Seeker
6 min read · Jun 24, 2024


As many enterprise and web-crawler users already know, AWS WAF (Web Application Firewall) is a powerful security solution designed to protect web applications from common web attacks and vulnerabilities. One of its key features is the use of CAPTCHAs and challenges to distinguish legitimate users from potentially malicious bots. While this improves security, it can also create significant hurdles and unnecessary hassle for web crawling. In this article, we will explore the CAPTCHAs and challenges you are likely to encounter with AWS WAF and discuss how to overcome them so that your web crawlers and other business activities run smoothly.

Getting to Know AWS WAF’s CAPTCHA and Challenges

AWS WAF uses CAPTCHAs and challenges as part of its defence mechanisms against automated attacks and unauthorised access. These measures are designed to verify that the client interacting with a web application is a human rather than a bot. The CAPTCHA action presents a puzzle that the user must solve, while the Challenge action runs a silent browser check that needs no user interaction. For example, when a request made during your web crawl is considered suspicious, AWS WAF may respond with a CAPTCHA or a challenge instead of the page you asked for.
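For context, a site owner triggers these responses with a rule whose action is Captcha (or Challenge). The sketch below is a hypothetical AWS WAF (wafv2) rule written as a Python dict in the shape of the WAFV2 JSON rule format: it asks any single IP that exceeds a request-rate threshold to solve a CAPTCHA. The rule name, metric name, rate limit, and immunity time are illustrative assumptions, not values taken from any real site.

```python
import json

# Hypothetical wafv2 rule: require a CAPTCHA once one IP sends too many requests.
# Name, MetricName, Limit, and ImmunityTime are illustrative values only.
captcha_rule = {
    "Name": "rate-limit-captcha",
    "Priority": 1,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 100,              # max requests per evaluation window, per IP
            "AggregateKeyType": "IP",  # count requests per source IP address
        }
    },
    # Rule actions include Allow, Block, Count, Captcha, and Challenge
    "Action": {"Captcha": {}},
    # How long (in seconds) a solved CAPTCHA remains valid for the client
    "CaptchaConfig": {"ImmunityTimeProperty": {"ImmunityTime": 300}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "rateLimitCaptcha",
    },
}

# Print the rule as JSON, the format used by the console's JSON rule editor
print(json.dumps(captcha_rule, indent=2))
```

Swapping {"Captcha": {}} for {"Challenge": {}} would switch the rule to the silent browser challenge instead of a visible puzzle.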

Struggling with CAPTCHAs you can't reliably solve?

Discover seamless automatic CAPTCHA solving with CapSolver's AI-powered Auto Web Unblock technology!

Claim your bonus code for top CAPTCHA solutions at CapSolver: WEBS. After redeeming it, you will get an extra 5% bonus on every recharge, with unlimited use.

Besides CAPTCHAs and challenges, AWS WAF setups commonly rely on two other mechanisms to catch scrapers:

  1. IP Match Conditions

AWS WAF lets each IP match condition hold up to 10,000 IP addresses or address ranges in Classless Inter-Domain Routing (CIDR) notation. The limit applies to each list separately: the allow list, the deny list (manual IP list component), and the third-party IP block list (IP list parsing component) are distinct lists, each capped at 10,000 entries.
The allow and deny IP sets can be edited manually to add or remove addresses as needed.

  2. Honeypots Embedded in Web Applications

A rarely accessed endpoint is created as a honeypot to detect and divert inbound requests from content scrapers and malicious bots. Normal users never try to access this endpoint, but content scrapers and malicious bots (such as malware that scans for vulnerabilities or scrapes data) may. When that happens, the request is inspected to extract its source IP address, and the associated AWS WAF rule is updated to block all subsequent requests from that address.
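The two mechanisms above work together: an IP set holds CIDR ranges, and a honeypot hit adds the offending address to that set. The following is a minimal in-memory sketch of that flow using Python's standard ipaddress module. The IPSet class, the handle_request function, the honeypot path, and the 403/200 status codes are hypothetical stand-ins; in a real AWS deployment the block list lives in an AWS WAF IP set, not in application memory. The deny range is an RFC 5737 documentation range.

```python
import ipaddress

MAX_ENTRIES = 10_000                      # per-list limit described above
HONEYPOT_PATH = "/internal/do-not-crawl"  # hypothetical honeypot endpoint


class IPSet:
    """In-memory stand-in for an AWS WAF IP set of CIDR ranges."""

    def __init__(self, cidrs=()):
        self.networks = []
        for cidr in cidrs:
            self.add(cidr)

    def add(self, cidr):
        # Enforce the per-list size limit before accepting a new range
        if len(self.networks) >= MAX_ENTRIES:
            raise ValueError(f"IP set is full ({MAX_ENTRIES} entries)")
        # A bare address such as "203.0.113.9" parses as a /32 (or /128 for IPv6)
        self.networks.append(ipaddress.ip_network(cidr))

    def matches(self, ip):
        # IP match condition: is the address inside any listed range?
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.networks)


deny_set = IPSet(["192.0.2.0/24"])


def handle_request(path, source_ip):
    """Return the status code a WAF-like filter would produce for this request."""
    if deny_set.matches(source_ip):
        return 403  # blocked by the IP match condition
    if path == HONEYPOT_PATH:
        deny_set.add(source_ip)  # honeypot hit: block this source from now on
        return 403
    return 200


print(handle_request("/products", "192.0.2.55"))     # 403 (in the deny list)
print(handle_request("/products", "203.0.113.9"))    # 200
print(handle_request(HONEYPOT_PATH, "203.0.113.9"))  # 403 (honeypot hit)
print(handle_request("/products", "203.0.113.9"))    # 403 (now blocked)
```

Because the honeypot adds a single-address range to the same set the IP match condition reads, one visit to the trap is enough to block every later request from that address.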

In addition, AWS WAF CAPTCHAs come in three general types:

  • A picture grid puzzle, which asks you to select all the pictures in the grid that contain a particular type of object.
  • A path puzzle, which asks you to determine the end point of a car's path in a drawing.
  • An audio CAPTCHA, which superimposes speech on background noise. As with the puzzles, audio CAPTCHAs can be solved automatically if you have the right approach.

How do we identify AWS WAF?

  1. Response Header Check for Request URL
    When you request a URL and the response status code is 405 (the typical status for a CAPTCHA response) and the response headers include X-Amzn-Waf-Action: captcha and X-Amzn-Errortype: ForbiddenException, the request is being blocked by AWS WAF.
  2. Appearance in Response HTML

When the response HTML contains strings such as awsWaf or captcha.awswaf.com, as in the example below, the page requires AWS WAF CAPTCHA handling.

<script type="text/javascript">
window.awsWafCookieDomainList = [];
window.gokuProps = {
"key":"AQIDAHjcYu/*****",
"iv":"CgAHfjMvRjAAAA3q",
"context":"MK7Z1IlZc****"
};
</script>
<script src="https://***.token.awswaf.com/***/challenge.js"></script>
<script src="https://***.captcha.awswaf.com/****/captcha.js"></script>
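The two checks above can be combined into a single helper. This is a minimal sketch: detect_aws_waf is a hypothetical function name, it takes the status code, headers, and body you have already fetched (so it works with any HTTP client), and its HTML markers are simply the strings from the snippet above. It also assumes that a Challenge response carries x-amzn-waf-action: challenge (AWS documents status 202 for that action), so any value of that header is treated as a WAF signal.

```python
# Strings taken from the example HTML above; real pages may use others.
HTML_MARKERS = (
    "awsWafCookieDomainList",
    "gokuProps",
    "token.awswaf.com",
    "captcha.awswaf.com",
)


def detect_aws_waf(status_code, headers, body=""):
    """Return a short description of the AWS WAF signal found, or None."""
    # HTTP header names are case-insensitive, so normalize them first
    lowered = {name.lower(): value for name, value in headers.items()}

    # Check 1: the WAF action header (captcha, or challenge for the Challenge action)
    action = lowered.get("x-amzn-waf-action", "").lower()
    if action:
        return f"header: x-amzn-waf-action={action} (status {status_code})"

    # Check 1, fallback: a 405 with the ForbiddenException error type
    if status_code == 405 and lowered.get("x-amzn-errortype", "").startswith("ForbiddenException"):
        return "header: x-amzn-errortype=ForbiddenException"

    # Check 2: AWS WAF script or variable names embedded in the HTML
    for marker in HTML_MARKERS:
        if marker in body:
            return f"html: {marker}"
    return None


print(detect_aws_waf(405, {"X-Amzn-Waf-Action": "captcha"}))
# header: x-amzn-waf-action=captcha (status 405)
print(detect_aws_waf(200, {}, '<script src="https://x.captcha.awswaf.com/c/captcha.js"></script>'))
# html: captcha.awswaf.com
print(detect_aws_waf(200, {"Content-Type": "text/html"}, "<html>ok</html>"))
# None
```

Checking headers before the body keeps the common case cheap, since the header test does not need the HTML at all.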

Techniques to solve AWS WAF

One way to automate puzzle solving is CapSolver, a CAPTCHA recognition service. It offers task types for different CAPTCHA systems, including AWS WAF.

CapSolver offers two ways to solve AWS WAF challenges: calling its API, or installing its browser extension.

Follow the steps below to implement an automated solution in your web scraper. It's simple, so let's dig in!

Step 1: Sign up

Sign up for CapSolver to get access to its CAPTCHA service, which currently offers a free trial.

Step 2: Get your API key

Once you have registered, you can find your API key in the panel on the home page.
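Rather than pasting the key into your script, you can keep it out of source control by reading it from an environment variable. This is a small sketch; the variable name CAPSOLVER_API_KEY and the load_api_key helper are hypothetical choices, not something CapSolver requires.

```python
import os


def load_api_key(var_name="CAPSOLVER_API_KEY"):
    """Read the CapSolver API key from an environment variable (name is hypothetical)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


# Usage, in place of a hard-coded placeholder:
# api_key = load_api_key()
```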

Code example

You can obtain a valid aws-waf-token in code using Python, Go, JavaScript, or other mainstream languages. Here is how to do it in Python:

# pip install requests
import time

import requests

api_key = "YOUR_API_KEY"  # TODO: your CapSolver API key
site_url = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest"  # TODO: URL of the protected page


def capsolver():
    # Create an AWS WAF solving task (no proxy) for the target page
    payload = {
        "clientKey": api_key,
        "task": {
            "type": "AntiAwsWafTaskProxyLess",
            "websiteURL": site_url,
        },
    }
    res = requests.post("https://api.capsolver.com/createTask", json=payload, timeout=30)
    resp = res.json()
    task_id = resp.get("taskId")
    if not task_id:
        print("Failed to create task:", res.text)
        return None
    print(f"Got taskId: {task_id} / Getting result...")

    # Poll for the result once per second, giving up after about two minutes
    for _ in range(120):
        time.sleep(1)
        payload = {"clientKey": api_key, "taskId": task_id}
        res = requests.post("https://api.capsolver.com/getTaskResult", json=payload, timeout=30)
        resp = res.json()
        status = resp.get("status")
        if status == "ready":
            # The solution's "cookie" field holds the aws-waf-token value
            return resp.get("solution", {}).get("cookie")
        if status == "failed" or resp.get("errorId"):
            print("Solve failed! response:", res.text)
            return None
    print("Timed out waiting for the solution")
    return None


token = capsolver()
print(token)

After a few seconds, the script prints the aws-waf-token you need:

2d8415fb-43ec-42c5-8106-c51194d5eb14:EQoAljIa3jkRAAAA:Z+bkUZcJEl90QIM46acsmio......

Next, check whether the aws-waf-token actually works:

import requests


def check_website(token):
    # Browser-like request headers (copied from Chrome 125 on macOS)
    headers = {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "accept-language": "en-US,en;q=0.9",
        "cache-control": "no-cache",
        "pragma": "no-cache",
        "priority": "u=0, i",
        "referer": "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest",
        "sec-ch-ua": "\"Google Chrome\";v=\"125\", \"Chromium\";v=\"125\", \"Not.A/Brand\";v=\"24\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"macOS\"",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    }
    # Send the solved token as the aws-waf-token cookie
    cookies = {"aws-waf-token": token}
    url = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest"
    response = requests.get(url, headers=headers, cookies=cookies, timeout=30)
    print(response.text)
    print(response)


check_website(token)

The request returns status code 200 along with the page content:

{
"success": true,
"note": "CAPTCHA check resets in 60 seconds."
}
<Response [200]>

Wrapping up

AWS WAF's CAPTCHA and challenge mechanisms are essential for web security, but they can pose significant hurdles for web scraping. By understanding the types of challenges AWS WAF uses and relying on tools like CapSolver, you can overcome these obstacles. CapSolver automates CAPTCHA solving so that your scraping tasks can continue without manual intervention. By following the steps in this article, from obtaining an API key to calling CapSolver's API, you can keep your scraping operations running smoothly even on sites protected by AWS WAF.
