Cloudflare under the hood: how it works and how attackers try to get around it
DEV Community

What Cloudflare actually is

Cloudflare is not a reverse proxy running on one server somewhere. It is a globally distributed edge network with over 300 points of presence (PoPs). When you put your domain behind Cloudflare, you are routing all traffic through that network before it ever reaches your server.

The mechanism is anycast routing. Cloudflare announces the same IP addresses from every PoP simultaneously. When a user sends a request to your site, BGP steers it to the nearest PoP in network terms (usually, but not always, the geographically closest one), not to your origin server. From there, Cloudflare decides what to do with it:

User in Tokyo
|
| (anycast routes to nearest PoP)
v
Cloudflare Tokyo PoP
|-- cached? → serve from edge, origin never touched
|-- blocked? → return 403, origin never touched
|-- challenge? → run Turnstile, origin never touched
|-- clean? → forward to origin, return response
v
Your origin server

TLS termination happens at the edge PoP, not at your origin. Cloudflare holds the certificate, decrypts the request, inspects it, then re-encrypts it for the leg to your origin (assuming SSL between Cloudflare and origin is enabled, which it should be). This is why Cloudflare can inspect HTTPS traffic for WAF rules: it sits in the middle of the connection by design, because you explicitly delegate that decryption to it.

The layers between a request and your server

A request arriving at a Cloudflare PoP passes through several decision layers in order:

  • DDoS mitigation runs first. Volumetric floods are absorbed at the network layer. HTTP floods are identified by rate, pattern, and reputation.
  • IP reputation and geofencing checks the source IP against Cloudflare's threat database. IPs from known botnets, Tor exit nodes, or datacenter ranges are scored.
  • WAF inspects the HTTP layer: headers, path, query params, body. Cloudflare maintains a managed ruleset covering OWASP Top 10 plus known CVEs.
  • Bot management (Turnstile is the visible part) assigns each request a bot score from 1 to 99. Score 1 is almost certainly a bot. Score 99 is almost certainly human.
  • Cache is the last layer before origin. If the response is cacheable and a fresh copy exists at the PoP, Cloudflare serves it without touching your server.
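The ordering above can be sketched as a short decision function. This is a toy model, not Cloudflare's implementation: the `EdgeRequest` fields, the action names, and the `challenge_below` threshold of 30 are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EdgeRequest:
    """Hypothetical per-request verdicts; the real signals are internal to Cloudflare."""
    is_flood: bool = False    # flagged by DDoS mitigation
    ip_blocked: bool = False  # failed an IP reputation or geofencing rule
    waf_match: bool = False   # matched a WAF rule
    bot_score: int = 99       # 1 = almost certainly a bot, 99 = almost certainly human
    cache_hit: bool = False   # a fresh cached copy exists at the PoP


def edge_decision(req: EdgeRequest, challenge_below: int = 30) -> str:
    """Walk the layers in order; the first layer that objects decides the outcome."""
    if req.is_flood:
        return "drop"
    if req.ip_blocked or req.waf_match:
        return "block"
    if req.bot_score < challenge_below:  # illustrative threshold, not a Cloudflare default
        return "challenge"
    if req.cache_hit:
        return "serve_from_cache"
    return "forward_to_origin"
```

Only the last branch ever reaches the origin, which is the point of the layering: every earlier outcome is handled entirely at the edge.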

How Turnstile works

Turnstile is Cloudflare's CAPTCHA replacement. Unlike reCAPTCHA v2, it has no image challenge: the goal is to verify a visitor is human without making them solve anything visible.

  1. The widget loads a JS challenge from Cloudflare's edge. The script is different per request, not a static file you can analyze once.
  2. The script collects passive signals:
    • Timing: how long did each JS operation take? Headless browsers running at full CPU speed have suspiciously uniform timing.
    • Interaction: did the mouse move before the form was submitted? Did keystrokes have natural delays?
    • Browser fingerprint: canvas rendering, WebGL renderer, installed fonts, audio context output.
    • Environment: is navigator.webdriver exposed? Are dev tools open?
  3. Cloudflare runs those signals through a model trained on billions of requests and issues a signed token if the request looks human.
  4. Your backend verifies the token against Cloudflare's siteverify API:
POST https://challenges.cloudflare.com/turnstile/v0/siteverify
{
  "secret": "your-secret-key",
  "response": "token-from-widget"
}

If your backend does not make this call, the protection is entirely client-side and trivially bypassed: an attacker skips the widget and sends the form request straight to your endpoint.
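A minimal sketch of that server-side check, using only the Python standard library. The function names, the 5-second timeout, and the injectable `post` parameter (there so the logic can be exercised without network access) are assumptions for illustration; the endpoint and the `secret` and `response` fields come from the request shown above, and `remoteip` is an optional field of the same API.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def _post_form(url: str, fields: dict) -> dict:
    """POST form-encoded fields and decode the JSON reply."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(url, data=data, method="POST")
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def verify_turnstile(token, secret, remote_ip=None, post=_post_form) -> bool:
    """Return True only if siteverify confirms the token; fail closed on any error."""
    if not token:
        return False  # no token submitted: the widget was skipped
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    try:
        result = post(SITEVERIFY_URL, fields)
    except (OSError, ValueError):
        return False  # network failure or malformed reply
    return result.get("success") is True
```

A form handler would call `verify_turnstile(submitted_token, secret_key)` before doing any work and reject the request when it returns `False`.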

Finding the origin server behind Cloudflare

If an attacker finds your origin IP, they can bypass Cloudflare entirely by sending requests directly to that IP. Your WAF, DDoS protection, and Turnstile all disappear. Here are the techniques commonly used, in order of how often they succeed:

SSL certificate history

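Before you put a domain behind Cloudflare, it usually had a certificate issued for and served directly by the origin. Certificate transparency logs are public and record publicly trusted certificates as they are issued:

https://crt.sh/?q=example.com

The log entries contain hostnames and certificate fingerprints, not IP addresses, but they are permanent. They expose every hostname that ever had a certificate (including forgotten subdomains), and an old certificate's fingerprint can be matched against internet scan data to find the IP that served it.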
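To audit your own domain, the hostnames in those log entries can be pulled out programmatically. The sketch below assumes the JSON that crt.sh returns when `&output=json` is appended to the query, and in particular that each entry carries a newline-separated `name_value` field; treat that field name as an assumption and check it against a live response. The function name is hypothetical.

```python
import json


def hostnames_from_crtsh(json_text: str) -> list[str]:
    """Collect the unique hostnames listed in a crt.sh JSON response."""
    names = set()
    for entry in json.loads(json_text):
        # Assumed shape: "name_value" holds one or more names separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return sorted(names)
```

Each hostname in the result is worth resolving: anything that does not land on a Cloudflare address is a candidate leak.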

DNS history

Before Cloudflare, your A record pointed directly to your origin. Those records are archived by SecurityTrails, DNSDumpster, and ViewDNS.info, often with timestamps showing exactly when you switched.

Subdomains not behind Cloudflare

Many teams proxy www and the apex but leave other subdomains with a grey cloud (not proxied) by accident:

  • ftp.example.com: legacy, often points to origin
  • dev.example.com, staging.example.com: forgotten
  • api.example.com: sometimes bypasses the proxy for latency reasons

A subdomain enumeration pass reveals which subdomains resolve to a non-Cloudflare IP.
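The same check is useful as a self-audit. A minimal sketch, assuming you already have a mapping of hostname to resolved IP: `CLOUDFLARE_V4_SAMPLE` is a small illustrative subset of Cloudflare's published ranges and may be out of date, so load the current list from cloudflare.com/ips-v4 in practice. The function names are hypothetical.

```python
import ipaddress

# Illustrative subset only; fetch the current list from cloudflare.com/ips-v4.
CLOUDFLARE_V4_SAMPLE = ["173.245.48.0/20", "104.16.0.0/13", "172.64.0.0/13"]


def is_cloudflare_ip(ip: str, ranges=CLOUDFLARE_V4_SAMPLE) -> bool:
    """True if the address falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


def exposed_hosts(resolved: dict[str, str], ranges=CLOUDFLARE_V4_SAMPLE) -> list[str]:
    """Hostnames whose resolved IP is outside the Cloudflare ranges."""
    return sorted(host for host, ip in resolved.items()
                  if not is_cloudflare_ip(ip, ranges))
```

For example, `exposed_hosts({"www.example.com": "104.16.1.1", "dev.example.com": "203.0.113.42"})` flags only `dev.example.com`.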

MX records

Cloudflare's standard proxy handles HTTP and HTTPS, not SMTP, so mail is not hidden behind it. Your MX record points directly to a mail server, often on the same IP block as your web server:

dig MX example.com
# → mail.example.com
dig A mail.example.com
# → 203.0.113.42

SPF records

SPF records list every IP authorized to send email on your behalf. They often include your origin server or hosting provider's IP range:

dig TXT example.com
# v=spf1 ip4:203.0.113.0/24 include:sendgrid.net ~all
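To see what your own SPF record gives away, the `ip4` and `ip6` mechanisms can be extracted from the TXT value. This is a minimal sketch with a hypothetical function name; it ignores `include:`, `a`, and `mx` mechanisms, which would need further DNS lookups to expand.

```python
import ipaddress


def spf_ip_ranges(spf: str) -> list[str]:
    """Return the networks named by ip4:/ip6: mechanisms in an SPF record."""
    ranges = []
    for term in spf.split():
        term = term.lstrip("+-~?")  # drop the optional SPF qualifier
        mechanism, sep, value = term.partition(":")
        if sep and mechanism.lower() in ("ip4", "ip6"):
            # strict=False accepts a bare host address such as ip4:203.0.113.42
            ranges.append(str(ipaddress.ip_network(value, strict=False)))
    return ranges
```

Any range returned here that also contains your web origin is a direct hint to its location.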

Shodan + certificate fingerprint

If your origin uses a Cloudflare origin certificate, its fingerprint is the same regardless of how it is accessed. Shodan and Censys index TLS certificates across the entire IPv4 space: search for your cert fingerprint to find the raw IP.
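To run that search against your own infrastructure you first need the fingerprint. A minimal sketch using the standard library, assuming you have the origin certificate as a PEM string; the function name is hypothetical, and the exact search syntax on Shodan or Censys is not covered here.

```python
import hashlib
import ssl


def cert_sha256_fingerprint(pem: str) -> str:
    """SHA-256 fingerprint (lowercase hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)  # strip the PEM armor and base64-decode
    return hashlib.sha256(der).hexdigest()
```

The PEM can come from the certificate file on the origin, or from `ssl.get_server_certificate((host, 443))` pointed at the origin itself rather than at the proxied hostname.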

Bypassing Turnstile

Solving services

Services such as 2captcha, Anti-Captcha, and CapSolver typically rely on human workers (or their own automation) running a real browser session and returning the token. This works but is slow (seconds per token) and costs money per solve. Practical at low volume, expensive at scale.

Headless browser spoofing

Playwright and Puppeteer combined with stealth plugins patch the detectable properties:

  • navigator.webdriver set to undefined
  • Spoofed canvas fingerprint
  • Realistic mouse movement and keystroke timing
  • Full Chrome user agent

A well-configured headless browser can pass Turnstile at a reasonable rate. Cloudflare continuously updates its model in response, so this is an ongoing arms race.

What actually stops most bots

The visible Turnstile widget is not the main defense. Cloudflare's bot score from network-level signals (IP reputation, ASN, request rate, TLS fingerprint) catches far more traffic than the JS challenge does. A request from AWS Lambda with a clean User-Agent still has a datacenter ASN: that alone raises the bot score before any JS runs. Turnstile alone, validated client-side only, is weak. The combination of network scoring plus behavioral analysis is what makes the system effective.

How to actually protect your origin

Use Cloudflare Tunnel

Of the options here, this is the one that most completely hides your origin IP. cloudflared opens an outbound connection from your server to Cloudflare's network, so the origin needs no open inbound ports and there is no public IP for an attacker to connect to.

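cloudflared tunnel login                             # one-time: authorize cloudflared for your account
cloudflared tunnel create my-tunnel                  # create the tunnel and its credentials file
cloudflared tunnel route dns my-tunnel example.com   # add a DNS record pointing example.com at the tunnel
cloudflared tunnel run my-tunnel                     # open the outbound connection and start serving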

Firewall your origin to Cloudflare IPs only

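If you cannot use Tunnel, restrict inbound traffic at the origin to Cloudflare's addresses. Cloudflare publishes its IP ranges at cloudflare.com/ips-v4 and cloudflare.com/ips-v6. Allow only those ranges on ports 80 and 443 and drop everything else, and refresh the list periodically because the ranges can change.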
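One way to turn the published ranges into firewall rules is to generate them. The sketch below assumes an iptables-based host and a hypothetical helper name; it only builds the command strings, so you can review them before running anything. Adapt the output for nftables, ufw, or a cloud security group as needed.

```python
import ipaddress


def allowlist_rules(cidrs, ports=(80, 443)) -> list[str]:
    """Build iptables/ip6tables commands: accept the given ranges, drop the rest."""
    port_list = ",".join(str(p) for p in ports)
    rules = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)  # raises ValueError on a malformed range
        tool = "iptables" if net.version == 4 else "ip6tables"
        rules.append(f"{tool} -A INPUT -p tcp -s {net} "
                     f"-m multiport --dports {port_list} -j ACCEPT")
    # Appended last, so they only apply to traffic no ACCEPT rule matched.
    for tool in ("iptables", "ip6tables"):
        rules.append(f"{tool} -A INPUT -p tcp "
                     f"-m multiport --dports {port_list} -j DROP")
    return rules
```

Feed it the lines downloaded from the two published lists and print the result; order matters, since the DROP rules must come after every ACCEPT.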

Proxy every subdomain

Audit your DNS records. Every subdomain that should be proxied must have the orange cloud enabled. Grey-cloud records pointing to your origin are a bypass by design.
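The audit itself is easy to script once the records are exported. A minimal sketch, assuming each record is a dict with `name`, `type`, and `proxied` keys (the shape Cloudflare's DNS records API uses); the function name and the sample data in the note below are hypothetical.

```python
PROXYABLE_TYPES = {"A", "AAAA", "CNAME"}


def unproxied_records(records: list[dict]) -> list[str]:
    """Names of address-bearing records that are not proxied (grey cloud)."""
    return [
        record["name"]
        for record in records
        if record.get("type") in PROXYABLE_TYPES
        and not record.get("proxied", False)
    ]
```

Given `[{"name": "www.example.com", "type": "A", "proxied": True}, {"name": "dev.example.com", "type": "A", "proxied": False}]`, it reports only `dev.example.com`. MX and TXT records are skipped because they cannot be proxied anyway.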

Keep mail on a separate IP

Your mail server should not share an IP or IP block with your web server.
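A quick way to check this is to test whether the two addresses fall in the same block. The sketch below uses a hypothetical helper name and assumes a /24 as the block size, which is only a rough proxy for "same hosting allocation"; adjust the prefix to match how your provider assigns addresses.

```python
import ipaddress


def shares_block(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """True if ip_b lies in the /prefix network that contains ip_a."""
    network = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in network
```

If `shares_block(mail_ip, origin_ip)` is true, anyone who reads your MX record only has a few hundred addresses to scan.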

Validate Turnstile server-side, always

The token must be verified by your backend on every form submission.

Check your certificate history now

Run your domain through crt.sh and SecurityTrails. If your old origin IP is visible, either move to a new IP (and use Tunnel going forward) or rely entirely on the firewall approach.

Originally published on jguillaumesio.com
