Detection-resistant crawling at web scale

2026·06·01 · 2 min · #scraping #infra

Every anti-bot vendor sells the same promise: we'll tell humans from machines. The truth is messier — detection is a spectrum, and at scale you're not trying to look human, you're trying to be boring enough that nobody bothers looking.

After a few years of running the crawler infrastructure behind an image-rights platform, here's the mental model I keep coming back to.

The fingerprint is the tell

Most crawlers die on the handshake, not the page. TLS cipher ordering, header casing, the JS environment — each is a fingerprint, and a mismatch between what your headers claim and what your runtime actually is gets you flagged before you render a single pixel.

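// crawler.ts: match the runtime to the claim
// Assumes `pool`, `residential`, and `page` are defined elsewhere in the project.
// emulateFingerprint() is a project helper, not a stock Playwright or Puppeteer call.
async function fetchCoherent(url: string) {
  const ctx = await pool.acquire({ proxy: residential });
  await page.emulateFingerprint(ctx.profile);
  return page.goto(url, { waitUntil: "networkidle" });
}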

The win isn't a single clever spoof. It's that the IP, the TLS stack, the headers, and the JS surface all agree with one another. Coherence is the product.

Residential proxy pools

Datacenter IPs are a tell on their own. A pool of residential exits, rotated between sessions but kept sticky for the lifetime of a task, removes the cheapest signal a WAF (web application firewall) has. The hard part isn't acquiring exits. It's health-checking them continuously and retiring the ones that have already been burned, before they burn your task.
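
The bookkeeping side of that is small enough to sketch. What follows is a minimal illustration, not the platform's actual pool: `ExitPool`, the `maxFailures` threshold of three, and the "first healthy exit" assignment rule are all assumptions made for the example. It keeps a task pinned to one exit and retires an exit after a run of consecutive failed checks.

```typescript
// Hypothetical exit-pool bookkeeping. Names and thresholds are illustrative.
type ExitId = string;

interface ExitState {
  consecutiveFailures: number;
  retired: boolean;
}

class ExitPool {
  private exits = new Map<ExitId, ExitState>();
  private sticky = new Map<string, ExitId>(); // taskId -> assigned exit

  constructor(ids: ExitId[], private maxFailures: number = 3) {
    ids.forEach((id) => this.exits.set(id, { consecutiveFailures: 0, retired: false }));
  }

  // Return the task's sticky exit while it is healthy; otherwise reassign.
  acquire(taskId: string): ExitId | undefined {
    const current = this.sticky.get(taskId);
    if (current !== undefined) {
      const state = this.exits.get(current);
      if (state !== undefined && !state.retired) return current;
    }
    const next = this.healthy()[0]; // simplification: no load balancing
    if (next === undefined) {
      this.sticky.delete(taskId);
      return undefined;
    }
    this.sticky.set(taskId, next);
    return next;
  }

  // Feed in health-check or request outcomes. A success resets the streak.
  report(id: ExitId, ok: boolean): void {
    const state = this.exits.get(id);
    if (state === undefined) return;
    state.consecutiveFailures = ok ? 0 : state.consecutiveFailures + 1;
    if (state.consecutiveFailures >= this.maxFailures) state.retired = true;
  }

  // Drop the task's pin once the task is finished.
  release(taskId: string): void {
    this.sticky.delete(taskId);
  }

  // Exits that have not been retired, in insertion order.
  healthy(): ExitId[] {
    return Array.from(this.exits.entries())
      .filter(([, state]) => !state.retired)
      .map(([id]) => id);
  }
}
```

A real pool would spread tasks across exits rather than always handing out the first healthy one, and would probably let retired exits cool down and return. The point of the sketch is only the shape: sticky per task, retired on evidence.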

Human-shaped pacing

Bots are fast and regular; people are slow and noisy. Jitter your timings, respect the implicit rate a real session would produce, and back off the moment a host gets defensive. A crawler that waits outlives one that's clever.
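
One way to express jitter plus backoff, as a rough sketch rather than a recommendation of specific numbers: `HostPacer`, its option names, the doubling rule, and the choice of 403/429/503 as "defensive" statuses are all assumptions for illustration.

```typescript
// Hypothetical per-host pacer. Names and defaults are illustrative.
interface PacerOptions {
  baseDelayMs: number; // typical gap between requests to one host
  jitter: number;      // fraction of the delay to randomize, 0..1
  maxDelayMs: number;  // ceiling applied after backoff
}

class HostPacer {
  private penalty = 1;

  constructor(
    private opts: PacerOptions,
    private rand: () => number = Math.random, // injectable for testing
  ) {}

  // Delay = base * penalty, spread by +/- jitter, capped at maxDelayMs.
  nextDelayMs(): number {
    const { baseDelayMs, jitter, maxDelayMs } = this.opts;
    const spread = 1 + jitter * (2 * this.rand() - 1);
    return Math.min(maxDelayMs, Math.round(baseDelayMs * this.penalty * spread));
  }

  // Defensive statuses double the penalty; successful responses halve it.
  record(status: number): void {
    if (status === 403 || status === 429 || status === 503) {
      this.penalty = Math.min(this.penalty * 2, 64);
    } else if (status < 400) {
      this.penalty = Math.max(1, this.penalty / 2);
    }
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wait, send, then feed the outcome back into the pacer.
// `send` is a placeholder for whatever performs the request and returns its status.
async function pacedRequest(pacer: HostPacer, send: () => Promise<number>): Promise<number> {
  await sleep(pacer.nextDelayMs());
  const status = await send();
  pacer.record(status);
  return status;
}
```

The penalty recovers slowly on purpose: one good response after a block halves the wait instead of resetting it, so a host that was just defensive keeps getting extra room.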

What still breaks

Nothing here is permanent — it's an arms race, and the other side ships too. The infrastructure that lasts is the one you can observe: every request labeled, every block surfaced, every burned exit accounted for. You don't win detection. You stay boring, and you stay watching.
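
As a sketch of what "every request labeled, every block surfaced" could look like at its smallest: the `CrawlLedger` class, the record fields, and treating 403 and 429 as blocks are assumptions for the example, not a description of any particular system.

```typescript
// Hypothetical request ledger. Field names and block statuses are illustrative.
interface RequestRecord {
  host: string;
  exitId: string;
  status: number;
}

// Assumption: these statuses count as a block. Real signals are often messier.
const BLOCK_STATUSES = new Set<number>([403, 429]);

class CrawlLedger {
  private records: RequestRecord[] = [];

  log(record: RequestRecord): void {
    this.records.push(record);
  }

  // Fraction of requests to a host that were blocked; 0 if none were logged.
  blockRate(host: string): number {
    const forHost = this.records.filter((r) => r.host === host);
    if (forHost.length === 0) return 0;
    const blocked = forHost.filter((r) => BLOCK_STATUSES.has(r.status)).length;
    return blocked / forHost.length;
  }

  // Block counts per exit, to spot exits that are already burned.
  blocksByExit(): Map<string, number> {
    const counts = new Map<string, number>();
    this.records.forEach((r) => {
      if (BLOCK_STATUSES.has(r.status)) {
        counts.set(r.exitId, (counts.get(r.exitId) ?? 0) + 1);
      }
    });
    return counts;
  }
}
```

In practice this would live in a metrics store with timestamps and windows, not an in-memory array. The two questions it answers stay the same: which hosts are pushing back, and which exits are taking the hits.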