shiva.wm

reverse-image-search

reverse image search at scale, engineered to stay undetected

production · crawling / anti-bot · 2024-present

A backend crawler that performs reverse image search at scale against a major provider while staying ahead of layered bot detection. Anti-detection is a first-class subsystem: a stealth-hardened headless-browser fork plus a stealth plugin and a dozen hand-written runtime patches, dynamic client-hint data kept coherent with rotated user agents, pre-seeded consent cookies across many regional domains, randomized viewports, and a deliberately fast interaction path that races client-side telemetry.

Results are recovered with a dual strategy: the primary path intercepts the provider's internal batch RPC (stripping its security prefix and walking the protobuf-as-JSON payloads), with DOM scraping as a fallback. The two result sets are then merged and passed through a three-layer deduplication pipeline spanning an in-memory set, a relational store and a document store. The runtime splits into a Fastify REST and WebSocket API, a queue worker guarded by an atomic distributed lock (heartbeat, Lua-atomic refresh/release and stale-lock stealing), and a browser-cluster pool that gives every image a fully isolated browser with stall detection and automatic restart.
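
As a rough illustration of the three-layer deduplication idea, the sketch below checks a cheap in-process set first and only then consults slower persistent layers. The `SeenStore` interface, `MemoryStore`, `dedupKey` and `LayeredDeduplicator` are hypothetical names, and the in-memory stores merely stand in for the relational and document stores; the project's actual schema and key normalization are not described here.

```typescript
// Hypothetical interface: any store that can "insert if absent" in one step.
interface SeenStore {
  // Resolves to true if the key was newly recorded, false if it already existed.
  addIfAbsent(key: string): Promise<boolean>;
}

// In-memory stand-in so the sketch runs without a database. A real layer would
// wrap a unique-key insert in the relational or document store.
class MemoryStore implements SeenStore {
  private readonly keys = new Set<string>();

  async addIfAbsent(key: string): Promise<boolean> {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

// Assumed normalization: drop the URL fragment and surrounding whitespace.
function dedupKey(url: string): string {
  const hash = url.indexOf("#");
  return (hash === -1 ? url : url.slice(0, hash)).trim();
}

class LayeredDeduplicator {
  // Layer 1: per-process set, the cheapest check.
  private readonly local = new Set<string>();
  // Layers 2..n: persistent stores, consulted in order.
  private readonly layers: SeenStore[];

  constructor(layers: SeenStore[]) {
    this.layers = layers;
  }

  // Returns true only if no layer has seen this result before.
  async isNew(url: string): Promise<boolean> {
    const key = dedupKey(url);
    if (this.local.has(key)) return false;
    this.local.add(key);
    for (const layer of this.layers) {
      // Stop at the first layer that already knows the key.
      if (!(await layer.addIfAbsent(key))) return false;
    }
    return true;
  }
}
```

Ordering the layers from cheapest to most expensive means most repeats are rejected without a network round trip, while the persistent layers catch duplicates across restarts and across workers.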

A self-healing proxy layer adds weighted multi-factor health scoring with cooldown/recovery and live rotation strategies, while a 34-file React/Ant Design dashboard streams per-image step progress over WebSocket. The system is built end to end in strict TypeScript (~7,500 LOC across 39 backend modules) and was hardened through a documented six-round internal audit that resolved roughly 130 issues.
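
To make the weighted health scoring with cooldown and recovery more concrete, here is a minimal sketch. The factors (success rate, latency, failure streak), the weights, the thresholds and every name in it (`ProxyStats`, `healthScore`, `recordResult`, `pickProxy`) are assumptions chosen for illustration, not the project's actual values.

```typescript
interface ProxyStats {
  successes: number;
  failures: number;
  avgLatencyMs: number;        // running mean over successful requests
  consecutiveFailures: number;
  cooldownUntil: number;       // epoch ms; 0 means not cooling down
}

// Assumed weights and thresholds, for illustration only.
const WEIGHTS = { successRate: 0.6, latency: 0.25, streak: 0.15 };
const MAX_LATENCY_MS = 5000;
const MAX_STREAK = 5;
const COOLDOWN_MS = 60000;

function newStats(): ProxyStats {
  return { successes: 0, failures: 0, avgLatencyMs: 0, consecutiveFailures: 0, cooldownUntil: 0 };
}

// Combine the factors into a single score in [0, 1]; higher is healthier.
function healthScore(s: ProxyStats): number {
  const total = s.successes + s.failures;
  const successRate = total === 0 ? 1 : s.successes / total;
  const latency = 1 - Math.min(s.avgLatencyMs, MAX_LATENCY_MS) / MAX_LATENCY_MS;
  const streak = 1 - Math.min(s.consecutiveFailures, MAX_STREAK) / MAX_STREAK;
  return WEIGHTS.successRate * successRate + WEIGHTS.latency * latency + WEIGHTS.streak * streak;
}

// Fold one request outcome into the stats; returns a new object.
function recordResult(s: ProxyStats, ok: boolean, latencyMs: number, now: number): ProxyStats {
  if (ok) {
    const successes = s.successes + 1;
    const avgLatencyMs = (s.avgLatencyMs * s.successes + latencyMs) / successes;
    // Recovery: a success clears the failure streak and any cooldown.
    return { ...s, successes, avgLatencyMs, consecutiveFailures: 0, cooldownUntil: 0 };
  }
  const consecutiveFailures = s.consecutiveFailures + 1;
  // Cooldown: too many failures in a row benches the proxy for a while.
  const cooldownUntil = consecutiveFailures >= MAX_STREAK ? now + COOLDOWN_MS : s.cooldownUntil;
  return { ...s, failures: s.failures + 1, consecutiveFailures, cooldownUntil };
}

// Pick the highest-scoring proxy that is not cooling down, if any.
function pickProxy(pool: Map<string, ProxyStats>, now: number): string | undefined {
  let bestId: string | undefined;
  let bestScore = -1;
  pool.forEach((stats, id) => {
    if (stats.cooldownUntil > now) return; // still benched
    const score = healthScore(stats);
    if (score > bestScore) {
      bestScore = score;
      bestId = id;
    }
  });
  return bestId;
}
```

In this sketch a benched proxy becomes eligible again once its cooldown expires, but it keeps its low score until a successful request resets the streak, so healthier proxies are still preferred.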

stack
TypeScript, Node.js, Puppeteer (stealth fork), puppeteer-extra-plugin-stealth, puppeteer-cluster, BullMQ, Valkey / Redis, MongoDB, MariaDB, Fastify, WebSocket, Zod, React, Ant Design, Docker