shiva.wm

maps-scraper

beating google maps' result cap, engineered as a distributed system

production · crawling / data · 2024–present

A production-grade Google Maps business-data crawler that defeats Google's hard cap of roughly 120 results per search. It subdivides any rectangle or freehand polygon into overlapping grid cells, each small enough to stay under the cap. The subdivision uses a Turf.js geometry engine that selects an optimal zoom for each cell and filters cells by polygon intersection. It is built as a genuine distributed system rather than a script. A Fastify API and a separate BullMQ/Valkey worker coordinate through a singleton-worker lock, per-cell distributed locks and atomic SELECT … FOR UPDATE SKIP LOCKED cell claiming. A self-healing health monitor recovers stalled cells, orphaned jobs and abandoned locks after browser-pool crashes without human intervention.
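
The subdivision step can be sketched without the Turf.js dependency. This is a minimal illustration under stated assumptions, not the project's code. The function names, the 10% overlap default, the 1024 px viewport and the Web Mercator zoom heuristic are all hypothetical. The polygon filter is an approximation: it probes the cell's corners and centre and the polygon's vertices, standing in for an exact intersection predicate such as Turf's `booleanIntersects`.

```typescript
// Hypothetical sketch: split a bounding box into overlapping cells, pick a zoom
// per cell, and keep only cells that (approximately) touch a polygon.
// Coordinates are [lng, lat] in degrees, as in GeoJSON.
type Point = [number, number];

interface Cell {
  minLng: number;
  minLat: number;
  maxLng: number;
  maxLat: number;
  zoom: number;
}

// Standard ray-casting point-in-polygon test over a single ring.
function pointInPolygon([x, y]: Point, ring: Point[]): boolean {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > y) !== (yj > y) && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Approximate intersection: a cell corner or the centre lies in the polygon, or a
// polygon vertex lies in the cell. It can miss an edge that only clips the cell.
function cellTouchesPolygon(c: Cell, ring: Point[]): boolean {
  const probes: Point[] = [
    [c.minLng, c.minLat],
    [c.maxLng, c.minLat],
    [c.maxLng, c.maxLat],
    [c.minLng, c.maxLat],
    [(c.minLng + c.maxLng) / 2, (c.minLat + c.maxLat) / 2],
  ];
  if (probes.some((p) => pointInPolygon(p, ring))) return true;
  return ring.some(
    ([x, y]) => x >= c.minLng && x <= c.maxLng && y >= c.minLat && y <= c.maxLat,
  );
}

// Web Mercator heuristic: the highest integer zoom at which `widthDeg` degrees of
// longitude still fit in a `viewportPx`-wide viewport (world = 256 * 2^z px).
function zoomForWidth(widthDeg: number, viewportPx = 1024): number {
  const z = Math.floor(Math.log2((360 * viewportPx) / (256 * widthDeg)));
  return Math.max(0, Math.min(21, z));
}

function subdivide(
  bbox: [number, number, number, number], // [minLng, minLat, maxLng, maxLat]
  cellDeg: number,
  overlap = 0.1, // fraction of a cell shared with its neighbour
  ring?: Point[],
): Cell[] {
  const [minLng, minLat, maxLng, maxLat] = bbox;
  const step = cellDeg * (1 - overlap); // stride smaller than the cell => overlap
  const cells: Cell[] = [];
  for (let lat = minLat; lat < maxLat; lat += step) {
    for (let lng = minLng; lng < maxLng; lng += step) {
      const cell: Cell = {
        minLng: lng,
        minLat: lat,
        maxLng: Math.min(lng + cellDeg, maxLng),
        maxLat: Math.min(lat + cellDeg, maxLat),
        zoom: zoomForWidth(cellDeg),
      };
      if (!ring || cellTouchesPolygon(cell, ring)) cells.push(cell);
    }
  }
  return cells;
}
```

In this sketch, `cellDeg` is the main tuning knob. Cells must be small enough that one search stays under the ~120-result cap in dense areas. The overlap trades some duplicate results, which then need deduplicating, for not losing businesses that sit on cell edges.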

The scraping layer drives pooled, stealth-hardened headless Chrome with network-pattern-based waiting and classified CAPTCHA/rate-limit detection. An asymmetric proxy health-scoring system (EMA-smoothed latency, a weighted success rate, and a 0–100 score that gains 10 points per success and loses 20 per failure) rotates residential and premium proxies by quality.
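
The scoring rule can be sketched as follows. Only the +10/−20 deltas and the 0–100 range come from the description above. The neutral starting score of 50, the smoothing factor of 0.3, the reading of "weighted success rate" as a recency-weighted EMA, and the minimum-score cutoff in `pickProxy` are illustrative assumptions. All names are hypothetical.

```typescript
// Hypothetical sketch of asymmetric proxy health scoring. The +10 / -20 deltas and
// the 0..100 range are from the project description; everything else is assumed.
interface ProxyHealth {
  url: string;
  score: number; // clamped to 0..100
  emaLatencyMs: number; // exponentially smoothed latency (0 = no sample yet)
  successRate: number; // recency-weighted success rate in 0..1 (assumed EMA)
}

const SUCCESS_DELTA = 10;
const FAILURE_DELTA = -20;
const ALPHA = 0.3; // weight given to the newest sample in both EMAs

const clamp = (v: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, v));

function newProxy(url: string): ProxyHealth {
  return { url, score: 50, emaLatencyMs: 0, successRate: 1 };
}

function recordResult(p: ProxyHealth, ok: boolean, latencyMs?: number): void {
  // Asymmetric update: a failure costs twice what a success earns.
  p.score = clamp(p.score + (ok ? SUCCESS_DELTA : FAILURE_DELTA), 0, 100);
  p.successRate = ALPHA * (ok ? 1 : 0) + (1 - ALPHA) * p.successRate;
  if (latencyMs !== undefined) {
    // The first sample seeds the average; later samples are blended in.
    p.emaLatencyMs =
      p.emaLatencyMs === 0
        ? latencyMs
        : ALPHA * latencyMs + (1 - ALPHA) * p.emaLatencyMs;
  }
}

// Highest score wins; ties go to the better success rate, then lower latency.
function pickProxy(pool: ProxyHealth[], minScore = 20): ProxyHealth | undefined {
  return pool
    .filter((p) => p.score >= minScore)
    .sort(
      (a, b) =>
        b.score - a.score ||
        b.successRate - a.successRate ||
        a.emaLatencyMs - b.emaLatencyMs,
    )[0];
}
```

Because of the asymmetry, one failure cancels two successes. A proxy that starts failing therefore drops out of rotation quickly and has to earn its way back.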

Results land in a 10-table PostgreSQL schema with earthdistance geo-indexing and tsvector full-text search, evolved through 12 reversible migrations. They surface through a large React/Ant Design dashboard with job wizards, live Leaflet grid visualization and CSV/JSON/XLSX export. The whole stack runs as a PM2-managed three-process deployment over Dockerized PostgreSQL and Valkey, instrumented with Prometheus metrics and structured per-cell telemetry. The codebase spans roughly 61,000 lines across 92 backend modules and 143 frontend files, reflecting end-to-end ownership from geometry math to operational tooling.
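
The following is a hedged sketch of the kind of search that earthdistance and tsvector enable. The table and column names (`businesses`, `lat`, `lng`, `search_vector`), the `'english'` text-search configuration and the helper name are assumptions. `ll_to_earth`, `earth_box`, `earth_distance`, `plainto_tsquery` and `ts_rank` are real PostgreSQL functions; the first three come from the `earthdistance` extension, which depends on `cube`. The returned `{ text, values }` object has the shape node-postgres accepts as a query config.

```typescript
// Hypothetical sketch: "businesses within a radius, optionally matching text".
// Table/column names are illustrative; the SQL functions are real PostgreSQL ones.
interface NearbySearch {
  lat: number;
  lng: number;
  radiusMeters: number;
  text?: string;
  limit?: number;
}

interface QueryConfig {
  text: string;
  values: unknown[];
}

// Expression index that lets earth_box(...) @> ll_to_earth(lat, lng) use GiST.
const GEO_INDEX_DDL =
  "CREATE INDEX IF NOT EXISTS businesses_geo_idx " +
  "ON businesses USING gist (ll_to_earth(lat, lng))";

function buildNearbyQuery(s: NearbySearch): QueryConfig {
  const values: unknown[] = [s.lat, s.lng, s.radiusMeters];
  const where = [
    // Cheap, index-assisted bounding-cube pre-filter (may over-include)...
    "earth_box(ll_to_earth($1, $2), $3) @> ll_to_earth(lat, lng)",
    // ...then the exact great-circle distance check, in metres.
    "earth_distance(ll_to_earth($1, $2), ll_to_earth(lat, lng)) <= $3",
  ];
  let rank = "0";
  if (s.text && s.text.trim() !== "") {
    values.push(s.text);
    const p = `$${values.length}`;
    where.push(`search_vector @@ plainto_tsquery('english', ${p})`);
    rank = `ts_rank(search_vector, plainto_tsquery('english', ${p}))`;
  }
  values.push(s.limit ?? 50);
  const text =
    "SELECT id, name, " +
    "earth_distance(ll_to_earth($1, $2), ll_to_earth(lat, lng)) AS distance_m, " +
    `${rank} AS rank ` +
    `FROM businesses WHERE ${where.join(" AND ")} ` +
    `ORDER BY rank DESC, distance_m ASC LIMIT $${values.length}`;
  return { text, values };
}
```

The two-step geo filter follows the earthdistance documentation's advice. `earth_box` can use a GiST index, but the cube it returns can contain points slightly beyond the radius. The exact `earth_distance` comparison trims those points.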

stack
TypeScript · Node.js · Fastify · BullMQ · Valkey / Redis · PostgreSQL · earthdistance / tsvector · @turf/turf · @googlemaps/places · Puppeteer (stealth, cluster) · Prometheus · React · Ant Design · React Leaflet · PM2 · Docker