Shortening a URL is trivial; serving redirects at speed while surviving abuse and capturing analytics is not. The system had to keep redirect latency near-zero, protect itself from hot-key floods and scrapers, and record every click without slowing the redirect path.
▸ Architecture & System Design
01
Redirect hot path: request → Redis Lua token-bucket rate limiter → Redis lookup → 302 redirect. The database is never touched on a warm-cache hit.
02
Base62 short-code generation uses PostgreSQL sequence blocks of 1,000 IDs per instance, dispensed via AtomicLong — eliminating per-request database calls for ID generation.
03
Cache-aside Redis caching with a 1-hour TTL: a cache miss falls back to PostgreSQL and then warms Redis. The cache self-heals after restarts with no operator intervention.
04
Click analytics are fire-and-forget: events enqueue into a bounded LinkedBlockingQueue (capacity 10,000), then a scheduler batch-inserts 500-row chunks to PostgreSQL — analytics load never touches redirect latency.
05
Redis Lua script guarantees atomic two-tier quota enforcement: 100 req/min for anonymous IPs, 1,000 req/min for authenticated users — abuse becomes cheap 429s, not database load.
06
Micrometer + Prometheus metrics exposed through authenticated /actuator/prometheus; GitHub Actions CI with Docker-based health checks validates each commit.
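The two-tier quota in item 05 relies on token-bucket arithmetic that runs atomically inside Redis. The sketch below mirrors that arithmetic in plain Java so the refill and consume steps are easy to follow. It is an in-process illustration, not the Lua script itself. The class names, the "ip:"/"user:" key prefixes, and the continuous per-millisecond refill are assumptions; only the 100 and 1,000 requests-per-minute limits come from the design above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** One bucket per client. Capacity equals the per-minute limit (assumed). */
final class TokenBucket {
    private final double capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(int requestsPerMinute, long nowMillis) {
        this.capacity = requestsPerMinute;
        this.refillPerMilli = requestsPerMinute / 60_000.0;
        this.tokens = capacity;            // a new client starts with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    /** Refill and consume in one synchronized step, standing in for the script's atomicity. */
    synchronized boolean tryAcquire(long nowMillis) {
        long elapsed = Math.max(0, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsed * refillPerMilli);
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;                   // request allowed
        }
        return false;                      // caller would answer with HTTP 429
    }
}

/** Picks the tier from the caller's authentication state. */
final class TwoTierRateLimiter {
    static final int ANONYMOUS_PER_MINUTE = 100;
    static final int AUTHENTICATED_PER_MINUTE = 1_000;

    private final ConcurrentMap<String, TokenBucket> buckets = new ConcurrentHashMap<>();

    /** key is a client IP for anonymous callers or a user ID otherwise (keying scheme assumed). */
    boolean allow(String key, boolean authenticated, long nowMillis) {
        int limit = authenticated ? AUTHENTICATED_PER_MINUTE : ANONYMOUS_PER_MINUTE;
        String bucketKey = (authenticated ? "user:" : "ip:") + key;
        TokenBucket bucket =
            buckets.computeIfAbsent(bucketKey, k -> new TokenBucket(limit, nowMillis));
        return bucket.tryAcquire(nowMillis);
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic. In the Redis version the equivalent state (token count and last refill time) would live under the client's key so that every application instance shares one bucket.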
▸ Engineering Decisions
Redis in the read path, not as decoration
Redirects are a 99%-read workload, so the design optimizes for the cache hit: sub-millisecond Redis lookup with PostgreSQL as durable fallback. Cache-aside (vs write-through) keeps writes simple and tolerates cold starts without complex invalidation.
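A minimal sketch of the cache-aside lookup, assuming a small cache interface in place of a real Redis client and a function in place of the PostgreSQL query. `UrlCache`, `UrlResolver`, and `InMemoryUrlCache` are hypothetical names used only for illustration; the 1-hour TTL is the one stated in the architecture list.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache abstraction; Redis plays this role in the described system. */
interface UrlCache {
    Optional<String> get(String code);
    void put(String code, String url, Duration ttl);
}

/** Cache-aside: try the cache, fall back to the store on a miss, then warm the cache. */
final class UrlResolver {
    static final Duration TTL = Duration.ofHours(1);

    private final UrlCache cache;
    private final Function<String, Optional<String>> store; // stand-in for the database lookup

    UrlResolver(UrlCache cache, Function<String, Optional<String>> store) {
        this.cache = cache;
        this.store = store;
    }

    Optional<String> resolve(String code) {
        Optional<String> cached = cache.get(code);
        if (cached.isPresent()) {
            return cached;                                   // warm hit: the store is not touched
        }
        Optional<String> fromStore = store.apply(code);
        fromStore.ifPresent(url -> cache.put(code, url, TTL)); // warm the cache for later requests
        return fromStore;
    }
}

/** In-memory cache for demonstration only; entries expire lazily when read. */
final class InMemoryUrlCache implements UrlCache {
    private static final class Entry {
        final String url;
        final long expiresAtNanos;
        Entry(String url, long expiresAtNanos) {
            this.url = url;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final ConcurrentMap<String, Entry> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String code) {
        Entry e = entries.get(code);
        if (e == null) {
            return Optional.empty();
        }
        if (System.nanoTime() - e.expiresAtNanos >= 0) {
            entries.remove(code, e);                         // expired: treat as a miss
            return Optional.empty();
        }
        return Optional.of(e.url);
    }

    @Override
    public void put(String code, String url, Duration ttl) {
        entries.put(code, new Entry(url, System.nanoTime() + ttl.toNanos()));
    }

    /** Simulates a cache restart. */
    void clear() {
        entries.clear();
    }
}
```

Because the resolver repopulates the cache on every miss, emptying the cache (as a restart would) only costs one store lookup per code before hits resume. Unknown codes are not cached in this sketch; whether the real system caches negative results is not stated.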
Analytics must never block a redirect
Click recording is fully decoupled from the response path: enqueue in memory, flush to PostgreSQL in batches. A slow analytics write can cost a data point, but it can never cost a user-facing millisecond.
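The bounded queue and batch flush from item 04 can be sketched as follows. The capacity (10,000) and batch size (500) come from the architecture list. `ClickEvent`, `ClickRecorder`, and the `batchWriter` callback are hypothetical names. Dropping events when the queue is full is an assumption, consistent with the statement that a slow write may cost a data point, and so is draining the whole queue on each scheduler tick.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Minimal click record; the real event presumably carries more fields. */
final class ClickEvent {
    final String code;
    final long timestampMillis;

    ClickEvent(String code, long timestampMillis) {
        this.code = code;
        this.timestampMillis = timestampMillis;
    }
}

final class ClickRecorder {
    static final int QUEUE_CAPACITY = 10_000;
    static final int BATCH_SIZE = 500;

    private final BlockingQueue<ClickEvent> queue = new LinkedBlockingQueue<>(QUEUE_CAPACITY);
    private final AtomicLong dropped = new AtomicLong();
    private final Consumer<List<ClickEvent>> batchWriter; // stand-in for a JDBC batch insert

    ClickRecorder(Consumer<List<ClickEvent>> batchWriter) {
        this.batchWriter = batchWriter;
    }

    /** Called on the redirect path. offer() returns immediately, so this never blocks. */
    boolean record(ClickEvent event) {
        boolean accepted = queue.offer(event);
        if (!accepted) {
            dropped.incrementAndGet();  // queue full: lose the data point, keep the redirect fast
        }
        return accepted;
    }

    /** Called by a scheduler thread. Writes queued events in chunks of at most BATCH_SIZE. */
    int flush() {
        int written = 0;
        List<ClickEvent> batch = new ArrayList<>(BATCH_SIZE);
        while (queue.drainTo(batch, BATCH_SIZE) > 0) {
            batchWriter.accept(new ArrayList<>(batch)); // hand the writer its own copy
            written += batch.size();
            batch.clear();
        }
        return written;
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

A scheduler (for example a `ScheduledExecutorService` with a fixed delay) would call `flush()` periodically. Exposing `droppedCount()` as a metric makes the data-loss trade-off visible rather than silent.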
Batch ID allocation for high throughput and short codes
Allocating 1,000-ID blocks per instance trades small gaps in the ID space (a crash discards the unused remainder of the current block) for high throughput: no database round-trip per generated code, short Base62 codes, and no distributed locking.
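A minimal sketch of block allocation with Base62 encoding. The `blockStartSupplier` is a hypothetical hook that would reserve the next block from the PostgreSQL sequence (for instance, a sequence that increments by 1,000). The alphabet order, the class names, and the refill strategy are assumptions; the block size of 1,000 and the use of `AtomicLong` come from the design above.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Encodes non-negative IDs as Base62 strings (digit order assumed: 0-9, a-z, A-Z). */
final class Base62 {
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        }
        return sb.reverse().toString(); // digits were produced least-significant first
    }
}

/** Dispenses IDs from a locally held block; calls the supplier once per block. */
final class BlockIdAllocator {
    static final int BLOCK_SIZE = 1_000;

    private static final class Block {
        final AtomicLong cursor;
        final long endExclusive;
        Block(long start, long size) {
            this.cursor = new AtomicLong(start);
            this.endExclusive = start + size;
        }
    }

    private final LongSupplier blockStartSupplier;    // hypothetical: reserves a block in the database
    private volatile Block current = new Block(0, 0); // empty block forces a reservation on first use

    BlockIdAllocator(LongSupplier blockStartSupplier) {
        this.blockStartSupplier = blockStartSupplier;
    }

    long nextId() {
        while (true) {
            Block block = current;
            long id = block.cursor.getAndIncrement(); // lock-free fast path
            if (id < block.endExclusive) {
                return id;
            }
            synchronized (this) {
                if (current == block) {               // only one thread reserves the next block
                    current = new Block(blockStartSupplier.getAsLong(), BLOCK_SIZE);
                }
            }
        }
    }

    String nextCode() {
        return Base62.encode(nextId());
    }
}
```

If the process stops partway through a block, the IDs it had not yet dispensed are never issued, which is the gap described above. With this alphabet, `Base62.encode(1000)` yields `g8`, so codes stay short well into the billions of IDs.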