BACKEND SYSTEM

Distributed URL Shortener

Production-Grade URL Shortening Platform

Redirect p99: 4.13ms
Cache hit rate: 99.98%
Throughput: 100+ RPS
ID block size: 1,000

▸ The Problem

Shortening a URL is trivial; serving redirects at speed while surviving abuse and capturing analytics is not. The system had to keep redirect latency in the low single-digit milliseconds (measured p99: 4.13 ms), protect itself from hot-key floods and scrapers, and record clicks without slowing the redirect path.

▸ Architecture & System Design

01
Redirect hot path: request → Redis Lua token-bucket rate limiter → Redis lookup → 302 redirect. The database is never touched on a warm-cache hit.
02
Base62 short-code generation uses PostgreSQL sequence blocks of 1,000 IDs per instance, dispensed via AtomicLong — eliminating per-request database calls for ID generation.
03
Cache-aside Redis caching with a 1-hour TTL: on a cache miss the application falls back to PostgreSQL, then writes the result back to Redis. The cache self-heals after restarts with zero operator intervention.
04
Click analytics are fire-and-forget: events enqueue into a bounded LinkedBlockingQueue (capacity 10,000), then a scheduler batch-inserts 500-row chunks to PostgreSQL — analytics load never touches redirect latency.
05
Redis Lua script guarantees atomic two-tier quota enforcement: 100 req/min for anonymous IPs, 1,000 req/min for authenticated users — abuse becomes cheap 429s, not database load.
06
Micrometer + Prometheus metrics exposed through authenticated /actuator/prometheus; GitHub Actions CI with Docker-based health checks validates each commit.
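
To make item 02 concrete, here is a minimal sketch of Base62 encoding for numeric IDs. The class name `Base62` and the alphabet ordering (digits, then uppercase, then lowercase) are assumptions for illustration; the project's actual alphabet is not stated.

```java
/** Illustrative Base62 codec. The alphabet order is an assumption, not taken from the project. */
final class Base62 {
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    private static final int BASE = ALPHABET.length(); // 62

    private Base62() {}

    /** Converts a non-negative numeric ID into its Base62 short code. */
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        if (id == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % BASE))); // least significant digit first
            id /= BASE;
        }
        return sb.reverse().toString();
    }

    /** Converts a short code back into the numeric ID; rejects unknown characters and overflow. */
    static long decode(String code) {
        long id = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            id = Math.addExact(Math.multiplyExact(id, (long) BASE), (long) digit);
        }
        return id;
    }
}
```

With 62 symbols, six characters cover 62^6 = 56,800,235,584 distinct IDs, which is why sequential IDs keep the codes short.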

▸ Engineering Decisions

Redis in the read path, not as decoration

Redirects are a 99%-read workload, so the design optimizes for the cache hit: sub-millisecond Redis lookup with PostgreSQL as durable fallback. Cache-aside (vs write-through) keeps writes simple and tolerates cold starts without complex invalidation.
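
The lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `UrlCache`, `InMemoryUrlCache`, and `RedirectResolver` are hypothetical names, the cache is an in-memory stand-in so the example runs without Redis, and the database is modeled as a plain function.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache port; a real implementation would wrap a Redis client. */
interface UrlCache {
    Optional<String> get(String code);
    void put(String code, String longUrl, Duration ttl);
}

/** In-memory stand-in so the sketch runs without Redis. It ignores the TTL. */
final class InMemoryUrlCache implements UrlCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public Optional<String> get(String code) {
        return Optional.ofNullable(entries.get(code));
    }

    @Override
    public void put(String code, String longUrl, Duration ttl) {
        entries.put(code, longUrl);
    }
}

/** Cache-aside lookup: try the cache, fall back to the database, then warm the cache. */
final class RedirectResolver {
    private static final Duration TTL = Duration.ofHours(1); // 1-hour TTL from the design

    private final UrlCache cache;
    private final Function<String, Optional<String>> database; // hypothetical DB lookup

    RedirectResolver(UrlCache cache, Function<String, Optional<String>> database) {
        this.cache = cache;
        this.database = database;
    }

    Optional<String> resolve(String code) {
        Optional<String> cached = cache.get(code);
        if (cached.isPresent()) {
            return cached; // warm hit: the database is not touched
        }
        Optional<String> fromDb = database.apply(code);
        fromDb.ifPresent(url -> cache.put(code, url, TTL)); // warm the cache for later requests
        return fromDb;
    }
}
```

In this sketch unknown codes are not cached, so repeated lookups of a missing code reach the database every time; whether the real system caches negative results is not stated.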

Analytics must never block a redirect

Click recording is fully decoupled from the response path. Enqueue in memory, flush to PostgreSQL in batches. A slow analytics write can cost a data point, but it can never cost a user-facing millisecond.
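
A minimal sketch of that decoupling, following the bounded-queue design listed under Architecture (capacity 10,000, 500-row batches). `ClickEvent`, `ClickBuffer`, and the `batchWriter` sink are hypothetical names. Using the non-blocking `offer` and dropping events when the queue is full is an assumption consistent with "fire-and-forget", not a confirmed detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical click event; the real schema is not shown in the write-up. */
record ClickEvent(String code, long timestampMillis) {}

final class ClickBuffer {
    private final BlockingQueue<ClickEvent> queue;
    private final int batchSize;
    private final Consumer<List<ClickEvent>> batchWriter; // hypothetical sink, e.g. a JDBC batch insert

    ClickBuffer(int capacity, int batchSize, Consumer<List<ClickEvent>> batchWriter) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
    }

    /** Called on the redirect path. Never blocks; returns false if the event was dropped. */
    boolean record(ClickEvent event) {
        return queue.offer(event);
    }

    /** Called from a scheduler thread. Drains the queue in chunks and returns the number flushed. */
    int flush() {
        int total = 0;
        List<ClickEvent> batch = new ArrayList<>(batchSize);
        while (queue.drainTo(batch, batchSize) > 0) {
            batchWriter.accept(List.copyOf(batch)); // hand the writer an immutable snapshot
            total += batch.size();
            batch.clear();
        }
        return total;
    }
}
```

With the figures from the design this would be constructed as `new ClickBuffer(10_000, 500, writer)` and `flush()` invoked on a fixed schedule. Because `record` returns immediately either way, a stalled writer shows up as dropped events rather than as slower redirects.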

Batch ID allocation for high throughput and short codes

Allocating 1,000-ID blocks per instance trades small gaps in the ID space (a crash discards the unused remainder of the current block) for high throughput: no per-request database round-trip for ID generation, short Base62 codes, and no distributed locking.
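
One way to sketch block allocation with an `AtomicLong` is shown below. `BlockIdAllocator` and the `nextBlockStart` supplier are hypothetical. The supplier stands in for whatever reserves the next block from PostgreSQL, and the sketch assumes each reserved block starts at or after the end of the previous one.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Dispenses IDs from an in-memory block and reserves a new block only when the current one runs out. */
final class BlockIdAllocator {
    private final long blockSize;
    private final LongSupplier nextBlockStart; // hypothetical: reserves a block, returns its first ID
    private final AtomicLong next = new AtomicLong(0);
    private volatile long limit = 0; // exclusive end of the current block; 0 means "no block yet"

    BlockIdAllocator(long blockSize, LongSupplier nextBlockStart) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
        this.nextBlockStart = nextBlockStart;
    }

    long nextId() {
        while (true) {
            long seenLimit = limit;            // read the limit before taking an ID
            long id = next.getAndIncrement();  // fast path: no lock, no database call
            if (id < seenLimit) {
                return id;
            }
            refill(seenLimit);                 // block exhausted: reserve a new one and retry
        }
    }

    private synchronized void refill(long seenLimit) {
        if (limit != seenLimit) {
            return; // another thread already reserved a new block
        }
        long start = nextBlockStart.getAsLong();
        next.set(start);
        limit = start + blockSize;
    }
}
```

Under contention a thread that races with a refill may skip an ID, which adds to the gaps but never produces a duplicate. IDs still sitting in memory when an instance stops are simply never issued, which is the crash gap described above.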

▸ Tech Stack

Java 17, Spring Boot 3, PostgreSQL 16, Redis 7, Docker Compose, GitHub Actions, Flyway, JJWT, BCrypt, Micrometer, Prometheus, JUnit 5, Mockito, Testcontainers

▸ Lessons Learned

▸Design for the dominant access pattern — a 99%-read system should look nothing like a CRUD app; put Redis in the read path, not as a nice-to-have.
▸Every external call on a hot path is a liability. Move everything non-essential off it — analytics, metrics, logging all run async.
▸Lua scripts in Redis are the right tool for atomic rate limiting: each script runs atomically on the server, so the check and the decrement cannot interleave with other clients, and no distributed lock is needed.
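
The token-bucket decision that has to happen as one atomic step can be modeled in plain Java. This is an in-process illustration only, not the Redis script: `TokenBucketLimiter` is a hypothetical name, and `ConcurrentHashMap.compute` plays the role of the per-key atomicity that Redis provides for a script. The two quotas (100 and 1,000 requests per minute) come from the design; the refill formula is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-process model of a two-tier token bucket: refill, check, and decrement as one atomic step. */
final class TokenBucketLimiter {
    static final int ANONYMOUS_PER_MINUTE = 100;
    static final int AUTHENTICATED_PER_MINUTE = 1_000;

    private static final class Bucket {
        double tokens;
        long lastRefillMillis;

        Bucket(double tokens, long lastRefillMillis) {
            this.tokens = tokens;
            this.lastRefillMillis = lastRefillMillis;
        }
    }

    private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();

    /** Returns true if the request is allowed; false corresponds to an HTTP 429. */
    boolean tryAcquire(String key, boolean authenticated, long nowMillis) {
        int capacity = authenticated ? AUTHENTICATED_PER_MINUTE : ANONYMOUS_PER_MINUTE;
        boolean[] allowed = {false};
        // compute() runs atomically per key, so concurrent callers cannot interleave here.
        buckets.compute(key, (k, bucket) -> {
            if (bucket == null) {
                bucket = new Bucket(capacity, nowMillis); // new clients start with a full bucket
            }
            long elapsedMillis = Math.max(0L, nowMillis - bucket.lastRefillMillis);
            double refill = elapsedMillis * capacity / 60_000.0; // capacity tokens per minute
            bucket.tokens = Math.min(capacity, bucket.tokens + refill);
            bucket.lastRefillMillis = nowMillis;
            if (bucket.tokens >= 1.0) {
                bucket.tokens -= 1.0;
                allowed[0] = true;
            }
            return bucket;
        });
        return allowed[0];
    }
}
```

Passing the clock in as `nowMillis` keeps the sketch deterministic for tests. A shared limiter across several instances would need the same steps executed inside Redis, which is the role the Lua script plays.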
