Posts tagged #kv-cache.

2 posts with this tag.

Architecture May 25, 2026

Scaling on demand: smart auto-scaling for modern AI apps

CPU autoscaling is a lie for GPU workloads. Why queue depth, KV-cache pressure, and TTFT beat CPU as scaling triggers — KEDA-driven patterns, ARIMA forecasting, and composite metrics that scale your AI SaaS before users hit the spinner.

Architecture May 24, 2026

GPU-aware load balancing: managing AI compute like a pro

Round-robin is a relic when LLM requests span 50 tokens to 50,000. Prefill vs decode disaggregation, KV-cache-aware routing, prefix matching, and the four metrics that matter — how to route AI traffic so your P99 stops bleeding.
