Definition
Latency measures how long users wait for your product to respond. P50 (median) represents the typical experience, while P99 (99th percentile) captures the worst 1% of experiences — which is often where user frustration and abandonment live. Monitoring both is essential because averages hide outliers: a product with 200ms average latency might have a P99 of 5 seconds, meaning 1 in 100 requests is painfully slow. Studies consistently show that every 100ms increase in page load time reduces conversion by 1–2%.
How to measure
Instrument via APM (Application Performance Monitoring) tools like Datadog, New Relic, or Grafana. For frontend, use Real User Monitoring (RUM) to capture actual user experience. Report P50 and P99 for critical paths (page loads, API calls, checkout flows). Set alerts on P99 exceeding thresholds.
Industry benchmarks
Web pages: P50 under 1s, P99 under 3s. APIs: P50 under 200ms, P99 under 800ms. Checkout/payment flows: P50 under 500ms. Google recommends LCP (Largest Contentful Paint) under 2.5s for good Core Web Vitals scores.