LLM Gateway Architecture for Reliability
LLM providers fail, throttle, or change behavior. A gateway keeps your assistant stable by encapsulating routing, retries, and logging.
Core responsibilities
- Provider abstraction (Gemini, OpenAI, Anthropic, etc.); a minimal interface is sketched after this list.
- Request shaping (prompt templates, metadata, temperature).
- Timeout and retry management with jitter.
- Circuit breaker state per provider and per tenant.
- Metrics and fallback_reason logging.
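A minimal sketch of the provider abstraction and per-(provider, tenant) circuit-breaker state, in Python. Every name here (LLMProvider, LLMResponse, CircuitBreaker, breaker_for) is an illustrative placeholder rather than any real SDK's API, and the threshold and cooldown values are arbitrary defaults.

```python
import time
from dataclasses import dataclass
from typing import Protocol

@dataclass
class LLMResponse:
    text: str
    input_tokens: int
    output_tokens: int
    provider: str

class LLMProvider(Protocol):
    """Anything the gateway can route to: a Gemini, OpenAI, or Anthropic adapter."""
    name: str
    def complete(self, prompt: str, *, temperature: float, timeout: float) -> LLMResponse: ...

@dataclass
class CircuitBreaker:
    """Opens after consecutive failures; closes again after a cooldown."""
    failure_threshold: int = 5
    cooldown_seconds: float = 30.0
    failures: int = 0
    opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_seconds:
            self.opened_at, self.failures = None, 0  # cooldown over: close and allow traffic
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

# One breaker per (provider, tenant) pair, as the list above requires.
_breakers: dict[tuple[str, str], CircuitBreaker] = {}

def breaker_for(provider: str, tenant: str) -> CircuitBreaker:
    return _breakers.setdefault((provider, tenant), CircuitBreaker())
```

Keying breakers by (provider, tenant) lets one noisy tenant trip its own breaker without cutting off that provider for everyone else.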
Flow overview
- Request arrives from the answerer service with tenant context.
- Gateway selects the primary provider (e.g., Gemini) based on routing policy.
- Apply a timeout (e.g., 6 seconds) and retry rules (max 2 retries, exponential backoff with jitter).
- On failure, log the fallback_reason and switch to the secondary provider.
- Return the response plus metadata (token counts, provider used) to the caller; the sketch after this list walks through this loop.
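A sketch of that flow under the same assumptions, reusing the hypothetical LLMProvider, LLMResponse, and breaker_for names from the earlier sketch; the backoff constants and log fields are illustrative, not prescriptive.

```python
import logging
import random
import time

log = logging.getLogger("gateway")

def call_with_retries(provider: LLMProvider, prompt: str,
                      *, timeout: float = 6.0, max_retries: int = 2) -> LLMResponse:
    """One provider: bounded retries with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return provider.complete(prompt, temperature=0.0, timeout=timeout)
        except Exception:
            if attempt == max_retries:
                raise
            # 0.5s, 1s, ... base delay, each stretched by up to 50% random jitter
            time.sleep(0.5 * 2 ** attempt * (1 + 0.5 * random.random()))

def dispatch(tenant: str, prompt: str,
             primary: LLMProvider, secondary: LLMProvider) -> LLMResponse:
    """Primary-then-fallback flow; logs fallback_reason when failing over."""
    for provider, is_fallback in ((primary, False), (secondary, True)):
        breaker = breaker_for(provider.name, tenant)
        if not breaker.allow():
            log.warning("fallback_reason=circuit_open provider=%s tenant=%s",
                        provider.name, tenant)
            continue
        try:
            response = call_with_retries(provider, prompt)
            breaker.record(ok=True)
            return response  # carries token counts and the provider actually used
        except Exception as exc:
            breaker.record(ok=False)
            if not is_fallback:
                log.warning("fallback_reason=%s provider=%s tenant=%s",
                            type(exc).__name__, provider.name, tenant)
    raise RuntimeError(f"all providers failed for tenant {tenant}")
```

Emitting fallback_reason as a structured key=value field makes failovers easy to count per provider in whatever log pipeline you already run.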
Design tips
- Keep prompts deterministic; version them alongside providers.
- Support per-tenant overrides (enterprise clients may mandate a specific provider); a policy sketch follows this list.
- Expose admin controls to pause a provider globally.
- Emit metrics: latency percentiles, error rates, token usage by provider.
- Integrate with alerting (Google Chat, PagerDuty) when circuit breakers trip.
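One way to express the override and pause controls above is a small in-memory policy object. RoutingPolicy and its fields are hypothetical; a production gateway would likely back them with a config store or admin API.

```python
from dataclasses import dataclass, field

@dataclass
class RoutingPolicy:
    """Default provider order plus per-tenant overrides and a global pause set."""
    default_order: list[str] = field(default_factory=lambda: ["gemini", "openai"])
    tenant_overrides: dict[str, list[str]] = field(default_factory=dict)
    paused: set[str] = field(default_factory=set)  # flipped by an admin control

    def order_for(self, tenant: str) -> list[str]:
        order = self.tenant_overrides.get(tenant, self.default_order)
        return [name for name in order if name not in self.paused]

policy = RoutingPolicy()
policy.tenant_overrides["enterprise-42"] = ["openai"]  # client mandates one provider
policy.paused.add("gemini")                            # admin pauses a provider globally
assert policy.order_for("enterprise-42") == ["openai"]
assert policy.order_for("small-tenant") == ["openai"]  # gemini filtered out while paused
```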
Security and compliance
- Store API keys in Secret Manager and rotate them regularly; a loading sketch follows this list.
- Mask keys in logs; only log request IDs, not prompts.
- Apply rate limits per provider and per tenant to avoid surprise bills.
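A sketch of the key-handling rules, assuming Google Cloud Secret Manager via the google-cloud-secret-manager client library; the project and secret IDs are placeholders, and masked is a hypothetical helper.

```python
from google.cloud import secretmanager  # pip install google-cloud-secret-manager

def load_api_key(project_id: str, secret_id: str) -> str:
    """Fetch the latest version of a provider key from Secret Manager."""
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")

def masked(key: str) -> str:
    """Log-safe form of a key: a short prefix only, never the full value."""
    return key[:4] + "..."

# Log request IDs and masked identifiers only, never prompts or raw keys, e.g.:
# log.info("request_id=%s provider_key=%s", request_id, masked(key))
```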
CrawlBot example
CrawlBot’s gateway defaults to Gemini, falls back to OpenAI, records fallback_reason, and logs token usage per tenant. Replicate this pattern to keep LLM-dependent assistants resilient.
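A hypothetical wiring of that default using the sketches above: providers is assumed to map names to adapters satisfying the LLMProvider protocol, and policy, dispatch, and log come from the earlier snippets.

```python
def answer(tenant: str, prompt: str, providers: dict[str, LLMProvider]) -> str:
    """Gemini-first, OpenAI-fallback call with per-tenant token logging."""
    order = policy.order_for(tenant)             # e.g. ["gemini", "openai"]
    primary, secondary = providers[order[0]], providers[order[1]]
    response = dispatch(tenant, prompt, primary, secondary)
    log.info("tenant=%s provider=%s tokens_in=%d tokens_out=%d",
             tenant, response.provider, response.input_tokens, response.output_tokens)
    return response.text
```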