LLM Fallback Reason Triage
Grounded assistants depend on retrieval quality and on healthy LLM providers. When something goes wrong, the model should refuse and log why. Fallback reasons turn a vague “I don’t know” into actionable telemetry. Here is how to triage them systematically.
Core categories
| Fallback reason | Meaning | Next action |
|---|---|---|
| low_score | Retrieval confidence below threshold | Expand crawl scope, tune chunking, adjust thresholds cautiously. |
| no_context | Retrieval returned nothing because corpus filters excluded everything | Verify tenant allowlists, language filters, or embed tokens. |
| provider_error | Upstream LLM rejected the call (5xx, throttling) | Check LLM gateway retries, provider status, failover to alternate model. |
| timeout | Provider or network exceeded the SLA | Tune the timeout budget, add retry jitter, or reduce payload size. |
| context_overflow | Prompt + excerpts exceed context window | Trim preamble, reduce number of chunks, or upgrade model window. |
CrawlBot stores these reasons per message, along with timestamps, tenant IDs, embed IDs, and retrieval metadata.
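To make the triage steps below concrete, here is a minimal sketch of what such a per-message record might look like. The field names and shapes are illustrative assumptions based on the description above, not CrawlBot’s actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical shape of a stored fallback event; field names are
# illustrative, not CrawlBot's actual schema.
@dataclass
class FallbackEvent:
    message_id: str
    tenant_id: str
    embed_id: str
    fallback_reason: str          # one of the categories in the table above
    occurred_at: datetime
    retrieval: dict = field(default_factory=dict)  # scores, chunk IDs, citations

event = FallbackEvent(
    message_id="msg_123",
    tenant_id="acme",
    embed_id="embed_7",
    fallback_reason="low_score",
    occurred_at=datetime.now(timezone.utc),
    retrieval={"top_score": 0.41, "threshold": 0.55, "chunks": 3},
)
```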
Triage workflow
- Monitor dashboards: Plot fallback_reason counts per tenant and embed, and alert when any category exceeds 5 percent of chats (a rate check is sketched after this list).
- Drill into transcripts: For low_score/no_context, review retrieved chunks, scores, and citations to spot crawl gaps.
- Check LLM gateway logs: Correlate provider_error/timeout spikes with upstream provider incidents; ensure failover to the alternate OpenAI or Gemini model stays healthy.
- Validate prompts: context_overflow often signals prompts with too much boilerplate; keep instructions tight and reference citations succinctly.
- Feed results to ops: Tag fallback events with severity and route them to the AI ops Chat space. Include direct links to CrawlBot’s analytics view.
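The 5 percent alert rule from the first step reduces to a small aggregation. This is a minimal sketch: the input shapes (`events` as (tenant_id, fallback_reason) pairs, `total_chats` as tenant chat volume over the same window) are assumptions, not CrawlBot’s analytics API.

```python
from collections import Counter

# Alert when any fallback category exceeds 5 percent of a tenant's chats.
ALERT_THRESHOLD = 0.05

def fallback_alerts(events, total_chats):
    counts = Counter(events)  # keyed by (tenant_id, fallback_reason)
    alerts = []
    for (tenant, reason), n in counts.items():
        chats = total_chats.get(tenant, 0)
        if chats and n / chats > ALERT_THRESHOLD:
            alerts.append((tenant, reason, n / chats))
    return alerts

# Example: 8 low_score fallbacks out of 100 chats trips the alert.
events = [("acme", "low_score")] * 8 + [("acme", "provider_error")] * 2
print(fallback_alerts(events, {"acme": 100}))
# -> [('acme', 'low_score', 0.08)]
```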
Prevention tactics
- Keep crawl runs fresh (IndexNow + scheduled crawls) to avoid low_score from stale content.
- Use adaptive thresholds seeded from the P95 of historical retrieval scores, per the AGENTS spec (a seeding sketch follows this list).
- Trim citations to the minimum required per policy; long legal disclaimers eat context budget (a budget-trimming sketch also follows this list).
- Store the fallback reason along with user feedback. If thumbs-down ratings correlate with provider_error, escalate to the LLM vendor.
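Seeding a threshold from historical P95 scores can look like the sketch below. The 0.8 scaling factor and the floor are assumptions for illustration, not values from the AGENTS spec.

```python
import statistics

# Seed a retrieval threshold from the P95 of historical scores.
# scale and floor are illustrative assumptions, not spec values.
def adaptive_threshold(historical_scores, scale=0.8, floor=0.3):
    if not historical_scores:
        return floor
    quantiles = statistics.quantiles(historical_scores, n=20)
    p95 = quantiles[-1]  # last of 19 cut points = 95th percentile
    return max(floor, p95 * scale)

scores = [0.42, 0.55, 0.61, 0.48, 0.70, 0.66, 0.52, 0.59, 0.73, 0.44]
print(round(adaptive_threshold(scores), 3))
```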
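Guarding the context budget is the flip side of trimming citations. A minimal sketch: token counts are approximated by whitespace-split words here; a real system would use the model’s tokenizer.

```python
# Keep the highest-scoring chunks until the token budget is spent,
# so the prompt never triggers context_overflow.
def fit_to_budget(chunks, budget_tokens):
    """chunks: list of (score, text). Returns the texts that fit."""
    kept, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = len(text.split())  # crude proxy for token count
        if used + cost > budget_tokens:
            continue  # a smaller later chunk may still fit
        kept.append(text)
        used += cost
    return kept
```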
Reporting cadence
- Daily: Quick scan of fallback charts for enterprise tenants and demo environments.
- Weekly: Generate a summary with counts per reason, top affected tenants, and remediation status (see the rollup sketch after this list).
- Quarterly: Review threshold settings, prompt boilerplate, and crawler coverage using aggregated fallback trends.
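The weekly rollup is another small aggregation over the same event stream. This sketch reuses the assumed (tenant_id, fallback_reason) pair shape from the alerting example; it is not a CrawlBot reporting API.

```python
from collections import Counter

# Weekly rollup: counts per reason plus the most affected tenants.
def weekly_summary(events, top_n=5):
    by_reason = Counter(reason for _, reason in events)
    by_tenant = Counter(tenant for tenant, _ in events)
    return {
        "per_reason": dict(by_reason.most_common()),
        "top_tenants": by_tenant.most_common(top_n),
    }
```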
Fallback reason triage keeps AI quality measurable. Instead of shipping band-aid prompts, use these signals to fix the right layer: crawl scope, retrieval thresholds, or provider reliability.