Overview
A mid-market B2B SaaS platform (anonymized) struggled with low documentation discoverability and rising support volume. CrawlBot AI was implemented to deliver instant, context-grounded answers and surface long-tail product content.
| Baseline → 60 Days | Before | After | Delta |
|---|---|---|---|
| Avg Session Depth | 2.4 pages | 4.9 pages | +104% |
| Qualified Lead Rate | 18% | 37% | +106% |
| L1 Ticket Deflection (Containment) | 22% | 63% | +41pp |
| Median First Response (Support) | 58s | 1.9s | -56.1s |
| Hallucination Rate (sampled) | 11% | 3.2% | -7.8pp |
Goals & Constraints
- Increase mid‑funnel education without bloating human chat staffing.
- Improve accuracy & trust vs legacy keyword chatbot.
- Provide measurable analytics to justify broader rollout.
Implementation Timeline
| Week | Activities | Outputs |
|---|---|---|
| 0 | Content scoping, sitemap crawl, allow/deny rules | Indexed 487 pages |
| 1 | Baseline retrieval tuning, prompt guardrail pass | Relevance threshold set @ 0.82 |
| 2 | Launch shadow mode, collect anonymized queries | 1,900 queries logged |
| 3 | Live cutover + escalation rules | 68% containment first week |
| 4–8 | Iterations: add clarifications, re-index deltas | Hallucination below 4% |
Key Levers
- Content Dedup & Canonicalization: removed outdated pages reducing retrieval noise.
- Semantic + Lexical Hybrid Retrieval: improved niche config query recall.
- Refusal + Escalation Policy: protected trust while routing complex multi‑system issues.
- Weekly Evaluation Harness: sampled 100 queries → tracked precision@5, refusal appropriateness.
- Lead Form Timing Experiment: shifting form after second value answer raised completion 19%.
Architecture Snapshot
- Multi-tenant isolation via tenant metadata filters at Qdrant layer
- Provider abstraction: Gemini primary, OpenAI fallback on timeout/error
- Observability: OpenTelemetry traces per answer (crawl → retrieve → synthesize)
- Adaptive threshold: auto monitors false positive drift; small negative adjustment after recall test
Results Analysis
Engagement depth doubled primarily due to contextual deep links in answers. Support deflection accelerated onboarding; human agents refocused on escalated edge cases. Lower hallucination built internal confidence leading to expansion across additional product lines.
Lessons Learned
- Invest early in content hygiene; retrieval tuning ROI multiplies.
- Track refusal taxonomy—over-refusal hides recall issues; under-refusal inflates hallucinations.
- Treat lead prompt placement as an iterative experiment, not static design.
Next Steps for the Client
- Expand multilingual content indexing (ES, DE)
- Integrate CRM enrichment (company firmographics)
- Add evaluation automation (nightly regression queries)