Index Diff Review Workflows
Every crawl run should produce a diff you can inspect. A reviewable diff keeps operations honest and speeds up debugging.
Diff ingredients
- Page URL and canonical hash
- Previous vs current token counts
- Chunk IDs added, updated, removed
- Language and metadata (plan tier, page type)
- Crawl run ID and timestamp
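As a rough illustration, the ingredients above could be captured in a single record per page. This is a minimal sketch, not CrawlBot's actual schema: the class name `DiffRecord`, its field names, and the `token_delta` helper are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DiffRecord:
    """One page-level diff entry. All field names are illustrative assumptions."""

    url: str
    canonical_hash: str
    prev_token_count: int
    curr_token_count: int
    crawl_run_id: str
    crawled_at: str  # ISO 8601 timestamp, e.g. "2024-01-01T00:00:00+00:00"
    chunks_added: List[str] = field(default_factory=list)
    chunks_updated: List[str] = field(default_factory=list)
    chunks_removed: List[str] = field(default_factory=list)
    language: str = "en"
    # Free-form metadata such as plan tier and page type.
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def token_delta(self) -> int:
        """Signed change in token count between the previous and current crawl."""
        return self.curr_token_count - self.prev_token_count

    def to_json(self) -> str:
        """Serialize the record for a JSON diff report."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the previous and current token counts as separate fields, rather than storing only the delta, lets a reviewer compute either an absolute or a relative change later.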
Workflow
- Run crawl.
- Generate diff report (JSON/CSV).
- Auto-highlight high-impact changes (token delta above 20 percent, new FAQs, removed pricing tables).
- Notify ops via Chat with summary metrics.
- Review diffs when fallback_reason=low_score spikes.
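The highlighting and summary steps above can be sketched as follows. This assumes each diff is a dict with hypothetical keys `prev_token_count`, `curr_token_count`, `added_chunk_kinds`, and `removed_chunk_kinds`, and that chunks are labeled with kinds such as `"faq"` or `"pricing_table"`. None of these names come from a real schema, and the token delta is treated here as a relative change in either direction.

```python
from typing import Dict, List

# Threshold from the workflow: flag token deltas above 20 percent.
TOKEN_DELTA_THRESHOLD = 0.20


def relative_token_delta(prev: int, curr: int) -> float:
    """Absolute change in token count as a fraction of the previous count."""
    if prev == 0:
        # A page that went from empty to non-empty counts as an unbounded change.
        return float("inf") if curr > 0 else 0.0
    return abs(curr - prev) / prev


def high_impact_reasons(diff: Dict) -> List[str]:
    """Return the reasons a diff should be highlighted; empty if none apply."""
    reasons = []
    delta = relative_token_delta(diff["prev_token_count"], diff["curr_token_count"])
    if delta > TOKEN_DELTA_THRESHOLD:
        reasons.append("token_delta")
    if "faq" in diff.get("added_chunk_kinds", []):
        reasons.append("new_faq")
    if "pricing_table" in diff.get("removed_chunk_kinds", []):
        reasons.append("removed_pricing_table")
    return reasons


def summarize(diffs: List[Dict]) -> Dict[str, int]:
    """Build the summary metrics that would go into the ops notification."""
    flagged = [d for d in diffs if high_impact_reasons(d)]
    return {"pages_changed": len(diffs), "high_impact": len(flagged)}
```

Returning a list of reasons rather than a single boolean keeps the notification readable: ops can see why a page was flagged without opening the full diff.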
Tooling tips
- Store diffs in object storage for 90 days.
- Provide diffs to tenants when they request change history.
- Link diffs to analytics annotations so you know why metrics moved.
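One way to make the 90-day retention and per-tenant lookup concrete is to encode the tenant and crawl date in the object key. The sketch below is an assumption, not a prescribed layout: the `diffs/<tenant>/<yyyy>/<mm>/<dd>/<run>.json` key scheme and both function names are hypothetical. In practice a storage lifecycle rule would usually handle expiry, and the check here only illustrates the rule.

```python
from datetime import datetime, timedelta, timezone

# Retention window from the tooling tips.
RETENTION = timedelta(days=90)


def diff_object_key(tenant_id: str, crawl_run_id: str, crawled_at: datetime) -> str:
    """Build a date-partitioned key so a tenant's change history is easy to list."""
    day = crawled_at.astimezone(timezone.utc).strftime("%Y/%m/%d")
    return f"diffs/{tenant_id}/{day}/{crawl_run_id}.json"


def is_expired(crawled_at: datetime, now: datetime) -> bool:
    """True once a diff is older than the retention window."""
    return now - crawled_at > RETENTION
```

Storing the crawl run ID in the key also gives analytics annotations a stable identifier to link back to.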
CrawlBot implementation
CrawlBot’s indexer writes diff records with chunk IDs and metadata. Use similar workflows to keep your assistant trustworthy.