Enterprise AI Chat Assistant with SSO & Compliance

Deploy a production-grade, retrieval-grounded AI assistant with enterprise SSO, auditability, threat modeling, per-tenant isolation, and GDPR controls.

Identity & Access

  • SAML & OIDC SSO (multi‑tenant mapping)
  • Role-based access (Admin / Editor / Viewer)
  • Just‑in‑time user provisioning; invite flows
  • Session + refresh token lifetime policies
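
As an illustration of the role model above, here is a minimal sketch of a role check. It assumes the three roles form a strict hierarchy (Admin above Editor above Viewer) and that each action declares a minimum role; the action names and the `REQUIRED_ROLE` table are hypothetical and not taken from the product.

```python
from enum import IntEnum


class Role(IntEnum):
    """Roles ordered so that a higher value implies more privileges (assumption)."""
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3


# Hypothetical mapping from action name to the minimum role allowed to perform it.
REQUIRED_ROLE = {
    "read_chat": Role.VIEWER,
    "edit_prompt": Role.EDITOR,
    "manage_sso": Role.ADMIN,
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if `role` meets the minimum role for `action`."""
    required = REQUIRED_ROLE.get(action)
    if required is None:
        # Unknown actions are denied by default.
        return False
    return role >= required
```

Denying unknown actions by default keeps a newly added action from becoming accessible before someone assigns it a minimum role.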

Security & Compliance

  • Formal threat model (prompt injection, SSRF, data exfiltration)
  • PII redaction & field‑level encryption (sensitive fields)
  • Configurable data retention (90 to 730 days)
  • Audit logging & export (GCS / BigQuery)
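
To make the PII redaction item concrete, the sketch below masks email addresses and phone-like digit runs before text is stored. The two regular expressions and the `[EMAIL]` / `[PHONE]` placeholders are illustrative assumptions; a production redactor would likely cover more PII types and locale-specific formats.

```python
import re

# Illustrative patterns only; real-world PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like sequences with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

For example, `redact_pii("Contact jane.doe@example.com or +1 415-555-0100")` returns `"Contact [EMAIL] or [PHONE]"`.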

Reliability & Control

  • Adaptive relevance threshold (target: ≤5% false positives)
  • Fallback reason telemetry (low score, timeout, provider error)
  • Per‑tenant prompt version history & rollback
  • Kill switch & forced re‑index controls
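
The fallback reasons listed above could be derived from a single retrieval outcome roughly as follows. This is a sketch under stated assumptions: the `RetrievalResult` fields, the default threshold of 0.75, and the 3000 ms timeout are hypothetical values chosen for illustration, not product defaults.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FallbackReason(Enum):
    LOW_SCORE = "low_score"
    TIMEOUT = "timeout"
    PROVIDER_ERROR = "provider_error"


@dataclass
class RetrievalResult:
    top_score: float          # best relevance score among retrieved chunks
    elapsed_ms: int           # time spent on retrieval and generation
    provider_ok: bool = True  # False if the LLM provider returned an error


def fallback_reason(
    result: RetrievalResult,
    threshold: float = 0.75,   # hypothetical relevance threshold
    timeout_ms: int = 3000,    # hypothetical latency budget
) -> Optional[FallbackReason]:
    """Return why the assistant should fall back, or None if it can answer."""
    if not result.provider_ok:
        return FallbackReason.PROVIDER_ERROR
    if result.elapsed_ms > timeout_ms:
        return FallbackReason.TIMEOUT
    if result.top_score < threshold:
        return FallbackReason.LOW_SCORE
    return None
```

Emitting the returned value (for example `FallbackReason.LOW_SCORE.value`) as a telemetry field makes it possible to see whether refusals come from weak retrieval or from infrastructure problems.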

Architecture Highlights

CrawlBot AI runs as a GCP‑native microservice platform: Cloud Run for stateless services, MongoDB Atlas for operational data and vector search, and a provider‑agnostic LLM gateway (Gemini primary, OpenAI fallback). Tenant isolation is enforced at service boundaries with scoped service accounts and per‑tenant metadata filters. Observability comes from OpenTelemetry traces and structured logs on every retrieval and answer synthesis path.
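
The primary/fallback behavior of a provider‑agnostic gateway can be sketched as an ordered list of callables tried in turn. The function name `complete_with_fallback` and the provider callables are hypothetical stand-ins; no real Gemini or OpenAI client calls are shown here.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair; the callable maps a prompt to a completion.
Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, completion) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider failure triggers the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

A caller would pass something like `[("gemini", call_gemini), ("openai", call_openai)]`, where both callables are wrappers the integrator supplies. Returning the provider name alongside the completion lets the caller record which provider actually served the answer.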

Security posture includes strict CSP/SRI for embed scripts, robots.txt compliance & domain allowlists for crawling, secret management via GCP Secret Manager, and quarterly threat model review. All changes to infrastructure are codified via Pulumi with preview + apply pipelines (no console drift).
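
For readers unfamiliar with SRI, the integrity value for an embed script is the base64-encoded hash of the script bytes, prefixed with the algorithm name. The sketch below computes a SHA-384 value with the Python standard library; the `script_tag` helper and the example URL are hypothetical.

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes) -> str:
    """Return a Subresource Integrity value (sha384) for the given script content."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def script_tag(src: str, script_bytes: bytes) -> str:
    """Build a script tag whose integrity attribute pins the exact script content."""
    return (
        f'<script src="{src}" integrity="{sri_hash(script_bytes)}" '
        f'crossorigin="anonymous"></script>'
    )
```

Calling `script_tag("https://cdn.example.com/embed.js", content)` yields a tag that browsers will refuse to execute if the served file differs from `content`.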

Why Enterprises Choose CrawlBot AI

  • Fast time to value: crawl → configure → embed in under an hour.
  • Grounded answers, with strict refusal when context is insufficient.
  • Per‑embed analytics & audit trails build trust and ROI transparency.
  • Programmatic control (gRPC + upcoming admin APIs) for integration.

FAQ

SSO is supported through SAML 2.0 and OpenID Connect (OIDC) at launch; SCIM user provisioning is on the roadmap.

Tenants are logically isolated: tenant IDs are enforced at every storage and retrieval boundary, together with row-level filters and scoped service accounts, and vector queries never cross tenants.
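
One way such a tenant filter could look in a MongoDB Atlas Vector Search query is sketched below. It only builds the aggregation pipeline and does not connect to a database. The index name `chunks_vector_index` and the field names `embedding` and `tenant_id` are assumptions for illustration, and the filter field would need to be declared as a filter field in the vector index definition.

```python
from typing import Any, Dict, List, Sequence


def tenant_vector_pipeline(
    tenant_id: str,
    query_vector: Sequence[float],
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """Build a $vectorSearch pipeline that is always scoped to one tenant."""
    if not tenant_id:
        # Refuse to build a query that could span tenants.
        raise ValueError("tenant_id is required; refusing to build an unscoped query")
    return [
        {
            "$vectorSearch": {
                "index": "chunks_vector_index",  # hypothetical index name
                "path": "embedding",             # hypothetical vector field
                "queryVector": list(query_vector),
                "numCandidates": limit * 20,     # illustrative oversampling factor
                "limit": limit,
                "filter": {"tenant_id": {"$eq": tenant_id}},
            }
        }
    ]
```

Building the filter inside the helper, instead of leaving it to each caller, means a query without a tenant scope cannot be constructed by accident.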

Yes. Retention windows are configurable per tenant (default 90 days for chat logs), and PII redaction/anonymization rules are applied before persistence.
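
A per-tenant retention window can be reduced to a small date calculation, sketched here. The 90-day default and the 90 to 730 day bounds come from the figures on this page; the function name `expires_at` and the choice to reject out-of-range values (instead of clamping them) are assumptions.

```python
from datetime import datetime, timedelta, timezone

MIN_RETENTION_DAYS = 90
MAX_RETENTION_DAYS = 730
DEFAULT_RETENTION_DAYS = 90


def expires_at(created_at: datetime, retention_days: int = DEFAULT_RETENTION_DAYS) -> datetime:
    """Return the moment a record created at `created_at` becomes eligible for deletion."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention_days must be between {MIN_RETENTION_DAYS} and {MAX_RETENTION_DAYS}"
        )
    return created_at + timedelta(days=retention_days)
```

With the default window, a chat log created on 2024-01-01 (UTC) becomes eligible for deletion on 2024-03-31.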

Foundational controls align with SOC 2 readiness: a maintained formal threat model, audit logging, least-privilege IAM, and a secret rotation schedule.