Manual Upload Workflows for AI Assistants
Manual uploads complement crawlers when content lives behind authentication or needs curated ingestion.
Workflow
- Admin selects files (PDF, DOCX, HTML).
- Upload service scans for malware, validates size (<25 MB), and confirms MIME type.
- Service extracts text and chunks it, attaching metadata (tenant, file_id, updated_at) to each chunk.
- Store original file in object storage with lifecycle policies.
- Schedule a re-index job that references the uploaded asset.
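The validation and chunking steps above can be sketched in a few lines. This is a minimal illustration, not a description of any particular service: the function names (`sniff_mime`, `validate_upload`, `chunk_text`), the word-based chunk size and overlap, and the reading of 25 MB as 25 * 1024 * 1024 bytes are all assumptions. Malware scanning and real text extraction are left out.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB

PDF = "application/pdf"
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
HTML = "text/html"


def sniff_mime(data: bytes) -> Optional[str]:
    """Guess the MIME type from leading bytes (hypothetical helper)."""
    if data.startswith(b"%PDF-"):
        return PDF
    if data.startswith(b"PK\x03\x04"):
        # DOCX is a ZIP container; this only confirms the ZIP signature.
        return DOCX
    head = data.lstrip()[:32].lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return HTML
    return None


def validate_upload(data: bytes, declared_mime: str) -> None:
    """Raise ValueError if the size or the sniffed type is not acceptable."""
    if len(data) >= MAX_BYTES:
        raise ValueError("file must be smaller than 25 MB")
    if sniff_mime(data) != declared_mime:
        raise ValueError("content does not match declared MIME type")


@dataclass
class Chunk:
    tenant: str
    file_id: str
    updated_at: str
    index: int
    text: str


def chunk_text(text: str, tenant: str, file_id: str, updated_at: str,
               size: int = 200, overlap: int = 20) -> List[Chunk]:
    """Split text into overlapping word windows, each carrying the file metadata."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    chunks: List[Chunk] = []
    start = 0
    while start < len(words):
        piece = " ".join(words[start:start + size])
        chunks.append(Chunk(tenant, file_id, updated_at, len(chunks), piece))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
        start += size - overlap
    return chunks
```

Carrying `tenant`, `file_id`, and `updated_at` on every chunk lets the retrieval layer filter by tenant and lets a re-index job replace stale chunks for a single file.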
Safety controls
- Virus scan: ClamAV or cloud antivirus before storage.
- Quota: Limit total storage per tenant; alert when near cap.
- Access: Restrict downloads to authorized roles; audit every download.
- Versioning: Track file versions and allow rollback to prior revisions.
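As a rough sketch of the quota and versioning controls, the snippet below keeps per-tenant usage in memory and records each file revision as an object-storage key. The class names, the 80% alert threshold, and the in-memory state are illustrative assumptions; a real service would persist this state and emit alerts through its own monitoring.

```python
from dataclasses import dataclass, field
from typing import List


class QuotaExceeded(Exception):
    """Raised when an upload would push a tenant past its storage cap."""


@dataclass
class TenantQuota:
    cap_bytes: int
    used_bytes: int = 0
    alert_ratio: float = 0.8  # assumption: alert at 80% of the cap

    def reserve(self, size: int) -> bool:
        """Record an upload; return True when usage is at or above the alert threshold."""
        if self.used_bytes + size > self.cap_bytes:
            raise QuotaExceeded("upload would exceed tenant storage cap")
        self.used_bytes += size
        return self.used_bytes >= self.alert_ratio * self.cap_bytes


@dataclass
class FileHistory:
    # Object-storage keys for each revision, oldest first (hypothetical layout).
    versions: List[str] = field(default_factory=list)
    current: int = -1

    def add(self, storage_key: str) -> int:
        """Register a new revision and make it current; return its version number."""
        self.versions.append(storage_key)
        self.current = len(self.versions) - 1
        return self.current

    def rollback(self, version: int) -> str:
        """Point the file back at an earlier revision and return its storage key."""
        if not 0 <= version < len(self.versions):
            raise IndexError("unknown version")
        self.current = version
        return self.versions[version]
```

Rollback here only moves a pointer. Older revisions stay in storage, so the lifecycle policy decides how long they remain recoverable.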
UX tips
- Show upload progress and an extraction summary (word count, sample headings).
- Auto-tag content type (case study, legal, pricing) for retrieval filtering.
- Provide delete buttons with confirmation modals, and record each deletion in the audit log.
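The extraction summary and auto-tagging tips could be prototyped with simple heuristics before investing in a classifier. In this sketch the keyword lists, the "short line without closing punctuation" heading rule, and the function names are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword lists; tune these to your own content.
TAG_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "case study": ("case study", "customer", "results"),
    "legal": ("terms", "liability", "agreement"),
    "pricing": ("price", "pricing", "per month", "discount"),
}


def extraction_summary(text: str, max_headings: int = 3) -> dict:
    """Return a word count and a few lines that look like headings."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Heuristic: a heading is a short line that does not end in punctuation.
    headings: List[str] = [
        ln for ln in lines
        if len(ln.split()) <= 8 and not ln.endswith((".", ":", ",", ";"))
    ]
    return {
        "word_count": len(text.split()),
        "sample_headings": headings[:max_headings],
    }


def auto_tag(text: str) -> Optional[str]:
    """Pick the content type whose keywords occur most often, or None if none match."""
    lowered = text.lower()
    scores = {
        tag: sum(lowered.count(keyword) for keyword in keywords)
        for tag, keywords in TAG_KEYWORDS.items()
    }
    best = max(scores, key=lambda tag: scores[tag])
    return best if scores[best] > 0 else None
```

Showing the suggested tag next to the summary gives the admin a chance to correct it before the content is indexed.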
CrawlBot approach
CrawlBot’s file-ingest microservice handles scanning, extraction, and metadata logging. Mirror this workflow to keep manual uploads safe and auditable.