Knowledge Bases
Knowledge bases (KBs) are the admin-managed content collections that power retrieval and chat.
This page explains:
- what you can do from the KB list and KB detail pages
- the KB lifecycle/status model
- common ingestion and maintenance operations
- practical troubleshooting
KB List Page (Overview)
The main KB list is designed for quick operations:
- search/filter knowledge bases
- see status at a glance (ready vs scraping vs failed)
- open chat quickly for a KB
- open configuration / maintenance actions
Common information shown per KB:
- Name (display name)
- Source URL (if URL-based)
- Status (queued/scraping/indexing/ready/failed/paused variants)
- Document count
- Last updated or processing timestamps
Creating a Knowledge Base
URL ingestion (most common)
Use this when the content is reachable by URL.
Typical inputs:
- source_url: optional
- name: optional
- description: optional
- preset_id: ingestion preset
What happens after creation:
- KB record is created immediately.
- Background processing begins:
  - scraping/extraction
  - chunking/embedding (indexing)
- The status updates as the pipeline progresses.
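As an illustration of the inputs listed above, the following minimal sketch assembles a creation payload from the four documented fields. The helper name `build_create_payload` and the example URL are hypothetical; the actual endpoint, request format, and validation rules are not specified on this page.

```python
from typing import Optional


def build_create_payload(
    source_url: Optional[str] = None,
    name: Optional[str] = None,
    description: Optional[str] = None,
    preset_id: Optional[str] = None,
) -> dict:
    """Assemble a KB creation payload, leaving out fields that were not provided."""
    fields = {
        "source_url": source_url,
        "name": name,
        "description": description,
        "preset_id": preset_id,
    }
    # Omit unset or blank values so optional fields are simply absent.
    return {
        key: value.strip()
        for key, value in fields.items()
        if value and value.strip()
    }


# Hypothetical usage: only the fields that were supplied end up in the payload.
payload = build_create_payload(
    source_url="https://docs.example.com/",
    name="Product docs",
)
print(payload)
```

Omitting blank fields, rather than sending empty strings, is a design choice in this sketch and not a documented requirement.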
File upload ingestion (if enabled)
Use this when you have a local file to ingest.
What to expect:
- the upload endpoint receives the file
- extracted text is stored as a document
- indexing runs similarly to URL ingestion
KB Lifecycle & Status Model
KBs move through a pipeline. The exact implementation can vary, but the operational meaning is consistent.
Common statuses
- queued: waiting to start work
- scraping: fetching/extracting content
- indexing: chunking and embedding content
- ready: searchable/chat-ready
- failed: pipeline stopped due to an error
Paused variants
Paused variants may appear when an in-flight job is intentionally stopped:
- paused during scraping
- paused during indexing
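The status model can be sketched as a small state machine. The five common status values come from this page; the paused value names (`paused_scraping`, `paused_indexing`) and the transition table are assumptions inferred from the pipeline order, not a description of the actual implementation.

```python
from enum import Enum


class KBStatus(str, Enum):
    QUEUED = "queued"
    SCRAPING = "scraping"
    INDEXING = "indexing"
    READY = "ready"
    FAILED = "failed"
    # Hypothetical names for the paused variants; the real values may differ.
    PAUSED_SCRAPING = "paused_scraping"
    PAUSED_INDEXING = "paused_indexing"


# Assumed transitions, inferred from the pipeline order described on this page.
ALLOWED_TRANSITIONS = {
    KBStatus.QUEUED: {KBStatus.SCRAPING, KBStatus.FAILED},
    KBStatus.SCRAPING: {KBStatus.INDEXING, KBStatus.PAUSED_SCRAPING, KBStatus.FAILED},
    KBStatus.INDEXING: {KBStatus.READY, KBStatus.PAUSED_INDEXING, KBStatus.FAILED},
    KBStatus.PAUSED_SCRAPING: {KBStatus.SCRAPING},
    KBStatus.PAUSED_INDEXING: {KBStatus.INDEXING},
    # A rescrape or reindex restarts the pipeline from a settled state.
    KBStatus.READY: {KBStatus.QUEUED, KBStatus.SCRAPING, KBStatus.INDEXING},
    KBStatus.FAILED: {KBStatus.QUEUED, KBStatus.SCRAPING, KBStatus.INDEXING},
}


def can_transition(current: KBStatus, target: KBStatus) -> bool:
    """Return True if the assumed model allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def is_in_flight(status: KBStatus) -> bool:
    """Return True while background work is pending or running."""
    return status in {KBStatus.QUEUED, KBStatus.SCRAPING, KBStatus.INDEXING}
```

A table like this is mainly useful for reasoning about which UI actions make sense in a given status, for example that resume only applies to a paused KB.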
Operational definition of “ready”
Treat ready as:
- documents exist
- index exists
- chat and search can run reliably
If a KB is “ready” but answers are poor, use document preview and tracing to diagnose retrieval quality.
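The checkable part of this definition can be written as a simple predicate. This is a sketch only: the parameter names `document_count` and `index_exists` are hypothetical, and whether chat and search run reliably still has to be verified by actually asking questions.

```python
def is_operationally_ready(status: str, document_count: int, index_exists: bool) -> bool:
    """Return True only when the reported status and the supporting data agree."""
    return status == "ready" and document_count > 0 and index_exists


# A KB that reports "ready" but holds no documents should not be trusted yet.
print(is_operationally_ready("ready", 0, True))
```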
KB Detail Page (Maintenance)
The KB detail view is the main “maintenance” screen.
Documents
Typical document operations:
- list documents in the KB
- search within documents
- preview extracted content
- delete a single document
When previewing a document, look for:
- missing sections (bad extraction)
- duplicated content (scrape loops)
- navigation boilerplate dominating (needs smarter extraction)
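If extracted text can be copied out of the preview, two crude heuristics can flag the duplication and boilerplate problems listed above. The function names and the three-word cutoff are illustrative assumptions, and the results are only a hint; they do not replace reading the preview.

```python
from collections import Counter


def _non_empty_lines(text: str) -> list:
    """Split text into stripped, non-empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def duplicate_line_ratio(text: str) -> float:
    """Fraction of non-empty lines that repeat an earlier line."""
    lines = _non_empty_lines(text)
    if not lines:
        return 0.0
    counts = Counter(lines)
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(lines)


def short_line_ratio(text: str, max_words: int = 3) -> float:
    """Fraction of non-empty lines with very few words, a rough proxy for menu text."""
    lines = _non_empty_lines(text)
    if not lines:
        return 0.0
    short = sum(1 for line in lines if len(line.split()) <= max_words)
    return short / len(lines)
```

High values on either ratio suggest that a rescrape with different extraction settings may be worth trying.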
Embedded Test Chat
The KB detail page often includes an embedded test chat panel for fast iteration.
Use it to:
- run 2–3 quick smoke-test questions
- validate retrieval before sharing the full chat link
- compare behavior with/without citations (if configurable)
Configuration
Common configuration fields (names may vary by UI):
- Name / description
- Language
- Theme color / icon
- Hide sources: suppress visible citations in the chat UI
- Scheduling: automatic re-scrape settings (if available)
Reindexing and Rescraping
These operations exist because upstream content changes over time and the index can fall out of sync with the stored content.
Reindex only
Use reindex when:
- the stored extracted content is correct
- you want to rebuild chunking/embedding
Rescrape then index
Use rescrape when:
- the upstream site/file changed
- extraction was wrong (missing/garbled text)
- you updated ingestion settings that affect extraction
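The choice between the two operations can be summarized as a small decision helper. The function and flag names are hypothetical; the logic only restates the criteria above, with rescrape taking precedence because it is followed by indexing.

```python
def choose_maintenance_operation(
    upstream_changed: bool = False,
    extraction_wrong: bool = False,
    extraction_settings_changed: bool = False,
    rebuild_index: bool = False,
) -> str:
    """Return "rescrape", "reindex", or "none" based on the criteria on this page."""
    # Any problem with the stored content requires fetching and extracting again.
    if upstream_changed or extraction_wrong or extraction_settings_changed:
        return "rescrape"
    # Stored content is fine; only chunking/embedding needs to be rebuilt.
    if rebuild_index:
        return "reindex"
    return "none"


print(choose_maintenance_operation(extraction_wrong=True, rebuild_index=True))
```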
Pause / Resume
Pause/resume is useful when:
- a scrape was started with the wrong settings and is taking too long
- you need to temporarily reduce system load
- you want to stop and adjust parameters before continuing
Archiving vs Deletion
Archive
- removes the KB from the main list
- keeps it restorable (recommended when you may need it later)
Delete
- permanently removes the KB and its documents
- use only when you do not need recovery
Troubleshooting
KB is stuck in scraping or indexing
Check:
- whether there is a pause control available
- tracing/pipeline logs for an error or long-running step
- whether the source URL is slow or blocked
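One way to make "stuck" concrete is to compare the last processing timestamp against a threshold. The sketch below assumes a timezone-aware timestamp is available for the KB; the 30-minute default is an arbitrary placeholder, since normal processing time depends on the size of the source.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Statuses in which background work is expected to make progress.
IN_FLIGHT = {"scraping", "indexing"}


def is_stuck(
    status: str,
    last_updated: datetime,
    threshold: timedelta = timedelta(minutes=30),
    now: Optional[datetime] = None,
) -> bool:
    """True when an in-flight KB has not reported progress within the threshold."""
    if status not in IN_FLIGHT:
        return False
    # last_updated must be timezone-aware so the subtraction is valid.
    current_time = now or datetime.now(timezone.utc)
    return current_time - last_updated > threshold
```

Passing `now` explicitly keeps the check easy to test and avoids depending on the system clock.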
KB has failed
Common causes:
- unreachable URL / blocked by remote host
- extraction error (invalid document, unsupported format)
- indexing error (vector store connectivity, embedding errors)
Recommended workflow:
- Open KB detail → look for error message, document counts, and timestamps.
- Try rescrape with a smaller scope (less depth, fewer links) if available.
- Validate the source URL is reachable from the deployment environment.
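To validate reachability, a quick check can be run from a shell on the deployment host. The sketch below uses only the Python standard library; the function name `check_url` and the User-Agent string are illustrative, and a successful check does not guarantee that the scraper itself will be allowed through, since remote hosts may treat clients differently.

```python
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 10.0) -> tuple:
    """Return (ok, detail) describing whether the URL answered a HEAD request."""
    try:
        request = urllib.request.Request(
            url,
            method="HEAD",  # some servers reject HEAD; retry with GET if needed
            headers={"User-Agent": "kb-reachability-check"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The host answered, but with an error status (for example 403 or 404).
        return False, f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # DNS failure, refused connection, TLS problem, and similar.
        return False, f"unreachable: {exc.reason}"
    except (ValueError, OSError) as exc:
        # Malformed URL or a lower-level socket error such as a timeout.
        return False, f"error: {exc}"


if __name__ == "__main__":
    print(check_url("https://example.com/"))
```

An "HTTP 403" result usually points to blocking by the remote host, while an "unreachable" result points to network or DNS issues in the deployment environment.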
Chat answers are irrelevant
Common causes:
- content was extracted incorrectly (garbage in → garbage out)
- KB is too broad or poorly structured (retrieval noise)
- embeddings/index not aligned with content (needs reindex)
Recommended workflow:
- Preview documents to confirm extracted text quality.
- Reindex the KB.
- Use tracing to inspect retrieved chunks for a query.
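When inspecting retrieved chunks, a rough lexical check can help separate "wrong chunks were retrieved" from "right chunks, poor answer". The sketch below assumes the chunk texts have been copied out of the trace by hand; `term_overlap` is a hypothetical helper, not part of any tracing API.

```python
import re


def _terms(text: str) -> set:
    """Lowercase alphanumeric tokens found in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_overlap(query: str, chunk: str) -> float:
    """Fraction of distinct query terms that also appear in the chunk."""
    query_terms = _terms(query)
    if not query_terms:
        return 0.0
    return len(query_terms & _terms(chunk)) / len(query_terms)


# Hypothetical chunks pasted from a trace for the query below.
query = "reset password"
chunks = ["How to reset your password", "Billing and invoices"]
for chunk in chunks:
    print(round(term_overlap(query, chunk), 2), chunk)
```

Embedding-based retrieval is semantic, so a low overlap score is not proof of a bad match; treat it as a prompt to read the chunk, not as a verdict.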