Knowledge Bases

Knowledge bases (KBs) are the admin-managed content collections that power retrieval and chat.

This page explains:

  • what you can do from the KB list and KB detail pages
  • the KB lifecycle/status model
  • common ingestion and maintenance operations
  • practical troubleshooting

KB List Page (Overview)

The main KB list is designed for quick operations:

  • search/filter knowledge bases
  • see status at a glance (ready vs scraping vs failed)
  • open chat quickly for a KB
  • open configuration / maintenance actions

Common information shown per KB:

  • Name (display name)
  • Source URL (if URL-based)
  • Status (queued, scraping, indexing, ready, failed, or a paused variant)
  • Document count
  • Last updated or processing timestamps

Creating a Knowledge Base

URL ingestion (most common)

Use this when the content is reachable by URL.

Typical inputs:

  • source_url
  • optional name
  • optional description
  • optional preset_id (ingestion preset)

What happens after creation:

  1. KB record is created immediately.
  2. Background processing begins:
       • scraping/extraction
       • chunking/embedding (indexing)
  3. The status updates as the pipeline progresses.
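
As a rough illustration of the inputs listed above, the sketch below assembles them into a request body and rejects an obviously invalid source_url before anything is submitted. The function name and the payload shape are assumptions made for this example; only the field names (source_url, name, description, preset_id) come from this page, and the actual creation endpoint is not shown here.

```python
from urllib.parse import urlparse


def build_create_kb_payload(source_url, name=None, description=None, preset_id=None):
    """Assemble the typical inputs into a request body (hypothetical shape)."""
    parsed = urlparse(source_url)
    # Reject anything that is not an absolute http(s) URL before submitting.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"source_url must be an absolute http(s) URL: {source_url!r}")

    payload = {"source_url": source_url}
    # Optional fields are included only when the caller supplied them.
    for key, value in (("name", name), ("description", description), ("preset_id", preset_id)):
        if value is not None:
            payload[key] = value
    return payload


payload = build_create_kb_payload("https://docs.example.com/", name="Product docs")
print(payload)
```

Leaving out unset optional fields keeps the request minimal and lets the server apply its own defaults, whatever those are in a given deployment.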

File upload ingestion (if enabled)

Use this when you have a local file to ingest.

What to expect:

  • the upload endpoint receives the file
  • extracted text is stored as a document
  • indexing runs similarly to URL ingestion

KB Lifecycle & Status Model

KBs move through a pipeline. The exact implementation can vary, but the operational meaning is consistent.

Common statuses

  • queued: waiting to start work
  • scraping: fetching/extracting content
  • indexing: chunking and embedding content
  • ready: searchable/chat-ready
  • failed: pipeline stopped due to an error

Paused variants

Paused variants may appear when an in-flight job is intentionally stopped:

  • paused during scraping
  • paused during indexing
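
For scripts or dashboards that consume KB status, it can help to group the statuses into a few operational buckets. The sketch below is one way to model that. The five common status strings are taken from this page; the paused variant strings (paused_scraping, paused_indexing) and the bucket labels are hypothetical and may not match what a given deployment reports.

```python
from enum import Enum


class KBStatus(str, Enum):
    QUEUED = "queued"
    SCRAPING = "scraping"
    INDEXING = "indexing"
    READY = "ready"
    FAILED = "failed"
    # Hypothetical names for the paused variants; the real strings may differ.
    PAUSED_SCRAPING = "paused_scraping"
    PAUSED_INDEXING = "paused_indexing"


IN_PROGRESS = {KBStatus.QUEUED, KBStatus.SCRAPING, KBStatus.INDEXING}
PAUSED = {KBStatus.PAUSED_SCRAPING, KBStatus.PAUSED_INDEXING}


def describe(status):
    """Map a status string to the operational bucket an admin cares about."""
    status = KBStatus(status)  # raises ValueError for an unknown status string
    if status in IN_PROGRESS:
        return "in progress"
    if status in PAUSED:
        return "paused"
    return "ready" if status is KBStatus.READY else "needs attention"


for value in ("queued", "paused_indexing", "ready", "failed"):
    print(value, "->", describe(value))
```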

Operational definition of “ready”

Treat ready as:

  • documents exist
  • index exists
  • chat and search can run reliably

If a KB is “ready” but answers are poor, use document preview and tracing to diagnose retrieval quality.
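
The operational definition above can be expressed as a small check. This is a minimal sketch: the KBSnapshot type and its field names are invented for illustration, and it covers only the first two conditions (documents exist, index exists). Whether chat and search run reliably still has to be confirmed with a real query.

```python
from dataclasses import dataclass


@dataclass
class KBSnapshot:
    # Hypothetical field names, chosen for illustration only.
    status: str
    document_count: int
    index_exists: bool


def is_operationally_ready(kb):
    """True when the status is ready and both documents and an index exist."""
    return kb.status == "ready" and kb.document_count > 0 and kb.index_exists


print(is_operationally_ready(KBSnapshot("ready", 42, True)))
print(is_operationally_ready(KBSnapshot("ready", 0, True)))
```

The second call shows the case worth catching: a KB that reports ready but has no documents should not be treated as usable.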

KB Detail Page (Maintenance)

The KB detail view is the main “maintenance” screen.

Documents

Typical document operations:

  • list documents in the KB
  • search within documents
  • preview extracted content
  • delete a single document

When previewing a document, look for:

  • missing sections (bad extraction)
  • duplicated content (scrape loops)
  • navigation boilerplate dominating the extracted text (needs smarter extraction)
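
Two of these symptoms can be estimated roughly from the extracted text itself. The heuristics below are an illustrative sketch, not part of the product: one measures how many paragraphs repeat an earlier paragraph (a hint of scrape loops), the other measures how many lines are very short (a hint of menus and link lists). The function names and the three-word cutoff are arbitrary assumptions.

```python
from collections import Counter


def duplicate_paragraph_ratio(text):
    """Fraction of non-empty paragraphs that repeat an earlier paragraph."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if not paragraphs:
        return 0.0
    counts = Counter(paragraphs)
    # Every occurrence beyond the first counts as a repeat.
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(paragraphs)


def short_line_ratio(text, max_words=3):
    """Fraction of non-empty lines with at most max_words words."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    short = sum(1 for ln in lines if len(ln.split()) <= max_words)
    return short / len(lines)


sample = "Home\nDocs\nPricing\nThis is a real sentence with many words."
print(short_line_ratio(sample))
```

High values from either function are only a prompt to open the preview and look; short lines are also normal in tables and headings.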

Embedded Test Chat

The KB detail page often includes an embedded test chat panel for fast iteration.

Use it to:

  • run 2–3 quick smoke-test questions
  • validate retrieval before sharing the full chat link
  • compare behavior with/without citations (if configurable)

Configuration

Common configuration fields (names may vary by UI):

  • Name / description
  • Language
  • Theme color / icon
  • Hide sources: suppress visible citations in the chat UI
  • Scheduling: automatic re-scrape settings (if available)

Reindexing and Rescraping

These operations exist because the source content, the stored extracted text, and the index built from it can drift apart over time.

Reindex only

Use reindex when:

  • the stored extracted content is correct
  • you want to rebuild chunking/embedding

Rescrape then index

Use rescrape when:

  • the upstream site/file changed
  • extraction was wrong (missing/garbled text)
  • you updated ingestion settings that affect extraction
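
The choice between the two operations can be summarized as a simple decision rule. The sketch below encodes the criteria from the two lists above; the function and parameter names are made up for illustration.

```python
def choose_maintenance_action(source_changed, extraction_ok, extraction_settings_changed):
    """Return "rescrape" or "reindex" following the criteria listed above."""
    # Any problem upstream of the stored text means the text must be fetched again.
    if source_changed or not extraction_ok or extraction_settings_changed:
        return "rescrape"
    # Stored text is correct, so rebuilding chunks and embeddings is enough.
    return "reindex"


print(choose_maintenance_action(source_changed=False, extraction_ok=True,
                                extraction_settings_changed=False))
```

Rescrape is the broader operation because it is followed by indexing, so when in doubt about the stored text it is the safer choice at the cost of more processing.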

Pause / Resume

Pause/resume is useful when:

  • a scrape was started with the wrong settings and is taking too long
  • you need to temporarily reduce system load
  • you want to stop and adjust parameters before continuing

Archiving vs Deletion

Archive

  • removes the KB from the main list
  • keeps it restorable (recommended when you may need it later)

Delete

  • permanently removes the KB and its documents
  • use only when you do not need recovery

Troubleshooting

KB is stuck in scraping or indexing

Check:

  • whether a pause control is available, so you can stop the job and adjust settings
  • tracing/pipeline logs for an error or long-running step
  • whether the source URL is slow or blocked
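
"Stuck" usually means "in scraping or indexing with no progress for longer than expected". A minimal sketch of that check follows. The function name, the last_progress_at timestamp, and the 30-minute default are assumptions for illustration; pick a threshold that matches how long your sources normally take.

```python
from datetime import datetime, timedelta, timezone


def is_probably_stuck(status, last_progress_at, now=None, max_idle=timedelta(minutes=30)):
    """True if an in-flight KB has shown no progress for longer than max_idle.

    last_progress_at and now must both be timezone-aware datetimes.
    """
    if status not in ("scraping", "indexing"):
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_progress_at > max_idle


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_probably_stuck("scraping", now - timedelta(hours=2), now=now))
```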

KB has failed

Common causes:

  • unreachable URL / blocked by remote host
  • extraction error (invalid document, unsupported format)
  • indexing error (vector store connectivity, embedding errors)

Recommended workflow:

  1. Open KB detail → look for the error message, document counts, and timestamps.
  2. Try rescrape with a smaller scope (less depth, fewer links) if available.
  3. Validate the source URL is reachable from the deployment environment.
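
For step 3, a quick check run from the same host or network as the deployment is often enough to tell "blocked" from "down". The sketch below uses only the Python standard library; the function name and the User-Agent string are arbitrary choices for this example, and a real scraper may send different headers and so be treated differently by the remote host.

```python
import urllib.error
import urllib.request


def check_url_reachable(url, timeout=10.0):
    """Return (ok, detail) for a single GET request to url."""
    request = urllib.request.Request(url, headers={"User-Agent": "kb-reachability-check"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The host answered, but with an error status (for example 403 when blocked).
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, TLS error, and similar.
        return False, f"unreachable: {exc}"


ok, detail = check_url_reachable("http://127.0.0.1:1/", timeout=2.0)
print(ok, detail)
```

An HTTP error status suggests the remote host is refusing the request, while an "unreachable" result points at DNS, firewall, or network routing in the deployment environment.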

Chat answers are irrelevant

Common causes:

  • content was extracted incorrectly (garbage in → garbage out)
  • KB is too broad or poorly structured (retrieval noise)
  • embeddings/index are out of date relative to the stored content (needs reindex)

Recommended workflow:

  1. Preview documents to confirm extracted text quality.
  2. Reindex the KB.
  3. Use tracing to inspect retrieved chunks for a query.