Context Knowledge Base

Overview

context.kb() returns a NodeKB facade with methods for indexing documents, searching them, and answering questions over them.

By default the knowledge base is scoped per user; the namespace and scope parameters customize that scope further. NodeKB replaces the original RAGFacade and supports multiple search backends, including vector, lexical, and structural (standard DB) search.


1. Core API

upsert_docs(corpus_id, docs, *, kb_namespace)

Ingest or update documents in a corpus under the bound KB scope.

This forwards to backend.upsert_docs(...) and always injects scope=self.scope, so the configured backend implementation applies deduplication and metadata scoping.

Examples:

Ingest inline text:

result = await context.kb().upsert_docs(
    corpus_id="product_docs",
    docs=[{"text": "Returns accepted for 30 days.", "labels": {"topic": "policy"}}],
    kb_namespace="support",
)

Ingest from file path with title:

result = await context.kb().upsert_docs(
    corpus_id="product_docs",
    docs=[{"path": "C:/docs/refund.md", "title": "Refund Policy"}],
)

Parameters:

corpus_id (str, required): Logical corpus identifier to write into.
docs (list[dict[str, Any]], required): Documents to ingest. Each item is typically path-based ({"path": ...}) or inline text ({"text": ...}), with optional title and labels.
kb_namespace (str | None, default None): Optional namespace partition inside the corpus.

Returns:

dict[str, Any]: Backend-defined ingestion summary, such as added document and chunk counts.

Notes

Scope and index-level labels are derived from self.scope by the backend; callers should pass only document-level inputs.
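
The scope-injection pattern described above can be sketched in a few lines. This is an illustration, not the NodeKB source: RecordingBackend, ScopedKBSketch, the dict-shaped scope, and the "added_docs" key are all hypothetical names chosen for the example.

```python
import asyncio
from typing import Any, Optional


class RecordingBackend:
    """Hypothetical stand-in backend that records the keyword arguments it receives."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    async def upsert_docs(self, **kwargs: Any) -> dict[str, Any]:
        self.calls.append(kwargs)
        # A real backend would chunk, embed, and store; this one only counts docs.
        return {"added_docs": len(kwargs["docs"])}  # key name is an assumption


class ScopedKBSketch:
    """Illustrative facade: binds a scope once and injects it into every call."""

    def __init__(self, backend: RecordingBackend, scope: dict[str, Any]) -> None:
        self.backend = backend
        self.scope = scope

    async def upsert_docs(
        self,
        corpus_id: str,
        docs: list[dict[str, Any]],
        *,
        kb_namespace: Optional[str] = None,
    ) -> dict[str, Any]:
        # Callers never pass scope; the facade always supplies its own.
        return await self.backend.upsert_docs(
            corpus_id=corpus_id,
            docs=docs,
            kb_namespace=kb_namespace,
            scope=self.scope,
        )


async def demo() -> tuple[dict[str, Any], dict[str, Any]]:
    backend = RecordingBackend()
    kb = ScopedKBSketch(backend, scope={"user_id": "u-123"})
    result = await kb.upsert_docs(
        "product_docs",
        [{"text": "Returns accepted for 30 days."}],
        kb_namespace="support",
    )
    # Return the facade result and the arguments the backend actually saw.
    return result, backend.calls[0]
```

Running asyncio.run(demo()) shows that the backend receives scope even though the caller passed only document-level inputs.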

search(corpus_id, query, top_k, kb_namespace, level, ...)

Retrieve relevant KB chunks for a query from the scoped corpus.

This method forwards all search controls to backend.search(...) and injects scope=self.scope so tenant and KB scope filters are applied consistently by the backend.

Examples:

Basic semantic retrieval:

hits = await context.kb().search(
    corpus_id="product_docs",
    query="refund timeline",
    top_k=5,
    kb_namespace="support",
)

Retrieval with filters and explicit mode:

hits = await context.kb().search(
    corpus_id="engineering_runbook",
    query="how to rotate credentials",
    filters={"labels.env": "prod"},
    mode="hybrid",
    lexical_rerank=True,
    created_at_min=1735707600.0,
)

Parameters:

corpus_id (str, required): Logical corpus identifier to query.
query (str, required): User query text to retrieve against.
top_k (int, default 10): Maximum number of hits to return.
kb_namespace (str | None, default None): Optional namespace partition inside the corpus.
filters (Mapping[str, Any] | None, default None): Optional metadata filters, merged with scope-derived filters by the backend.
level (ScopeLevel | None, default None): Optional scope level hint for backend-specific behavior.
time_window (str | None, default None): Optional relative time filter (backend-defined format).
created_at_min (float | None, default None): Optional inclusive lower bound (epoch seconds).
created_at_max (float | None, default None): Optional inclusive upper bound (epoch seconds).
mode (SearchMode | None, default None): Optional backend search mode (semantic, lexical, hybrid, etc., depending on backend support).
lexical_rerank (bool, default True): Whether the backend should apply lexical reranking when supported.

Returns:

list[KBSearchHit]: Normalized chunk hits ranked by the backend's scoring policy.

Notes

NodeKB does not merge filters itself; it forwards arguments and relies on backend semantics.
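
Because created_at_min and created_at_max take epoch seconds, it can help to build them from timezone-aware datetimes rather than typing raw numbers. The helpers below are a minimal sketch using only the standard library; epoch_seconds and created_at_bounds are hypothetical names, not part of the KB API.

```python
from datetime import datetime, timezone


def epoch_seconds(dt: datetime) -> float:
    """Convert a timezone-aware datetime to epoch seconds."""
    if dt.tzinfo is None:
        # .timestamp() interprets naive datetimes in local time, which would
        # make the bound machine-dependent, so naive values are rejected.
        raise ValueError("expected a timezone-aware datetime")
    return dt.timestamp()


def created_at_bounds(start: datetime, end: datetime) -> dict[str, float]:
    """Build keyword arguments for an inclusive creation-time range."""
    lo, hi = epoch_seconds(start), epoch_seconds(end)
    if lo > hi:
        raise ValueError("start must not be after end")
    return {"created_at_min": lo, "created_at_max": hi}


# January 2025 in UTC, expressed as epoch-second bounds.
bounds = created_at_bounds(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 2, 1, tzinfo=timezone.utc),
)
```

The resulting dict can be unpacked into a call, for example search(corpus_id="product_docs", query="refund timeline", **bounds).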

answer(corpus_id, question, *, style, kb_namespace, ...)

Generate an answer using corpus retrieval plus backend QA logic.

This forwards to backend.answer(...) with scope=self.scope. The backend owns retrieval, prompting, and citation shaping, while this facade keeps caller code scope-safe and concise.

Examples:

Concise QA response:

result = await context.kb().answer(
    corpus_id="product_docs",
    question="What is the refund window?",
    style="concise",
    kb_namespace="support",
)

Detailed QA with metadata filtering:

result = await context.kb().answer(
    corpus_id="engineering_runbook",
    question="How should I recover from token leak?",
    style="detailed",
    filters={"labels.team": "platform"},
    k=8,
    mode="semantic",
)

Parameters:

corpus_id (str, required): Logical corpus identifier to query.
question (str, required): Question to answer from indexed corpus content.
style (str, default 'concise'): Backend prompt style hint (for example concise or detailed).
kb_namespace (str | None, default None): Optional namespace partition inside the corpus.
filters (Mapping[str, Any] | None, default None): Optional metadata filters merged with scoped filters by the backend.
k (int, default 10): Retrieval depth used by backend QA before synthesis.
level (ScopeLevel | None, default None): Optional scope level hint for backend-specific behavior.
time_window (str | None, default None): Optional relative time filter.
created_at_min (float | None, default None): Optional inclusive lower bound (epoch seconds).
created_at_max (float | None, default None): Optional inclusive upper bound (epoch seconds).
mode (SearchMode | None, default None): Optional backend search mode used by retrieval.
lexical_rerank (bool, default True): Whether retrieval should apply lexical reranking when supported.

Returns:

KBAnswer: Answer payload with text and citation metadata.

Notes

Empty retrieval handling (for example returning blank answer/citations) is backend-defined.
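
Since empty-retrieval behavior is backend-defined, caller code may want to guard against a blank answer before displaying it. The sketch below assumes an answer object with text and citations attributes and a doc_id key on each citation; AnswerSketch is a hypothetical stand-in, and the real KBAnswer field names may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnswerSketch:
    """Hypothetical stand-in for KBAnswer; real field names may differ."""

    text: str = ""
    citations: list[dict[str, Any]] = field(default_factory=list)


def render_answer(
    answer: AnswerSketch,
    fallback: str = "No relevant documents found.",
) -> str:
    """Format an answer for display, falling back when the text is blank."""
    if not answer.text.strip():
        return fallback
    if not answer.citations:
        # An answer without citations is shown as-is.
        return answer.text
    # "doc_id" is an assumed citation key; missing ids are shown as "?".
    sources = ", ".join(str(c.get("doc_id", "?")) for c in answer.citations)
    return f"{answer.text} (sources: {sources})"
```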

2. Doc Management

list_corpora()

List corpora visible to the bound scope.

This delegates to backend.list_corpora(scope=self.scope) so callers can enumerate available corpora without manually threading scope.

Examples:

List corpora for the current identity:

corpora = await context.kb().list_corpora()

Extract corpus ids for UI options:

corpus_ids = [row["corpus_id"] for row in await context.kb().list_corpora()]

Parameters:

None. This method accepts no caller parameters.

Returns:

list[dict[str, Any]]: Backend-provided corpus records.

Notes

Record shape is backend-defined, commonly including corpus_id and metadata.

list_docs(*, corpus_id, limit, after)

List documents for a corpus under the bound scope.

This forwards pagination parameters to backend.list_docs(...) and injects scope=self.scope automatically.

Examples:

First page of docs:

docs = await context.kb().list_docs(corpus_id="product_docs", limit=50)

Continue after a known document id:

next_docs = await context.kb().list_docs(
    corpus_id="product_docs",
    limit=50,
    after="doc_abc123",
)

Parameters:

corpus_id (str, required): Logical corpus identifier to inspect.
limit (int, default 200): Maximum number of docs to return.
after (str | None, default None): Optional pagination cursor (backend-specific doc id semantics).

Returns:

list[dict[str, Any]]: Backend-provided document metadata records.

Notes

Ordering and cursor semantics are backend-defined.
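
A full listing can be collected by following the after cursor page by page. The loop below is a sketch under two assumptions that a given backend may not satisfy: each record carries a "doc_id" key, and passing the last id of a page as after returns the records that follow it. FakeKB is a hypothetical in-memory stand-in for context.kb().

```python
import asyncio
from typing import Any, Optional


class FakeKB:
    """Hypothetical in-memory stand-in for context.kb(), for illustration only."""

    def __init__(self, doc_ids: list[str]) -> None:
        self._doc_ids = doc_ids

    async def list_docs(
        self, *, corpus_id: str, limit: int = 200, after: Optional[str] = None
    ) -> list[dict[str, Any]]:
        # Assumed semantics: up to `limit` docs that follow `after` in a stable order.
        start = self._doc_ids.index(after) + 1 if after is not None else 0
        return [{"doc_id": d} for d in self._doc_ids[start:start + limit]]


async def list_all_docs(kb: Any, corpus_id: str, page_size: int = 50) -> list[dict[str, Any]]:
    """Collect every document record by following the `after` cursor."""
    docs: list[dict[str, Any]] = []
    after: Optional[str] = None
    while True:
        page = await kb.list_docs(corpus_id=corpus_id, limit=page_size, after=after)
        docs.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return docs
        after = page[-1]["doc_id"]  # assumed cursor: id of the last record seen
```

Stopping on a short page costs one extra request when the total is an exact multiple of the page size, but it avoids relying on any backend-specific "has more" flag.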

delete_docs(corpus_id, doc_ids)

Delete one or more documents from a corpus.

This delegates to backend.delete_docs(...) with the bound scope and returns backend deletion counters/status information.

Examples:

Delete a single document:

result = await context.kb().delete_docs(
    corpus_id="product_docs",
    doc_ids=["doc_abc123"],
)

Delete a batch:

result = await context.kb().delete_docs(
    corpus_id="engineering_runbook",
    doc_ids=["doc_1", "doc_2", "doc_3"],
)

Parameters:

corpus_id (str, required): Logical corpus identifier to mutate.
doc_ids (list[str], required): Document ids to remove.

Returns:

dict[str, Any]: Backend-defined deletion summary (for example removed docs/chunks).

Notes

Partial deletion behavior and error reporting are backend-defined.

reembed(*, corpus_id, doc_ids, batch)

Recompute embeddings for documents in a corpus.

This forwards to backend.reembed(...) with scope=self.scope. Backends typically re-upsert chunk vectors in batches.

Examples:

Re-embed all docs in a corpus:

result = await context.kb().reembed(corpus_id="product_docs")

Re-embed selected docs with smaller batch size:

result = await context.kb().reembed(
    corpus_id="engineering_runbook",
    doc_ids=["doc_abc123", "doc_def456"],
    batch=16,
)

Parameters:

corpus_id (str, required): Logical corpus identifier to process.
doc_ids (list[str] | None, default None): Optional subset of document ids. None means all docs.
batch (int, default 64): Batch size hint for embedding/upsert loops.

Returns:

dict[str, Any]: Backend-defined re-embedding summary.

Notes

Embedding model name and exact counters are backend-defined.
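
To make the role of the batch parameter concrete, here is a toy version of a batched re-embedding loop. It is not the backend's implementation: batches and reembed_sketch are hypothetical names, the "embedding" is a placeholder (the chunk length), and the summary keys are invented for the example.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch: int = 64) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch` items."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    for start in range(0, len(items), batch):
        yield items[start:start + batch]


def reembed_sketch(chunks: list[str], batch: int = 64) -> dict[str, int]:
    """Toy re-embedding loop that counts the work a backend would do."""
    embedded = 0
    n_batches = 0
    for group in batches(chunks, batch):
        # Placeholder for one embedding-model call per batch.
        vectors = [[float(len(text))] for text in group]
        # Placeholder for one vector upsert per batch.
        embedded += len(vectors)
        n_batches += 1
    return {"chunks": embedded, "batches": n_batches}
```

A smaller batch means more round trips but less memory and smaller requests per call, which is the usual reason to lower it from the default.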

stats(*, corpus_id)

Return corpus-level statistics for the bound scope.

This method forwards to backend.stats(...) with scope=self.scope and returns backend-provided counters/metadata.

Examples:

Fetch high-level stats:

stats = await context.kb().stats(corpus_id="product_docs")

Read document and chunk counts:

stats = await context.kb().stats(corpus_id="engineering_runbook")
docs = stats.get("docs", 0)
chunks = stats.get("chunks", 0)

Parameters:

corpus_id (str, required): Logical corpus identifier to inspect.

Returns:

dict[str, Any]: Backend-defined corpus statistics payload.

Notes

Metric names and additional fields depend on backend implementation.
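
Because metric names vary by backend, derived values are best computed defensively. The helper below is a small sketch that reuses the "docs" and "chunks" keys from the example above and treats missing keys as zero; chunks_per_doc is a hypothetical name.

```python
from typing import Any, Mapping


def chunks_per_doc(stats: Mapping[str, Any]) -> float:
    """Average chunks per document, or 0.0 when the corpus has no documents."""
    docs = int(stats.get("docs", 0))
    chunks = int(stats.get("chunks", 0))
    # Guard against division by zero for empty corpora or missing counters.
    return chunks / docs if docs else 0.0
```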