Context Knowledge Base

Overview

context.kb() returns a NodeKB facade that provides methods for indexing, searching, and answering queries on documents.

By default, the knowledge base is scoped per user; the namespace and scope parameters narrow or adjust that scope. NodeKB replaces the original RAGFacade and supports multiple search backends, including vector, lexical, and structural (standard DB) search.

1. Core API

upsert_docs(corpus_id, docs, *, kb_namespace)

Ingest or update documents in a corpus under the bound KB scope.

This forwards to backend.upsert_docs(...) and always injects scope=self.scope, so deduplication and metadata scoping are applied by the configured backend implementation.

Examples:

Ingest inline text:

result = await context.kb().upsert_docs(
    corpus_id="product_docs",
    docs=[{"text": "Returns accepted for 30 days.", "labels": {"topic": "policy"}}],
    kb_namespace="support",
)

Ingest from file path with title:

result = await context.kb().upsert_docs(
    corpus_id="product_docs",
    docs=[{"path": "C:/docs/refund.md", "title": "Refund Policy"}],
)

Parameters:

Name	Type	Description	Default
`corpus_id`	`str`	Logical corpus identifier to write into.	required
`docs`	`list[dict[str, Any]]`	Documents to ingest. Each item is typically path-based (`{"path": ...}`) or inline text (`{"text": ...}`), with optional `title` and `labels`.	required
`kb_namespace`	`str | None`	Optional namespace partition inside the corpus.	`None`

Returns:

Type	Description
`dict[str, Any]`	Backend-defined ingestion summary, such as counts of added documents and chunks.

Notes

Scope and index-level labels are derived from self.scope by the backend; callers should pass only document-level inputs.
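
The scope injection described above can be pictured with a small facade sketch. This is not the NodeKB implementation: `StubBackend`, `ScopedKB`, and the `{"user_id": ...}` scope shape are hypothetical names chosen for illustration, and the stub backend only echoes what it receives.

```python
from __future__ import annotations

import asyncio
from typing import Any


class StubBackend:
    """Hypothetical stand-in for a KB backend; it echoes what it receives."""

    async def upsert_docs(self, **kwargs: Any) -> dict[str, Any]:
        # A real backend would chunk, embed, and store; this one returns its inputs.
        return {"received": kwargs}


class ScopedKB:
    """Illustrative facade: binds a scope once and injects it on every call."""

    def __init__(self, backend: StubBackend, scope: dict[str, Any]) -> None:
        self.backend = backend
        self.scope = scope

    async def upsert_docs(
        self,
        corpus_id: str,
        docs: list[dict[str, Any]],
        *,
        kb_namespace: str | None = None,
    ) -> dict[str, Any]:
        # The caller never passes scope; the facade always supplies it.
        return await self.backend.upsert_docs(
            scope=self.scope,
            corpus_id=corpus_id,
            docs=docs,
            kb_namespace=kb_namespace,
        )


def demo_upsert() -> dict[str, Any]:
    # Assumed scope shape; the real scope object is defined by the framework.
    kb = ScopedKB(StubBackend(), scope={"user_id": "u_123"})
    return asyncio.run(
        kb.upsert_docs("product_docs", [{"text": "hello"}], kb_namespace="support")
    )
```

Because the scope is bound when the facade is created, caller code cannot accidentally write into another user's scope by passing the wrong value.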

search(corpus_id, query, top_k, kb_namespace, level, ...)

Retrieve relevant KB chunks for a query from the scoped corpus.

This method forwards all search controls to backend.search(...) and injects scope=self.scope so tenant and KB scope filters are applied consistently by the backend.

Examples:

Basic semantic retrieval:

hits = await context.kb().search(
    corpus_id="product_docs",
    query="refund timeline",
    top_k=5,
    kb_namespace="support",
)

Retrieval with filters and explicit mode:

hits = await context.kb().search(
    corpus_id="engineering_runbook",
    query="how to rotate credentials",
    filters={"labels.env": "prod"},
    mode="hybrid",
    lexical_rerank=True,
    created_at_min=1735707600.0,
)
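
The created_at_min and created_at_max bounds are plain epoch seconds, so they are usually computed rather than typed by hand. A minimal sketch using only the standard library follows; `epoch_bounds` is a hypothetical helper, not part of the KB API, and it assumes naive datetimes should be read as UTC.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def epoch_bounds(start: datetime, days: int) -> tuple[float, float]:
    """Return (created_at_min, created_at_max) in epoch seconds for a window."""
    if start.tzinfo is None:
        # Assumption: treat naive datetimes as UTC instead of local time.
        start = start.replace(tzinfo=timezone.utc)
    end = start + timedelta(days=days)
    return start.timestamp(), end.timestamp()


# One-week window starting at midnight UTC on 1 January 2025.
created_at_min, created_at_max = epoch_bounds(
    datetime(2025, 1, 1, tzinfo=timezone.utc), days=7
)
```

The two values can then be passed as created_at_min and created_at_max to search(...) or answer(...).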

Parameters:

Name	Type	Description	Default
`corpus_id`	`str`	Logical corpus identifier to query.	required
`query`	`str`	User query text to retrieve against.	required
`top_k`	`int`	Maximum number of hits to return.	`10`
`kb_namespace`	`str | None`	Optional namespace partition inside the corpus.	`None`
`filters`	`Mapping[str, Any] | None`	Optional metadata filters merged with scope-derived filters by the backend.	`None`
`level`	`ScopeLevel | None`	Optional scope level hint for backend-specific behavior.	`None`
`time_window`	`str | None`	Optional relative time filter (backend-defined format).	`None`
`created_at_min`	`float | None`	Optional inclusive lower bound (epoch seconds).	`None`
`created_at_max`	`float | None`	Optional inclusive upper bound (epoch seconds).	`None`
`mode`	`SearchMode | None`	Optional backend search mode (`semantic`, `lexical`, `hybrid`, etc., depending on backend support).	`None`
`lexical_rerank`	`bool`	Whether the backend should apply lexical reranking when supported.	`True`

Returns:

Type	Description
`list[KBSearchHit]`	Normalized chunk hits ranked by the backend's scoring policy.

Notes

NodeKB does not merge filters itself; it forwards arguments and relies on backend semantics.
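
Since merging is left to the backend, the exact rules may vary. One plausible policy, sketched here under the assumption that scope-derived filters must always win on conflict, looks like this; `merge_filters` is a hypothetical helper and not a documented backend function.

```python
from __future__ import annotations

from typing import Any, Mapping


def merge_filters(
    caller_filters: Mapping[str, Any] | None,
    scope_filters: Mapping[str, Any],
) -> dict[str, Any]:
    """Combine caller filters with scope filters; scope keys win on conflict."""
    merged: dict[str, Any] = dict(caller_filters or {})
    # Apply scope last so a caller cannot widen or override its own scope.
    merged.update(scope_filters)
    return merged
```

With this policy, a caller filter such as {"labels.env": "prod"} is kept, while a caller-supplied key that collides with a scope key is overwritten by the scope value.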

answer(corpus_id, question, *, style, kb_namespace, ...)

Generate an answer using corpus retrieval plus backend QA logic.

This forwards to backend.answer(...) with scope=self.scope. The backend owns retrieval, prompting, and citation shaping, while this facade keeps caller code scope-safe and concise.

Examples:

Concise QA response:

result = await context.kb().answer(
    corpus_id="product_docs",
    question="What is the refund window?",
    style="concise",
    kb_namespace="support",
)

Detailed QA with metadata filtering:

result = await context.kb().answer(
    corpus_id="engineering_runbook",
    question="How should I recover from token leak?",
    style="detailed",
    filters={"labels.team": "platform"},
    k=8,
    mode="semantic",
)

Parameters:

Name	Type	Description	Default
`corpus_id`	`str`	Logical corpus identifier to query.	required
`question`	`str`	Question to answer from indexed corpus content.	required
`style`	`str`	Backend prompt style hint (for example `concise` or `detailed`).	`'concise'`
`kb_namespace`	`str | None`	Optional namespace partition inside the corpus.	`None`
`filters`	`Mapping[str, Any] | None`	Optional metadata filters merged with scoped filters by the backend.	`None`
`k`	`int`	Retrieval depth used by backend QA before synthesis.	`10`
`level`	`ScopeLevel | None`	Optional scope level hint for backend-specific behavior.	`None`
`time_window`	`str | None`	Optional relative time filter.	`None`
`created_at_min`	`float | None`	Optional inclusive lower bound (epoch seconds).	`None`
`created_at_max`	`float | None`	Optional inclusive upper bound (epoch seconds).	`None`
`mode`	`SearchMode | None`	Optional backend search mode used by retrieval.	`None`
`lexical_rerank`	`bool`	Whether retrieval should apply lexical reranking when supported.	`True`

Returns:

Type	Description
`KBAnswer`	Answer payload with text and citation metadata.

Notes

Empty retrieval handling (for example returning blank answer/citations) is backend-defined.
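
Because empty results are backend-defined, caller code may want a defensive fallback. The sketch below assumes the answer object exposes `text` and `citations` attributes; those names are guesses based on the description of KBAnswer, and `StubAnswer` and `render_answer` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubAnswer:
    """Hypothetical stand-in for KBAnswer; real field names may differ."""

    text: str = ""
    citations: list[dict[str, Any]] = field(default_factory=list)


def render_answer(
    answer: Any, fallback: str = "No relevant documents were found."
) -> str:
    """Return display text, using the fallback when text or citations are empty."""
    text = (getattr(answer, "text", "") or "").strip()
    citations = getattr(answer, "citations", None) or []
    if not text or not citations:
        return fallback
    return f"{text} ({len(citations)} source(s))"
```

Using getattr keeps the helper tolerant of backends that omit one of the fields entirely.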

2. Doc Management

list_corpora()

List corpora visible to the bound scope.

This delegates to backend.list_corpora(scope=self.scope) so callers can enumerate available corpora without manually threading scope.

Examples:

List corpora for the current identity:

corpora = await context.kb().list_corpora()

Extract corpus ids for UI options:

corpus_ids = [row["corpus_id"] for row in await context.kb().list_corpora()]

Parameters:

This method accepts no caller parameters.

Returns:

Type	Description
`list[dict[str, Any]]`	Backend-provided corpus records.

Notes

Record shape is backend-defined, commonly including corpus_id and metadata.

list_docs(*, corpus_id, limit, after)

List documents for a corpus under the bound scope.

This forwards pagination parameters to backend.list_docs(...) and injects scope=self.scope automatically.

Examples:

First page of docs:

docs = await context.kb().list_docs(corpus_id="product_docs", limit=50)

Continue after a known document id:

next_docs = await context.kb().list_docs(
    corpus_id="product_docs",
    limit=50,
    after="doc_abc123",
)

Parameters:

Name	Type	Description	Default
`corpus_id`	`str`	Logical corpus identifier to inspect.	required
`limit`	`int`	Maximum number of docs to return.	`200`
`after`	`str | None`	Optional pagination cursor (backend-specific doc id semantics).	`None`

Returns:

Type	Description
`list[dict[str, Any]]`	Backend-provided document metadata records.

Notes

Ordering and cursor semantics are backend-defined.
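
A full listing can be collected by following the cursor page by page. The sketch below assumes the cursor is the id of the last document on each page and that records carry a `doc_id` key; both are assumptions, since the real semantics depend on the backend. `StubKB` and `list_all_docs` are hypothetical names.

```python
from __future__ import annotations

import asyncio
from typing import Any


class StubKB:
    """Hypothetical in-memory KB that pages by document id."""

    def __init__(self, doc_ids: list[str]) -> None:
        self._docs = [{"doc_id": d} for d in doc_ids]

    async def list_docs(
        self, *, corpus_id: str, limit: int = 200, after: str | None = None
    ) -> list[dict[str, Any]]:
        start = 0
        if after is not None:
            ids = [d["doc_id"] for d in self._docs]
            start = ids.index(after) + 1
        return self._docs[start : start + limit]


async def list_all_docs(
    kb: Any, corpus_id: str, page_size: int = 50
) -> list[dict[str, Any]]:
    """Collect every document by following the cursor until a short page."""
    docs: list[dict[str, Any]] = []
    after: str | None = None
    while True:
        page = await kb.list_docs(corpus_id=corpus_id, limit=page_size, after=after)
        docs.extend(page)
        if len(page) < page_size:
            break
        # Assumption: the cursor is the id of the last document on the page.
        after = page[-1]["doc_id"]
    return docs


def demo_pagination() -> list[str]:
    kb = StubKB([f"doc_{i}" for i in range(5)])
    docs = asyncio.run(list_all_docs(kb, "product_docs", page_size=2))
    return [d["doc_id"] for d in docs]
```

The loop stops on the first page shorter than page_size, which also covers an empty final page when the total is an exact multiple of the page size.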

delete_docs(corpus_id, doc_ids)

Delete one or more documents from a corpus.

This delegates to backend.delete_docs(...) with the bound scope and returns backend deletion counters/status information.

Examples:

Delete a single document:

result = await context.kb().delete_docs(
    corpus_id="product_docs",
    doc_ids=["doc_abc123"],
)

Delete a batch:

result = await context.kb().delete_docs(
    corpus_id="engineering_runbook",
    doc_ids=["doc_1", "doc_2", "doc_3"],
)

Parameters:

Name	Type	Description	Default
`corpus_id`	`str`	Logical corpus identifier to mutate.	required
`doc_ids`	`list[str]`	Document ids to remove.	required

Returns:

Type	Description
`dict[str, Any]`	Backend-defined deletion summary (for example, counts of removed docs and chunks).

Notes

Partial deletion behavior and error reporting are backend-defined.

reembed(*, corpus_id, doc_ids, batch)

Recompute embeddings for documents in a corpus.

This forwards to backend.reembed(...) with scope=self.scope. Backends typically re-upsert chunk vectors in batches.

Examples:

Re-embed all docs in a corpus:

result = await context.kb().reembed(corpus_id="product_docs")

Re-embed selected docs with smaller batch size:

result = await context.kb().reembed(
    corpus_id="engineering_runbook",
    doc_ids=["doc_abc123", "doc_def456"],
    batch=16,
)

Parameters:

Name	Type	Description	Default
`corpus_id`	`str`	Logical corpus identifier to process.	required
`doc_ids`	`list[str] | None`	Optional subset of document ids. `None` means all docs.	`None`
`batch`	`int`	Batch size hint for embedding/upsert loops.	`64`

Returns:

Type	Description
`dict[str, Any]`	Backend-defined re-embedding summary.

Notes

Embedding model name and exact counters are backend-defined.
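
To make the batch hint concrete, here is one way a backend loop might split chunks into groups before embedding them. This is only a sketch of the batching idea; `batched` is a hypothetical helper and real backends may batch differently.

```python
from __future__ import annotations

from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch: int = 64) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch` items."""
    if batch < 1:
        raise ValueError("batch must be at least 1")
    for start in range(0, len(items), batch):
        yield items[start : start + batch]
```

A smaller batch lowers peak memory and request size per embedding call at the cost of more round trips.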

stats(*, corpus_id)

Return corpus-level statistics for the bound scope.

This method forwards to backend.stats(...) with scope=self.scope and returns backend-provided counters/metadata.

Examples:

Fetch high-level stats:

stats = await context.kb().stats(corpus_id="product_docs")

Read document and chunk counts:

stats = await context.kb().stats(corpus_id="engineering_runbook")
docs = stats.get("docs", 0)
chunks = stats.get("chunks", 0)

Parameters:

Name	Type	Description	Default
`corpus_id`	`str`	Logical corpus identifier to inspect.	required

Returns:

Type	Description
`dict[str, Any]`	Backend-defined corpus statistics payload.

Notes

Metric names and additional fields depend on backend implementation.