# Context Knowledge Base

## Overview
`context.kb()` returns a `NodeKB` facade with methods for indexing, searching, and answering questions over documents.
By default, the knowledge base is scoped per user; the scope can be customized further with namespace and scope parameters. `NodeKB` replaces the original `RAGFacade` and supports multiple search backends, including vector, lexical, and structural (standard database) search.
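The forwarding pattern that every method below relies on can be sketched as a small facade that stores a scope once and injects it into each backend call. This is a minimal illustration, not the actual `NodeKB` source: `RecordingBackend`, `MiniKB`, and the dict-shaped scope are hypothetical stand-ins.

```python
import asyncio
from typing import Any


class RecordingBackend:
    """Hypothetical backend that records each call so the injected scope is visible."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    async def search(self, **kwargs: Any) -> list[dict[str, Any]]:
        self.calls.append(("search", kwargs))
        return []


class MiniKB:
    """Hypothetical facade: binds a scope once and forwards calls with scope=self.scope."""

    def __init__(self, backend: RecordingBackend, scope: dict[str, Any]) -> None:
        self.backend = backend
        self.scope = scope

    async def search(self, corpus_id: str, query: str, top_k: int = 10, **kw: Any) -> list[dict[str, Any]]:
        # Callers never pass scope; the facade always supplies its bound scope.
        return await self.backend.search(
            corpus_id=corpus_id, query=query, top_k=top_k, scope=self.scope, **kw
        )


async def main() -> RecordingBackend:
    backend = RecordingBackend()
    kb = MiniKB(backend, scope={"user_id": "u_123"})
    await kb.search(corpus_id="product_docs", query="refund timeline", top_k=5)
    return backend


backend = asyncio.run(main())
print(backend.calls[0][1]["scope"])  # {'user_id': 'u_123'}
```

Because the scope lives on the facade, a caller that tries to pass `scope=` explicitly gets a duplicate-keyword `TypeError` in this sketch, which keeps scoping out of caller code.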
## 1. Core API
### `upsert_docs(corpus_id, docs, *, kb_namespace)`
Ingest or update documents in a corpus under the bound KB scope.

This forwards to `backend.upsert_docs(...)` and always injects `scope=self.scope`, so the configured backend applies deduplication and metadata scoping.
Examples:

Ingest inline text:

```python
result = await context.kb().upsert_docs(
    corpus_id="product_docs",
    docs=[{"text": "Returns accepted for 30 days.", "labels": {"topic": "policy"}}],
    kb_namespace="support",
)
```

Ingest from a file path with a title:

```python
result = await context.kb().upsert_docs(
    corpus_id="product_docs",
    docs=[{"path": "C:/docs/refund.md", "title": "Refund Policy"}],
)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_id` | `str` | Logical corpus identifier to write into. | required |
| `docs` | `list[dict[str, Any]]` | Documents to ingest. Each item is typically path-based (`{"path": ...}`) or inline text (`{"text": ...}`), optionally with fields such as `title` and `labels`. | required |
| `kb_namespace` | `str \| None` | Optional namespace partition inside the corpus. | `None` |
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Backend-defined ingestion summary, such as added document and chunk counts. |
Notes
The backend derives scope and index-level labels from `self.scope`; callers should pass only document-level inputs.
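Since callers supply only document-level inputs, a `docs` batch can be assembled from local files before calling `upsert_docs`. The helper below is a hypothetical sketch: it emits path-based items with a `title` derived from each file name, matching the `{"path": ..., "title": ...}` shape in the examples above. Whether your backend needs absolute paths is an assumption to verify.

```python
from pathlib import Path
from typing import Any


def build_path_docs(folder: str, pattern: str = "*.md") -> list[dict[str, Any]]:
    """Hypothetical helper: turn matching files into path-based doc items.

    Scope is deliberately absent from each item; NodeKB injects it.
    """
    docs: list[dict[str, Any]] = []
    for p in sorted(Path(folder).glob(pattern)):
        if p.is_file():
            # "refund_policy.md" -> "Refund Policy"
            title = p.stem.replace("_", " ").replace("-", " ").title()
            docs.append({"path": p.resolve().as_posix(), "title": title})
    return docs
```

Usage inside an async node: `result = await context.kb().upsert_docs(corpus_id="product_docs", docs=build_path_docs("docs"))`.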
### `search(corpus_id, query, top_k, kb_namespace, level, ...)`
Retrieve relevant KB chunks for a query from the scoped corpus.

This method forwards all search controls to `backend.search(...)` and injects `scope=self.scope`, so the backend applies tenant and KB scope filters consistently.
Examples:

Basic semantic retrieval:

```python
hits = await context.kb().search(
    corpus_id="product_docs",
    query="refund timeline",
    top_k=5,
    kb_namespace="support",
)
```

Retrieval with filters and an explicit mode:

```python
hits = await context.kb().search(
    corpus_id="engineering_runbook",
    query="how to rotate credentials",
    filters={"labels.env": "prod"},
    mode="hybrid",
    lexical_rerank=True,
    created_at_min=1735707600.0,  # epoch seconds: 2025-01-01 00:00 at UTC-5
)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_id` | `str` | Logical corpus identifier to query. | required |
| `query` | `str` | User query text to retrieve against. | required |
| `top_k` | `int` | Maximum number of hits to return. | `10` |
| `kb_namespace` | `str \| None` | Optional namespace partition inside the corpus. | `None` |
| `filters` | `Mapping[str, Any] \| None` | Optional metadata filters; the backend merges them with scope-derived filters. | `None` |
| `level` | `ScopeLevel \| None` | Optional scope level hint for backend-specific behavior. | `None` |
| `time_window` | `str \| None` | Optional relative time filter (backend-defined format). | `None` |
| `created_at_min` | `float \| None` | Optional inclusive lower bound on creation time (epoch seconds). | `None` |
| `created_at_max` | `float \| None` | Optional inclusive upper bound on creation time (epoch seconds). | `None` |
| `mode` | `SearchMode \| None` | Optional backend search mode (for example `"semantic"` or `"hybrid"`). | `None` |
| `lexical_rerank` | `bool` | Whether the backend should apply lexical reranking when supported. | `True` |
Returns:

| Type | Description |
|---|---|
| `list[KBSearchHit]` | Normalized chunk hits, ranked by the backend's scoring policy. |
Notes
`NodeKB` does not merge filters itself; it forwards arguments unchanged and relies on backend semantics.
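`created_at_min` and `created_at_max` take epoch seconds, so calendar dates need converting first. A minimal standard-library sketch; the helper name `epoch_bounds` is hypothetical. Midnight on 1 January 2025 at UTC-5 yields the `1735707600.0` used in the second search example above.

```python
from datetime import datetime, timedelta, timezone


def epoch_bounds(start: datetime, days: int) -> tuple[float, float]:
    """Hypothetical helper: return (created_at_min, created_at_max) in epoch seconds.

    `start` must be timezone-aware; .timestamp() would otherwise interpret a
    naive datetime in the machine's local time zone.
    """
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    end = start + timedelta(days=days)
    # Both bounds are inclusive on the search side, per the parameter table.
    return start.timestamp(), end.timestamp()


utc_minus_5 = timezone(timedelta(hours=-5))
lo, hi = epoch_bounds(datetime(2025, 1, 1, tzinfo=utc_minus_5), days=7)
print(lo)  # 1735707600.0
```

The pair can then be passed as `created_at_min=lo, created_at_max=hi` to `search` (or `answer`).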
### `answer(corpus_id, question, *, style, kb_namespace, ...)`
Generate an answer using corpus retrieval plus backend QA logic.

This forwards to `backend.answer(...)` with `scope=self.scope`. The backend owns retrieval, prompting, and citation shaping; the facade keeps caller code scope-safe and concise.
Examples:

Concise QA response:

```python
result = await context.kb().answer(
    corpus_id="product_docs",
    question="What is the refund window?",
    style="concise",
    kb_namespace="support",
)
```

Detailed QA with metadata filtering:

```python
result = await context.kb().answer(
    corpus_id="engineering_runbook",
    question="How should I recover from a token leak?",
    style="detailed",
    filters={"labels.team": "platform"},
    k=8,
    mode="semantic",
)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_id` | `str` | Logical corpus identifier to query. | required |
| `question` | `str` | Question to answer from indexed corpus content. | required |
| `style` | `str` | Backend prompt style hint (for example `"concise"` or `"detailed"`). | `'concise'` |
| `kb_namespace` | `str \| None` | Optional namespace partition inside the corpus. | `None` |
| `filters` | `Mapping[str, Any] \| None` | Optional metadata filters; the backend merges them with scoped filters. | `None` |
| `k` | `int` | Retrieval depth used by backend QA before synthesis. | `10` |
| `level` | `ScopeLevel \| None` | Optional scope level hint for backend-specific behavior. | `None` |
| `time_window` | `str \| None` | Optional relative time filter (backend-defined format). | `None` |
| `created_at_min` | `float \| None` | Optional inclusive lower bound on creation time (epoch seconds). | `None` |
| `created_at_max` | `float \| None` | Optional inclusive upper bound on creation time (epoch seconds). | `None` |
| `mode` | `SearchMode \| None` | Optional backend search mode used during retrieval. | `None` |
| `lexical_rerank` | `bool` | Whether retrieval should apply lexical reranking when supported. | `True` |
Returns:

| Type | Description |
|---|---|
| `KBAnswer` | Answer payload with text and citation metadata. |
Notes
How empty retrieval is handled (for example, returning a blank answer and no citations) is backend-defined.
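Because empty-retrieval behavior varies by backend, callers that need a predictable fallback can probe retrieval before asking for an answer. A minimal sketch under stated assumptions: `FakeKB` is a hypothetical in-memory stand-in for `context.kb()`, and its dict-shaped answer is not the real `KBAnswer` type.

```python
import asyncio
from typing import Any


class FakeKB:
    """Hypothetical stand-in for context.kb(), holding a tiny in-memory corpus."""

    def __init__(self, chunks: dict[str, list[str]]) -> None:
        self.chunks = chunks

    async def search(self, corpus_id: str, query: str, top_k: int = 10, **_: Any) -> list[str]:
        # Toy retrieval: a chunk matches if it contains any query word.
        words = query.lower().split()
        hits = [c for c in self.chunks.get(corpus_id, []) if any(w in c.lower() for w in words)]
        return hits[:top_k]

    async def answer(self, corpus_id: str, question: str, **_: Any) -> dict[str, Any]:
        hits = await self.search(corpus_id, question)
        return {"text": hits[0], "citations": hits}


async def answer_or_fallback(kb: Any, corpus_id: str, question: str, k: int = 10) -> Any:
    # Probe retrieval first so zero hits yields None rather than
    # whatever the backend chooses to return for an empty context.
    hits = await kb.search(corpus_id=corpus_id, query=question, top_k=k)
    if not hits:
        return None
    return await kb.answer(corpus_id=corpus_id, question=question, k=k)


kb = FakeKB({"product_docs": ["Returns accepted for 30 days."]})
print(asyncio.run(answer_or_fallback(kb, "product_docs", "returns policy")))
print(asyncio.run(answer_or_fallback(kb, "empty_corpus", "returns policy")))  # None
```

The trade-off is an extra retrieval round trip per question; it is only worth it when a uniform "no answer" signal matters more than latency.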
## 2. Doc Management
### `list_corpora()`
List corpora visible to the bound scope.

This delegates to `backend.list_corpora(scope=self.scope)`, so callers can enumerate available corpora without threading scope through manually.
Examples:

List corpora for the current identity:

```python
corpora = await context.kb().list_corpora()
```

Extract corpus ids for UI options:

```python
corpus_ids = [row["corpus_id"] for row in await context.kb().list_corpora()]
```
Parameters: none. This method takes no caller arguments; the scope comes from the facade.
Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | Backend-provided corpus records. |
Notes
Record shape is backend-defined and commonly includes `corpus_id` and metadata.
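Since record shape is backend-defined, code that builds UI options from `list_corpora()` can tolerate records without a `corpus_id`. A sketch; the `corpus_options` helper and the `"title"` key inside `metadata` are hypothetical.

```python
from typing import Any


def corpus_options(rows: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Hypothetical helper: map corpus records to (value, label) pairs for a UI.

    Records lacking a corpus_id are skipped; the label falls back to the id
    when metadata has no "title" (an assumed field, not a documented one).
    """
    options: list[tuple[str, str]] = []
    for row in rows:
        cid = row.get("corpus_id")
        if not cid:
            continue
        meta = row.get("metadata") or {}
        options.append((cid, str(meta.get("title", cid))))
    # Sort case-insensitively by label for stable display order.
    return sorted(options, key=lambda o: o[1].lower())


rows = [
    {"corpus_id": "product_docs", "metadata": {"title": "Product Docs"}},
    {"corpus_id": "engineering_runbook", "metadata": {}},
    {"metadata": {"title": "orphan"}},
]
print(corpus_options(rows))
```

In a node this becomes `options = corpus_options(await context.kb().list_corpora())`.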
### `list_docs(*, corpus_id, limit, after)`
List documents in a corpus under the bound scope.

This forwards the pagination parameters to `backend.list_docs(...)` and injects `scope=self.scope` automatically.
Examples:

First page of docs:

```python
docs = await context.kb().list_docs(corpus_id="product_docs", limit=50)
```

Continue after a known document id:

```python
next_docs = await context.kb().list_docs(
    corpus_id="product_docs",
    limit=50,
    after="doc_abc123",
)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_id` | `str` | Logical corpus identifier to inspect. | required |
| `limit` | `int` | Maximum number of docs to return. | `200` |
| `after` | `str \| None` | Optional pagination cursor (backend-specific doc id semantics). | `None` |
Returns:

| Type | Description |
|---|---|
| `list[dict[str, Any]]` | Backend-provided document metadata records. |
Notes
Ordering and cursor semantics are backend-defined.
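Because `after` is a cursor over document ids, reading a whole corpus is a loop that feeds the last id of each page back in. This sketch assumes each record exposes its id under `"doc_id"` and that a short page means the end; both are assumptions, since ordering and cursor semantics are backend-defined. `PagedKB` is a hypothetical in-memory stand-in for `context.kb()`.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from typing import Any


class PagedKB:
    """Hypothetical stand-in for context.kb() with id-ordered pagination."""

    def __init__(self, doc_ids: list[str]) -> None:
        self.docs = [{"doc_id": d} for d in sorted(doc_ids)]

    async def list_docs(self, *, corpus_id: str, limit: int = 200, after: str | None = None) -> list[dict[str, Any]]:
        start = 0
        if after is not None:
            # Resume strictly after the cursor id (or return nothing if unknown).
            start = next((i + 1 for i, d in enumerate(self.docs) if d["doc_id"] == after), len(self.docs))
        return self.docs[start:start + limit]


async def iter_all_docs(kb: Any, corpus_id: str, page_size: int = 200) -> AsyncIterator[dict[str, Any]]:
    after: str | None = None
    while True:
        page = await kb.list_docs(corpus_id=corpus_id, limit=page_size, after=after)
        for doc in page:
            yield doc
        if len(page) < page_size:
            break  # short or empty page: assume no more documents
        after = page[-1]["doc_id"]  # assumed id field


async def collect() -> list[str]:
    kb = PagedKB([f"doc_{i:03d}" for i in range(5)])
    return [d["doc_id"] async for d in iter_all_docs(kb, "product_docs", page_size=2)]


print(asyncio.run(collect()))
```

When the corpus size is an exact multiple of `page_size`, the loop makes one extra call that returns an empty page and then stops.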
### `delete_docs(corpus_id, doc_ids)`
Delete one or more documents from a corpus.

This delegates to `backend.delete_docs(...)` with the bound scope and returns the backend's deletion counters or status information.
Examples:

Delete a single document:

```python
result = await context.kb().delete_docs(
    corpus_id="product_docs",
    doc_ids=["doc_abc123"],
)
```

Delete a batch:

```python
result = await context.kb().delete_docs(
    corpus_id="engineering_runbook",
    doc_ids=["doc_1", "doc_2", "doc_3"],
)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_id` | `str` | Logical corpus identifier to mutate. | required |
| `doc_ids` | `list[str]` | Document ids to remove. | required |
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Backend-defined deletion summary (for example, removed docs and chunks). |
Notes
Partial deletion behavior and error reporting are backend-defined.
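For large removals, splitting `doc_ids` into fixed-size batches keeps each request small and makes a partial failure easier to localize, given that partial-deletion behavior is backend-defined. A sketch; the batch size, the `delete_in_batches` helper, and the `"deleted"` key returned by the fake are hypothetical.

```python
import asyncio
from collections.abc import Iterator
from typing import Any


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for i in range(0, len(items), size):
        yield items[i:i + size]


async def delete_in_batches(kb: Any, corpus_id: str, doc_ids: list[str], size: int = 100) -> list[dict[str, Any]]:
    """Hypothetical helper: call delete_docs once per batch and keep every summary."""
    results: list[dict[str, Any]] = []
    for batch in chunked(doc_ids, size):
        results.append(await kb.delete_docs(corpus_id=corpus_id, doc_ids=batch))
    return results


class FakeKB:
    """Hypothetical stand-in for context.kb(); reports how many ids it received."""

    async def delete_docs(self, corpus_id: str, doc_ids: list[str]) -> dict[str, Any]:
        return {"deleted": len(doc_ids)}


summaries = asyncio.run(delete_in_batches(FakeKB(), "product_docs", [f"doc_{i}" for i in range(250)]))
print([s["deleted"] for s in summaries])  # [100, 100, 50]
```

Keeping one summary per batch lets the caller see which slice of ids a backend error or short count belongs to.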
### `reembed(*, corpus_id, doc_ids, batch)`
Recompute embeddings for documents in a corpus.

This forwards to `backend.reembed(...)` with `scope=self.scope`. Backends typically re-upsert chunk vectors in batches.
Examples:

Re-embed all docs in a corpus:

```python
result = await context.kb().reembed(corpus_id="product_docs")
```

Re-embed selected docs with a smaller batch size:

```python
result = await context.kb().reembed(
    corpus_id="engineering_runbook",
    doc_ids=["doc_abc123", "doc_def456"],
    batch=16,
)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_id` | `str` | Logical corpus identifier to process. | required |
| `doc_ids` | `list[str] \| None` | Optional subset of document ids; when omitted, all docs in the corpus are re-embedded. | `None` |
| `batch` | `int` | Batch size hint for the embedding and upsert loops. | `64` |
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Backend-defined re-embedding summary. |
Notes
Embedding model name and exact counters are backend-defined.
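The description says backends typically re-upsert chunk vectors in batches, with `batch` as the size hint. The loop below sketches that pattern in isolation; `embed` is a hypothetical placeholder for a real embedding call (here a toy two-dimensional vector), and the `{"chunk_id", "text"}` chunk shape is assumed.

```python
from typing import Callable


def reembed_chunks(
    chunks: list[dict[str, str]],
    embed: Callable[[list[str]], list[list[float]]],
    batch: int = 64,
) -> dict[str, list[float]]:
    """Sketch of a batched re-embedding loop: embed `batch` texts per call, then upsert."""
    vectors: dict[str, list[float]] = {}
    for i in range(0, len(chunks), batch):
        group = chunks[i:i + batch]
        embedded = embed([c["text"] for c in group])  # one model call per batch
        for chunk, vec in zip(group, embedded):
            vectors[chunk["chunk_id"]] = vec  # stands in for a vector-store upsert
    return vectors


calls: list[int] = []


def toy_embed(texts: list[str]) -> list[list[float]]:
    """Hypothetical embedder: records batch sizes and returns [length, word count]."""
    calls.append(len(texts))
    return [[float(len(t)), float(len(t.split()))] for t in texts]


chunks = [{"chunk_id": f"c{i}", "text": "word " * (i + 1)} for i in range(5)]
vecs = reembed_chunks(chunks, toy_embed, batch=2)
print(calls)  # [2, 2, 1]
```

A smaller `batch` (as in the `batch=16` example) trades more model calls for lower peak memory and smaller request payloads.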
### `stats(*, corpus_id)`
Return corpus-level statistics for the bound scope.

This forwards to `backend.stats(...)` with `scope=self.scope` and returns backend-provided counters and metadata.
Examples:

Fetch high-level stats:

```python
stats = await context.kb().stats(corpus_id="product_docs")
```

Read document and chunk counts:

```python
stats = await context.kb().stats(corpus_id="engineering_runbook")
docs = stats.get("docs", 0)  # key names are backend-defined
chunks = stats.get("chunks", 0)
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `corpus_id` | `str` | Logical corpus identifier to inspect. | required |
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Backend-defined corpus statistics payload. |
Notes
Metric names and additional fields depend on backend implementation.
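Since metric names depend on the backend, derived figures are best computed defensively. A sketch assuming the `docs` and `chunks` keys used in the example above; `chunks_per_doc` is a hypothetical helper.

```python
from typing import Any


def chunks_per_doc(stats: dict[str, Any]) -> float:
    """Average chunks per document, or 0.0 when counts are missing, None, or zero."""
    docs = int(stats.get("docs", 0) or 0)
    chunks = int(stats.get("chunks", 0) or 0)
    return chunks / docs if docs else 0.0


print(chunks_per_doc({"docs": 4, "chunks": 37}))  # 9.25
print(chunks_per_doc({}))  # 0.0
```

In a node: `ratio = chunks_per_doc(await context.kb().stats(corpus_id="product_docs"))`.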