Tutorial 5: Add Intelligence — LLM & RAG¶
This tutorial adds language models and retrieval‑augmented generation (RAG) to your agents. You’ll:
- set up an LLM profile
- chat from a graph function
- build a searchable RAG corpus from your files/memory
- answer questions grounded by retrieved context (with optional citations)
Works with OpenAI, Anthropic, Google (Gemini), OpenRouter, LM Studio, and Ollama via a unified GenericLLMClient.
0. Mental model¶
- LLM: a provider‑agnostic client you access via context.llm(...) for chat and embeddings.
- RAG: a corpus of documents (from files and/or Memory events) that are chunked, embedded, and retrieved to ground LLM answers.
llm = context.llm() # chat & embed & image generation
rag = context.rag() # corpora, upsert, search, answer
1. Prerequisites¶
- API keys for the providers you want (e.g., OpenAI, Anthropic, Gemini, OpenRouter).
- If using local models: LM Studio or Ollama running locally and a base URL.
2. Configure LLMs (Profiles)¶
You can configure profiles via environment variables (recommended) or at runtime. See the docs for the complete set of configuration options.
A) .env profiles (recommended)¶
Profiles are named by the section after LLM__. Example: a profile called MY_OPENAI:
AETHERGRAPH_LLM__MY_OPENAI__PROVIDER=openai
AETHERGRAPH_LLM__MY_OPENAI__MODEL=gpt-4o-mini
AETHERGRAPH_LLM__MY_OPENAI__TIMEOUT=60
AETHERGRAPH_LLM__MY_OPENAI__API_KEY=sk-...
AETHERGRAPH_LLM__MY_OPENAI__EMBED_MODEL=text-embedding-3-small # needed for llm().embed() or RAG
Then in code:
llm = context.llm(profile="my_openai")
text, usage = await llm.chat([...])
The default profile comes from your container config. Use profiles when you want to switch providers/models per node or per run.
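The `AETHERGRAPH_LLM__<PROFILE>__<FIELD>` naming scheme above can be illustrated with a toy parser. This is only a sketch of how such keys map to per-profile settings, not the library's actual loader:

```python
def load_llm_profiles(environ):
    """Collect AETHERGRAPH_LLM__<PROFILE>__<FIELD> vars into per-profile dicts.

    Illustrative only: the real settings loader lives inside the library.
    """
    prefix = "AETHERGRAPH_LLM__"
    profiles = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        profile, _, field = rest.partition("__")
        if not field:
            continue  # malformed key, no field part
        profiles.setdefault(profile.lower(), {})[field.lower()] = value
    return profiles

env = {
    "AETHERGRAPH_LLM__MY_OPENAI__PROVIDER": "openai",
    "AETHERGRAPH_LLM__MY_OPENAI__MODEL": "gpt-4o-mini",
    "AETHERGRAPH_LLM__MY_OPENAI__EMBED_MODEL": "text-embedding-3-small",
    "UNRELATED_VAR": "ignored",
}
print(load_llm_profiles(env))
```

Note how the profile name (my_openai) is lowercased for use in context.llm(profile="my_openai"), while the env keys stay upper-case.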
B) Register at runtime (programmatic)¶
Useful for notebooks/demos or dynamically wiring services:
from aethergraph.llm import register_llm_client, set_rag_llm_client
client = register_llm_client(
profile="runtime_openai",
provider="openai",
model="gpt-4o-mini",
api_key="sk-...",
)
# RAG can use a dedicated LLM (for embedding + answering). If not set, it uses the default profile.
set_rag_llm_client(client=client)
You can also pass parameters directly to set_rag_llm_client(provider=..., model=..., embed_model=..., api_key=...).
C) One‑off key injection¶
If you just need to override a key in memory for a demo:
context.llm_set_key(provider="openai", api_key="sk-...")
Sidecar note: If your run needs channels, resumable waits, or shared services, start the sidecar server before using runtime registration.
3. Chat & Embed from a Graph Function¶
Chat (provider‑agnostic)¶
@graph_fn(name="ask_llm")
async def ask_llm(question: str, *, context):
llm = context.llm(profile="my_openai") # or omit profile for default
messages = [
{"role": "system", "content": "You are concise and helpful."},
{"role": "user", "content": question},
]
reply, usage = await llm.chat(messages)
return {"answer": reply, "usage": usage}
Embeddings¶
vectors = await context.llm(profile="my_openai").embed([
"First text chunk", "Second text chunk"
])
RAG needs an embed model configured on the chosen profile.
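Embedding vectors are typically compared by cosine similarity; this is what retrieval boils down to. A minimal sketch with toy 3‑d vectors standing in for real provider embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy "embeddings" — real vectors from llm().embed() are much higher-dimensional.
vectors = [[1.0, 0.0, 1.0], [0.9, 0.1, 0.8], [0.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.9]

# Rank chunk indices by similarity to the query, most similar first.
ranked = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
print(ranked)
```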
Optional knobs¶
The chat() API accepts optional parameters for reasoning, JSON output, and more. See the API reference for detailed usage.
4. Raw API escape hatch¶
For power users who need endpoints not yet covered by the high‑level client (e.g., low-level inputs, vision models, custom models):
openai = context.llm(profile="my_openai")
payload = {
"model": "gpt-4o-mini",
"input": [
{"role": "system", "content": "You are concise."},
{"role": "user", "content": "Explain attention in one sentence."}
],
"max_output_tokens": 128,
"temperature": 0.3,
}
raw = await openai.raw(path="/responses", json=payload)
- raw(path=..., json=...) sends a verbatim request to the provider base URL.
- You are responsible for parsing the returned JSON shape.
Use this when experimenting with new provider features before first‑class support lands in the client.
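Since raw() hands you the provider's JSON verbatim, you need a small extractor on your side. The sketch below assumes an OpenAI‑Responses‑style payload (an output array of messages with output_text parts); other providers use different shapes, so treat this as an example, not a universal parser:

```python
def extract_output_text(raw):
    """Concatenate text parts from an OpenAI-Responses-style payload.

    Assumption: raw["output"] is a list of items whose "content" holds
    {"type": "output_text", "text": ...} pieces. Shapes vary by provider.
    """
    parts = []
    for item in raw.get("output", []):
        for piece in item.get("content", []):
            if piece.get("type") == "output_text":
                parts.append(piece.get("text", ""))
    return "".join(parts)

# A hand-written sample response, standing in for the dict raw() returns.
sample = {
    "output": [
        {"type": "message", "content": [
            {"type": "output_text", "text": "Attention weighs tokens by relevance."}
        ]}
    ]
}
print(extract_output_text(sample))
```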
5. RAG: From Docs & Memory to Grounded Answers¶
Flow: Files/Events → chunk + embed → index → retrieve top‑k → LLM answers with context.
- Corpora live behind context.rag().
- Ingest files (by path) and inline text, and/or promote Memory events into a corpus.
A) Backend & storage¶
Default vector index: SQLite (local, zero‑dep) — great for laptops and small corpora.
Switch to FAISS: faster ANN search for larger corpora.
Set up RAG backend:
- Env:
# RAG Settings
AETHERGRAPH_RAG__BACKEND=faiss # or sqlite
AETHERGRAPH_RAG__DIM=1536 # embedding dimension (e.g., OpenAI text-embedding-3-small)
- Runtime:
from aethergraph.services.rag import set_rag_index_backend
set_rag_index_backend(backend="faiss", dim=1536)
# If FAISS is not installed, it logs a warning and falls back to SQLite automatically.
- On‑disk layout: each corpus stores corpus.json, docs.jsonl, chunks.jsonl; source files are saved as Artifacts for provenance.
B) Build / update a corpus from files & text¶
await context.rag().upsert_docs(
corpus_id="my_docs",
docs=[
{"path": "data/report.pdf", "labels": {"type": "report"}},
{"text": "Experiment hit 91.2% accuracy on CIFAR-10.", "title": "exp-log"},
],
)
- Use file docs when you already have a local file: {"path": "/abs/or/relative.ext", "labels": {...}}. Supported “smart-parsed” types are .pdf, .md/markdown, and .txt (others are treated as plain text). The original file is saved as an Artifact for provenance; if your PDF is a scan, run OCR first (we only extract selectable text).
- Use inline docs when you have content in memory: {"text": "...", "title": "nice-short-title", "labels": {...}}. Keep titles short and meaningful; add 1–3 optional labels you’ll actually filter by (e.g., {"source": "lab", "week": 2}).
Behind the scenes: documents are stored as Artifacts, parsed, chunked, embedded, and added to the vector index.
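The chunking step can be pictured with a simplified stand‑in: fixed‑size windows with a small overlap so sentences that straddle a boundary appear in both neighbors. The real chunker is more sophisticated (and its parameters are the library's, not these), but the mechanics look like this:

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into fixed-size character chunks with overlap.

    A toy stand-in for the library's chunker: each chunk starts
    (size - overlap) characters after the previous one.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

doc = "x" * 500
parts = chunk_text(doc, size=200, overlap=40)
print(len(parts), [len(p) for p in parts])
```

Each chunk is then embedded individually and added to the vector index, so retrieval returns chunk-sized snippets rather than whole documents.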
C) Search, retrieve, answer (with citations)¶
hits = await context.rag().search("my_docs", "key findings", k=8, mode="hybrid")
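Hybrid mode combines a lexical (keyword) ranking with a vector ranking. One common way to merge the two lists is reciprocal rank fusion; whether the library uses exactly this scheme is an assumption, but it conveys the idea:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of chunk ids.

    Each list contributes 1 / (k + rank) per item; items ranked well
    in multiple lists float to the top. k=60 is a conventional default.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["c2", "c5", "c1"]   # from lexical search
vector_hits = ["c5", "c3", "c2"]    # from embedding search
print(rrf([keyword_hits, vector_hits]))
```

c5 wins here because it ranks highly in both lists, even though neither list puts it unambiguously first.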
ans = await context.rag().answer(
corpus_id="my_docs",
question="Summarize the main findings and list key metrics.",
style="concise",
with_citations=True,
k=6,
)
# ans → { "answer": str, "citations": [...], "resolved_citations": [...], "usage": {...} }
Use resolved_citations to map snippets back to Artifact URIs for auditability.
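An audit pass over an answer might look like the following. The exact citation schema here (ref, chunk_id, artifact_uri keys) is an assumption for illustration; check the API reference for the real field names:

```python
# Hand-built sample mimicking a rag().answer(...) result.
# The field names below are assumptions, not the documented schema.
ans = {
    "answer": "Accuracy was 91.2% on CIFAR-10 [1].",
    "citations": [{"ref": 1, "chunk_id": "c-17"}],
    "resolved_citations": [
        {"chunk_id": "c-17", "artifact_uri": "artifact://my_docs/exp-log"}
    ],
}

# Join citations to their resolved Artifact URIs for an audit trail.
by_chunk = {rc["chunk_id"]: rc for rc in ans["resolved_citations"]}
audit = [(c["ref"], by_chunk[c["chunk_id"]]["artifact_uri"]) for c in ans["citations"]]
print(audit)  # [(1, 'artifact://my_docs/exp-log')]
```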
D) Choosing the LLM for RAG¶
RAG uses a dedicated RAG LLM client that must have both model and embed_model set.
Runtime:
from aethergraph.llm import set_rag_llm_client
set_rag_llm_client(provider="openai", model="gpt-4o-mini", embed_model="text-embedding-3-small", api_key="sk-…")
If you don’t set one, it falls back to the default LLM profile (ensure that profile also has an embed_model).
E) Corpus management (ops)¶
For maintenance and ops you can:
- List corpora / docs to inspect what’s indexed.
- Delete docs to remove vectors and records.
- Re‑embed to refresh vectors after changing embed model or chunking.
- Stats to view counts of docs/chunks and corpus metadata.
These live on the same facade: rag.list_corpora(), rag.list_docs(...), rag.delete_docs(...), rag.reembed(...), rag.stats(...). See API reference for details.
6. Practical recipes¶
- Switch providers by changing profile= in context.llm(...) without touching your code elsewhere.
- Save docs as Artifacts (e.g., save_text, save(path=...)) and ingest by {"path": local_path} so RAG can cite their URIs.
Summary¶
- Configure LLM profiles via .env or runtime registration, then use llm.chat() / llm.embed().
- Build RAG corpora from files and Memory events, then call rag.answer(..., with_citations=True) for grounded responses.