context.llm() – LLM Client & Profiles API Reference

context.llm() returns a profile‑aware LLM client with a consistent API across providers (OpenAI, Azure OpenAI, Anthropic, Google, OpenRouter, LM Studio, Ollama). Use it for chat, embeddings, image generation, and raw HTTP calls with built‑in retries and sane defaults.

See LLM Setup & Providers for configuring providers, base URLs, and API keys.


Profiles & Configuration

  • Profiles: Named client configs (default: "default").
  • Get existing: client = context.llm() or context.llm(profile="myprofile").
  • Override/update: Pass any of provider/model/base_url/api_key/azure_deployment/timeout to create or update a profile at runtime.
  • Quick set key: context.llm_set_key(provider, model, api_key, profile="default").

Supported providers: openai, azure, anthropic, google, openrouter, lmstudio, ollama.


0. LLM Setup

context.llm(profile, *, provider, model, base_url, ...)

Retrieve or configure an LLM client for this context.

This method allows you to access a language model client by profile name, or dynamically override its configuration at runtime.

Examples:

Get the default LLM client:

llm = context.llm()
response, usage = await llm.chat([{"role": "user", "content": "Hello, world!"}])

Use a custom profile:

llm = context.llm(profile="my-profile")

Override provider and model for a one-off call:

llm = context.llm(
    provider=Provider.OpenAI,
    model="gpt-4-turbo",
    api_key="sk-...",
)

Parameters:

Name Type Description Default
profile str

The profile name to use (default: "default"). Profiles are set up in .env or via the register_llm_client() method.

'default'
provider Provider | None

Optionally override the provider (e.g., Provider.OpenAI).

None
model str | None

Optionally override the model name.

None
base_url str | None

Optionally override the base URL for the LLM API.

None
api_key str | None

Optionally override the API key for authentication.

None
azure_deployment str | None

Optionally specify an Azure deployment name.

None
timeout float | None

Optionally set a request timeout (in seconds).

None

Returns:

Name Type Description
LLMClientProtocol LLMClientProtocol

The configured LLM client instance for this context.

context.llm_set_key(provider, model, api_key, profile)

Quickly configure or override the LLM provider, model, and API key for a given profile.

This method allows you to update the credentials and model configuration for a specific LLM profile at runtime. It is useful for dynamically switching providers or rotating keys without restarting the application.

Examples:

Set the OpenAI API key for the default profile:

context.llm_set_key(
    provider="openai",
    model="gpt-4-turbo",
    api_key="sk-...",
)

Configure a custom profile for Anthropic:

context.llm_set_key(
    provider="anthropic",
    model="claude-3-opus",
    api_key="sk-ant-...",
    profile="anthropic-profile"
)

Parameters:

Name Type Description Default
provider str

The LLM provider name (e.g., "openai", "anthropic").

required
model str

The model name or identifier to use.

required
api_key str

The API key or credential for the provider.

required
profile str

The profile name to update (default: "default").

'default'

Returns:

Type Description
None

None. The profile is updated in place and will be used for subsequent calls to context.llm(profile=...).
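The update-in-place semantics can be pictured as a named-profile registry. The sketch below is illustrative only: the ProfileStore and ProfileConfig classes and their field names are hypothetical, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class ProfileConfig:
    # Hypothetical shape of a stored profile; field names are assumptions.
    provider: str
    model: str
    api_key: str


class ProfileStore:
    """Illustrative registry: llm_set_key-style updates mutate a named entry."""

    def __init__(self) -> None:
        self._profiles: dict[str, ProfileConfig] = {}

    def set_key(self, provider: str, model: str, api_key: str,
                profile: str = "default") -> None:
        # Update in place; later lookups for the same profile see the new config.
        self._profiles[profile] = ProfileConfig(provider, model, api_key)

    def get(self, profile: str = "default") -> ProfileConfig:
        return self._profiles[profile]


store = ProfileStore()
store.set_key("openai", "gpt-4-turbo", "sk-test")
store.set_key("anthropic", "claude-3-opus", "sk-ant-test", profile="anthropic-profile")
print(store.get().provider)                    # default profile
print(store.get("anthropic-profile").model)
```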

1. Main APIs

chat(messages, *, reasoning_effort, max_output_tokens, ...)

Send a chat request to the LLM provider and return the response in a normalized format. This method handles provider-specific dispatch, output postprocessing, rate limiting, and usage metering. It supports structured output via JSON schema validation and flexible output formats.

Examples:

Basic usage with a list of messages:

response, usage = await context.llm().chat([
    {"role": "user", "content": "Hello, assistant!"}
])

Requesting structured output with a JSON schema:

response, usage = await context.llm().chat(
    messages=[{"role": "user", "content": "Summarize this text."}],
    output_format="json",
    json_schema={"type": "object", "properties": {"summary": {"type": "string"}}}
)

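Downstream, a schema-constrained response can be parsed and checked with the standard library. A hedged sketch: the response text below is a made-up stand-in for a model reply, shaped to match the schema above.

```python
import json

# Hypothetical raw response text, as a provider might return it for the
# schema {"type": "object", "properties": {"summary": {"type": "string"}}}.
response_text = '{"summary": "Profiles simplify switching providers at runtime."}'

data = json.loads(response_text)
# Minimal structural check mirroring the schema above.
assert isinstance(data, dict) and isinstance(data["summary"], str)
print(data["summary"])
```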
Parameters:

Name Type Description Default
messages list[dict[str, Any]]

List of message dicts, each with "role" and "content" keys.

required
reasoning_effort str | None

Optional string to control model reasoning depth.

None
max_output_tokens int | None

Optional maximum number of output tokens.

None
output_format ChatOutputFormat

Output format, e.g., "text" or "json".

'text'
json_schema dict[str, Any] | None

Optional JSON schema for validating structured output.

None
schema_name str

Name for the root schema object (default: "output").

'output'
strict_schema bool

If True, enforce strict schema validation.

True
validate_json bool

If True, validate JSON output against schema.

True
fail_on_unsupported bool

If True, raise error for unsupported features.

True
**kw Any

Additional provider-specific keyword arguments. Common cross-provider options include model (override the default model name), tools (OpenAI-style tools/functions description), and tool_choice (tool selection strategy, e.g., "auto", "none", or a provider-specific dict).

{}

Returns:

Type Description
tuple[str, dict[str, int]]

tuple[str, dict[str, int]]: The model response (text or structured output) and usage statistics.

Raises:

Type Description
NotImplementedError

If the provider is not supported.

RuntimeError

For various errors including invalid JSON output or rate limit violations.

LLMUnsupportedFeatureError

If a requested feature is unsupported by the provider.

Notes
  • This method centralizes handling of different LLM providers, ensuring consistent behavior.
  • Structured output support allows for robust integration with downstream systems.
  • Rate limiting and metering help manage resource usage effectively.
chat_stream(messages, *, reasoning_effort, max_output_tokens, ...)

Stream a chat request to the LLM provider and return the accumulated response.

This method handles provider-specific streaming paths, falling back to non-streaming chat() if streaming is not implemented. It supports real-time delta updates via a callback function and returns the full response text and usage statistics at the end.

Examples:

Basic usage with a list of messages:

response, usage = await context.llm().chat_stream(
    messages=[{"role": "user", "content": "Hello, assistant!"}]
)

Using a delta callback for real-time updates:

async def on_delta(delta):
    print(delta, end="")

response, usage = await context.llm().chat_stream(
    messages=[{"role": "user", "content": "Tell me a joke."}],
    on_delta=on_delta
)
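The on_delta contract (real-time deltas during the stream, accumulated text at the end) can be simulated without a provider. In this self-contained sketch, fake_stream and chat_stream_sketch are stand-ins for the real streaming path, not library functions.

```python
import asyncio


async def fake_stream():
    # Stand-in for a provider's token stream.
    for delta in ["Why did ", "the function ", "return early? ", "It had a break."]:
        yield delta


async def chat_stream_sketch(on_delta=None) -> str:
    chunks: list[str] = []
    async for delta in fake_stream():
        chunks.append(delta)
        if on_delta is not None:
            await on_delta(delta)   # real-time update, as chat_stream's callback
    return "".join(chunks)          # full text returned at the end


async def main() -> str:
    async def on_delta(delta: str) -> None:
        print(delta, end="")

    return await chat_stream_sketch(on_delta)


text = asyncio.run(main())
```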

Parameters:

Name Type Description Default
messages list[dict[str, Any]]

List of message dicts, each with "role" and "content" keys.

required
reasoning_effort str | None

Optional string to control model reasoning depth.

None
max_output_tokens int | None

Optional maximum number of output tokens.

None
output_format ChatOutputFormat

Output format, e.g., "text" or "json".

'text'
json_schema dict[str, Any] | None

Optional JSON schema for validating structured output.

None
schema_name str

Name for the root schema object (default: "output").

'output'
strict_schema bool

If True, enforce strict schema validation.

True
validate_json bool

If True, validate JSON output against schema.

True
fail_on_unsupported bool

If True, raise error for unsupported features.

True
on_delta DeltaCallback | None

Optional callback function to handle real-time text deltas.

None
**kw Any

Additional provider-specific keyword arguments.

{}

Returns:

Type Description
tuple[str, dict[str, int]]

tuple[str, dict[str, int]]: The accumulated response text and usage statistics.

Raises:

Type Description
NotImplementedError

If the provider is not supported.

RuntimeError

For various errors including invalid JSON output or rate limit violations.

LLMUnsupportedFeatureError

If a requested feature is unsupported by the provider.

Notes
  • This method centralizes handling of streaming and non-streaming paths for LLM providers.
  • The on_delta callback allows for real-time updates, making it suitable for interactive applications.
  • Rate limiting and usage metering are applied consistently across providers.
  • Currently, only OpenAI's Responses API streaming is implemented; other providers will fall back to the non-streaming chat() method.
generate_image(prompt, *, model, n, ...)

Generate images from a text prompt using the configured LLM provider.

This method supports provider-agnostic image generation, including OpenAI, Azure, and Google Gemini. It automatically handles rate limiting, usage metering, and provider-specific options.

Examples:

Basic usage with a prompt:

result = await context.llm().generate_image("A cat riding a bicycle")

Requesting multiple images with custom size and style:

result = await context.llm().generate_image(
    "A futuristic cityscape",
    n=3,
    size="1024x1024",
    style="vivid"
)

Supplying input images for edit-style generation (Gemini):

result = await context.llm().generate_image(
    "Make this image brighter",
    input_images=[my_data_url]
)
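input_images expects data URLs. A sketch of building one from raw bytes with the standard library; the PNG header bytes below are a placeholder, not a real image, and the helper name is made up.

```python
import base64


def to_data_url(data: bytes, mime: str = "image/png") -> str:
    # Standard data-URL form: data:<mime>;base64,<payload>
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


# Placeholder bytes; in practice, read your image file in binary mode.
my_data_url = to_data_url(b"\x89PNG\r\n\x1a\n")
print(my_data_url)
```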

Parameters:

Name Type Description Default
prompt str

The text prompt describing the desired image(s).

required
model str | None

Optional model name to override the default.

None
n int

Number of images to generate (default: 1).

1
size str | None

Image size, e.g., "1024x1024".

None
quality str | None

Image quality setting (provider-specific).

None
style str | None

Artistic style (provider-specific).

None
output_format ImageFormat | None

Desired image format, e.g., "png", "jpeg".

None
response_format ImageResponseFormat | None

Response format, e.g., "url" or "b64_json".

None
background str | None

Background setting, e.g., "transparent".

None
input_images list[str] | None

List of input images (as data URLs) for edit-style generation.

None
azure_api_version str | None

Azure-specific API version override.

None
**kw Any

Additional provider-specific keyword arguments.

{}

Returns:

Name Type Description
ImageGenerationResult ImageGenerationResult

An object containing generated images, usage statistics, and raw response data.

Raises:

Type Description
LLMUnsupportedFeatureError

If the provider does not support image generation.

RuntimeError

For provider-specific errors or invalid configuration.

Notes
  • This method is accessed via context.llm().generate_image(...).
  • Usage metering and rate limits are enforced automatically. However, token usage is typically not reported for image generation.
  • The returned ImageGenerationResult includes both images and metadata.
embed(texts, ...)
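embed() is not documented further on this page. Embedding APIs generally return vectors of floats, and a common downstream step is cosine similarity; the sketch below uses made-up vectors standing in for embed() output, since the actual return shape is not specified here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embed() output.
vec_a = [0.1, 0.3, 0.5]
vec_b = [0.1, 0.3, 0.5]
vec_c = [0.5, -0.3, 0.1]
print(cosine_similarity(vec_a, vec_b))  # identical vectors -> ~1.0
print(cosine_similarity(vec_a, vec_c))  # dissimilar vectors -> lower score
```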

2. Raw API

raw(*, method, path, url, ...)

Send a low-level HTTP request using the configured LLM provider’s client.

This method provides direct access to the underlying HTTP transport, automatically applying provider-specific authentication, base URL resolution, and retry logic. It is intended for advanced use cases where you need to call custom endpoints or experiment with provider APIs not covered by higher-level methods.

Examples:

Basic usage with a relative path:

result = await context.llm().raw(
    method="POST",
    path="/custom/endpoint",
    json={"foo": "bar"}
)

Sending a GET request to an absolute URL:

response = await context.llm().raw(
    method="GET",
    url="https://api.openai.com/v1/models",
    return_response=True
)

Overriding headers and query parameters:

result = await context.llm().raw(
    path="/v1/special",
    headers={"X-Custom": "123"},
    params={"q": "search"}
)

Parameters:

Name Type Description Default
method str

HTTP method to use (e.g., "POST", "GET").

'POST'
path str | None

Relative path to append to the provider’s base URL.

None
url str | None

Absolute URL to call (overrides path and base_url).

None
json Any | None

JSON-serializable body to send with the request.

None
params dict[str, Any] | None

Dictionary of query parameters.

None
headers dict[str, str] | None

Dictionary of HTTP headers to override defaults.

None
return_response bool

If True, return the raw httpx.Response object; otherwise, return the parsed JSON response.

False

Returns:

Name Type Description
Any Any

The parsed JSON response by default, or the raw httpx.Response if return_response=True.

Raises:

Type Description
ValueError

If neither url nor path is provided.

RuntimeError

For HTTP errors or provider-specific failures.

Notes
  • This method is accessed via context.llm().raw(...).
  • Provider authentication and retry logic are handled automatically.
  • Use with caution; malformed requests may result in provider errors.