context.llm() – LLM Client & Profiles API Reference¶
context.llm() returns an LLM client (profile‑aware) with a consistent API across providers (OpenAI, Azure OpenAI, Anthropic, Google, OpenRouter, LM Studio, Ollama). Use it for chat, embeddings, and raw HTTP calls with built‑in retries and sane defaults.
See LLM Setup & Providers for configuring providers, base URLs, and API keys.
Profiles & Configuration¶
- Profiles: Named client configs (default: `"default"`).
- Get existing: `client = context.llm()` or `context.llm(profile="myprofile")`.
- Override/update: Pass any of `provider`/`model`/`base_url`/`api_key`/`azure_deployment`/`timeout` to create or update a profile at runtime.
- Quick set key: `context.llm_set_key(provider, model, api_key, profile="default")`.
Supported providers: openai, azure, anthropic, google, openrouter, lmstudio, ollama.
0. LLM Setup¶
context.llm(profile, *, provider, model, base_url, ...)
Retrieve or configure an LLM client for this context.
This method allows you to access a language model client by profile name, or dynamically override its configuration at runtime.
Examples:
Get the default LLM client:
llm = context.llm()
response = await llm.complete("Hello, world!")
Use a custom profile:
llm = context.llm(profile="my-profile")
Override provider and model for a one-off call:
llm = context.llm(
provider=Provider.OpenAI,
model="gpt-4-turbo",
api_key="sk-...",
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `profile` | `str` | The profile name to use. Set up in LLM Setup & Providers. | `'default'` |
| `provider` | `Provider \| None` | Optionally override the provider (e.g., `Provider.OpenAI`). | `None` |
| `model` | `str \| None` | Optionally override the model name. | `None` |
| `base_url` | `str \| None` | Optionally override the base URL for the LLM API. | `None` |
| `api_key` | `str \| None` | Optionally override the API key for authentication. | `None` |
| `azure_deployment` | `str \| None` | Optionally specify an Azure deployment name. | `None` |
| `timeout` | `float \| None` | Optionally set a request timeout (in seconds). | `None` |
Returns:
| Name | Type | Description |
|---|---|---|
| `LLMClientProtocol` | `LLMClientProtocol` | The configured LLM client instance for this context. |
context.llm_set_key(provider, model, api_key, profile)
Quickly configure or override the LLM provider, model, and API key for a given profile.
This method allows you to update the credentials and model configuration for a specific LLM profile at runtime. It is useful for dynamically switching providers or rotating keys without restarting the application.
Examples:
Set the OpenAI API key for the default profile:
context.llm_set_key(
provider="openai",
model="gpt-4-turbo",
api_key="sk-...",
)
Configure a custom profile for Anthropic:
context.llm_set_key(
provider="anthropic",
model="claude-3-opus",
api_key="sk-ant-...",
profile="anthropic-profile"
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `provider` | `str` | The LLM provider name (e.g., `"openai"`, `"anthropic"`). | required |
| `model` | `str` | The model name or identifier to use. | required |
| `api_key` | `str` | The API key or credential for the provider. | required |
| `profile` | `str` | The profile name to update. | `'default'` |
Returns:
| Type | Description |
|---|---|
| `None` | The profile is updated in place and will be used for subsequent calls to `context.llm()`. |
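Key rotation is one common use of runtime reconfiguration. The sketch below wraps a `set_key` callable (anything with the `llm_set_key` signature) in a rotator; the `make_key_rotator` helper is hypothetical, and a recorder stands in for `context.llm_set_key` so the example is self-contained:

```python
from itertools import cycle
from typing import Callable


def make_key_rotator(set_key: Callable[..., None],
                     provider: str, model: str,
                     api_keys: list[str],
                     profile: str = "default") -> Callable[[], None]:
    """Return a zero-arg function that installs the next key on each call."""
    keys = cycle(api_keys)

    def rotate() -> None:
        set_key(provider=provider, model=model,
                api_key=next(keys), profile=profile)

    return rotate


# Stand-in for context.llm_set_key so the sketch runs on its own.
installed = []
rotate = make_key_rotator(lambda **kw: installed.append(kw["api_key"]),
                          "openai", "gpt-4-turbo", ["sk-a", "sk-b"])
rotate()
rotate()
rotate()
# installed is now ["sk-a", "sk-b", "sk-a"]
```

In real code you would pass `context.llm_set_key` as the `set_key` argument and call `rotate()` whenever a key needs replacing.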
1. Main APIs¶
chat(messages, *, reasoning_effort, max_output_tokens, ...)
Send a chat request to the LLM provider and return the response in a normalized format. This method handles provider-specific dispatch, output postprocessing, rate limiting, and usage metering. It supports structured output via JSON schema validation and flexible output formats.
Examples:
Basic usage with a list of messages:
response, usage = await context.llm().chat([
{"role": "user", "content": "Hello, assistant!"}
])
Requesting structured output with a JSON schema:
response, usage = await context.llm().chat(
    messages=[{"role": "user", "content": "Summarize this text."}],
    output_format="json",
    json_schema={"type": "object", "properties": {"summary": {"type": "string"}}}
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `messages` | `list[dict[str, Any]]` | List of message dicts, each with `"role"` and `"content"` keys. | required |
| `reasoning_effort` | `str \| None` | Optional string to control model reasoning depth. | `None` |
| `max_output_tokens` | `int \| None` | Optional maximum number of output tokens. | `None` |
| `output_format` | `ChatOutputFormat` | Output format, e.g., `"text"` or `"json"`. | `'text'` |
| `json_schema` | `dict[str, Any] \| None` | Optional JSON schema for validating structured output. | `None` |
| `schema_name` | `str` | Name for the root schema object. | `'output'` |
| `strict_schema` | `bool` | If True, enforce strict schema validation. | `True` |
| `validate_json` | `bool` | If True, validate JSON output against the schema. | `True` |
| `fail_on_unsupported` | `bool` | If True, raise an error for unsupported features. | `True` |
| `**kw` | `Any` | Additional provider-specific keyword arguments. Common cross-provider options: `model` (override the default model name), `tools` (OpenAI-style tools/functions description), `tool_choice` (tool selection strategy, e.g., `"auto"`, `"none"`, or a provider-specific dict). | `{}` |
Returns:
| Type | Description |
|---|---|
| `tuple[str, dict[str, int]]` | The model response (text or structured output) and usage statistics. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the provider is not supported. |
| `RuntimeError` | For various errors, including invalid JSON output or rate-limit violations. |
| `LLMUnsupportedFeatureError` | If a requested feature is unsupported by the provider. |
Notes
- This method centralizes handling of different LLM providers, ensuring consistent behavior.
- Structured output support allows for robust integration with downstream systems.
- Rate limiting and metering help manage resource usage effectively.
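Because `chat()` returns the structured output as a string, a typical consumer parses and checks it before use. A minimal sketch, assuming the schema from the example above; the `extract_summary` helper is illustrative, not part of the API:

```python
import json


def extract_summary(response_text: str) -> str:
    """Parse a JSON chat response and pull out the 'summary' field."""
    data = json.loads(response_text)  # raises on malformed JSON
    if "summary" not in data:
        raise KeyError("response missing 'summary'")
    return data["summary"]


# With validate_json=True the client has already checked the schema,
# so this parse should not fail on well-formed responses:
# response, usage = await context.llm().chat(..., output_format="json", ...)
# summary = extract_summary(response)
```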
chat_stream(messages, *, reasoning_effort, max_output_tokens, ...)
Stream a chat request to the LLM provider and return the accumulated response.
This method handles provider-specific streaming paths, falling back to non-streaming chat() if streaming is not implemented. It supports real-time delta updates via a callback function and returns the full response text and usage statistics at the end.
Examples:
Basic usage with a list of messages:
response, usage = await context.llm().chat_stream(
messages=[{"role": "user", "content": "Hello, assistant!"}]
)
Using a delta callback for real-time updates:
async def on_delta(delta):
print(delta, end="")
response, usage = await context.llm().chat_stream(
messages=[{"role": "user", "content": "Tell me a joke."}],
on_delta=on_delta
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `messages` | `list[dict[str, Any]]` | List of message dicts, each with `"role"` and `"content"` keys. | required |
| `reasoning_effort` | `str \| None` | Optional string to control model reasoning depth. | `None` |
| `max_output_tokens` | `int \| None` | Optional maximum number of output tokens. | `None` |
| `output_format` | `ChatOutputFormat` | Output format, e.g., `"text"` or `"json"`. | `'text'` |
| `json_schema` | `dict[str, Any] \| None` | Optional JSON schema for validating structured output. | `None` |
| `schema_name` | `str` | Name for the root schema object. | `'output'` |
| `strict_schema` | `bool` | If True, enforce strict schema validation. | `True` |
| `validate_json` | `bool` | If True, validate JSON output against the schema. | `True` |
| `fail_on_unsupported` | `bool` | If True, raise an error for unsupported features. | `True` |
| `on_delta` | `DeltaCallback \| None` | Optional callback function to handle real-time text deltas. | `None` |
| `**kw` | `Any` | Additional provider-specific keyword arguments. | `{}` |
Returns:
| Type | Description |
|---|---|
| `tuple[str, dict[str, int]]` | The accumulated response text and usage statistics. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the provider is not supported. |
| `RuntimeError` | For various errors, including invalid JSON output or rate-limit violations. |
| `LLMUnsupportedFeatureError` | If a requested feature is unsupported by the provider. |
Notes
- This method centralizes handling of streaming and non-streaming paths for LLM providers.
- The `on_delta` callback allows for real-time updates, making it suitable for interactive applications.
- Rate limiting and usage metering are applied consistently across providers.
- Currently, only OpenAI's Responses API streaming is implemented; other providers fall back to the non-streaming `chat()` method.
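A delta callback often accumulates the stream in addition to displaying it. A minimal sketch of such a callback, matching the async `on_delta` shape from the example above (the `DeltaBuffer` class is illustrative, not part of the API):

```python
import asyncio


class DeltaBuffer:
    """Collects streamed text deltas and exposes the accumulated text."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    async def __call__(self, delta: str) -> None:
        # One awaitable call per streamed delta, as on_delta expects.
        self._parts.append(delta)

    @property
    def text(self) -> str:
        return "".join(self._parts)


async def demo() -> str:
    buf = DeltaBuffer()
    for delta in ["Hel", "lo", "!"]:  # stand-in for streamed deltas
        await buf(delta)
    return buf.text
```

In practice you would pass the instance as `on_delta=buf` to `chat_stream()` and read `buf.text` afterward, e.g. to show partial output in a UI while keeping the full transcript.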
generate_image(prompt, *, model, n, ...)
Generate images from a text prompt using the configured LLM provider.
This method supports provider-agnostic image generation, including OpenAI, Azure, and Google Gemini. It automatically handles rate limiting, usage metering, and provider-specific options.
Examples:
Basic usage with a prompt:
result = await context.llm().generate_image("A cat riding a bicycle")
Requesting multiple images with custom size and style:
result = await context.llm().generate_image(
"A futuristic cityscape",
n=3,
size="1024x1024",
style="vivid"
)
Supplying input images for edit-style generation (Gemini):
result = await context.llm().generate_image(
"Make this image brighter",
input_images=[my_data_url]
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `prompt` | `str` | The text prompt describing the desired image(s). | required |
| `model` | `str \| None` | Optional model name to override the default. | `None` |
| `n` | `int` | Number of images to generate. | `1` |
| `size` | `str \| None` | Image size, e.g., `"1024x1024"`. | `None` |
| `quality` | `str \| None` | Image quality setting (provider-specific). | `None` |
| `style` | `str \| None` | Artistic style (provider-specific). | `None` |
| `output_format` | `ImageFormat \| None` | Desired image format, e.g., `"png"`, `"jpeg"`. | `None` |
| `response_format` | `ImageResponseFormat \| None` | Response format, e.g., `"url"` or `"b64_json"`. | `None` |
| `background` | `str \| None` | Background setting, e.g., `"transparent"`. | `None` |
| `input_images` | `list[str] \| None` | List of input images (as data URLs) for edit-style generation. | `None` |
| `azure_api_version` | `str \| None` | Azure-specific API version override. | `None` |
| `**kw` | `Any` | Additional provider-specific keyword arguments. | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `ImageGenerationResult` | `ImageGenerationResult` | An object containing generated images, usage statistics, and raw response data. |
Raises:
| Type | Description |
|---|---|
| `LLMUnsupportedFeatureError` | If the provider does not support image generation. |
| `RuntimeError` | For provider-specific errors or invalid configuration. |
Notes
- This method is accessed via `context.llm().generate_image(...)`.
- Usage metering and rate limits are enforced automatically; however, token usage is typically not reported for image generation.
- The returned `ImageGenerationResult` includes both images and metadata.
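With `response_format="b64_json"`, each image arrives base64-encoded and must be decoded before writing to disk. A sketch of the decode step; since the exact field layout of `ImageGenerationResult` is not specified here, the helpers take the base64 payload directly:

```python
import base64


def decode_b64_image(b64_data: str) -> bytes:
    """Decode a base64-encoded image payload into raw bytes."""
    return base64.b64decode(b64_data)


def save_image(b64_data: str, path: str) -> None:
    """Write a decoded image payload to disk."""
    with open(path, "wb") as fh:
        fh.write(decode_b64_image(b64_data))


# result = await context.llm().generate_image(..., response_format="b64_json")
# save_image(<b64 payload from result>, "out.png")
```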
2. Raw API¶
raw(*, method, path, url, ...)
Send a low-level HTTP request using the configured LLM provider’s client.
This method provides direct access to the underlying HTTP transport, automatically applying provider-specific authentication, base URL resolution, and retry logic. It is intended for advanced use cases where you need to call custom endpoints or experiment with provider APIs not covered by higher-level methods.
Examples:
Basic usage with a relative path:
result = await context.llm().raw(
method="POST",
path="/custom/endpoint",
json={"foo": "bar"}
)
Sending a GET request to an absolute URL:
response = await context.llm().raw(
method="GET",
url="https://api.openai.com/v1/models",
return_response=True
)
Overriding headers and query parameters:
result = await context.llm().raw(
path="/v1/special",
headers={"X-Custom": "123"},
params={"q": "search"}
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `method` | `str` | HTTP method to use (e.g., `"POST"`, `"GET"`). | `'POST'` |
| `path` | `str \| None` | Relative path to append to the provider's base URL. | `None` |
| `url` | `str \| None` | Absolute URL to call (overrides `path`). | `None` |
| `json` | `Any \| None` | JSON-serializable body to send with the request. | `None` |
| `params` | `dict[str, Any] \| None` | Dictionary of query parameters. | `None` |
| `headers` | `dict[str, str] \| None` | Dictionary of HTTP headers to override defaults. | `None` |
| `return_response` | `bool` | If True, return the raw response object instead of the parsed JSON. | `False` |
Returns:
| Name | Type | Description |
|---|---|---|
| `Any` | `Any` | The parsed JSON response by default, or the raw response object if `return_response=True`. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If neither `path` nor `url` is provided. |
| `RuntimeError` | For HTTP errors or provider-specific failures. |
Notes
- This method is accessed via `context.llm().raw(...)`.
- Provider authentication and retry logic are handled automatically.
- Use with caution; malformed requests may result in provider errors.