# majordomo-llm
A unified async Python interface for multiple LLM providers with built-in cost tracking, automatic retries, and structured outputs.
## Why majordomo-llm?
Building with LLMs often means dealing with:
- Different APIs for each provider — OpenAI, Anthropic, and Gemini all have different client libraries and response formats
- Hidden costs — Token usage and spending are hard to track across providers
- Fragile integrations — When one provider goes down, your application goes down
- Inconsistent structured outputs — Each provider handles JSON schemas differently
majordomo-llm solves these problems with a single, consistent interface across OpenAI, Anthropic, Google Gemini, DeepSeek, and Cohere.
## Quick Example

```python
import asyncio

from pydantic import BaseModel

from majordomo_llm import get_llm_instance


class Summary(BaseModel):
    title: str
    key_points: list[str]
    word_count: int


async def main():
    # Works with any provider: openai, anthropic, gemini, deepseek, cohere
    llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")
    response = await llm.get_structured_json_response(
        response_model=Summary,
        user_prompt="Summarize the benefits of async programming in Python",
    )
    print(response.content.title)
    print(response.content.key_points)
    print(f"Cost: ${response.total_cost:.6f}")


asyncio.run(main())
```
## Key Features

### Unified Provider Interface

Write once, run on any provider. Switch between OpenAI, Anthropic, Gemini, DeepSeek, and Cohere with a single line change.

```python
from majordomo_llm import get_llm_instance

# Pick one: only the provider name and model ID change
llm = get_llm_instance("openai", "gpt-4.1")
llm = get_llm_instance("anthropic", "claude-sonnet-4-20250514")
llm = get_llm_instance("gemini", "gemini-2.5-flash")
```
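A one-line switch like this is usually possible because every provider sits behind one shared interface. The sketch below shows one common way to build that: an abstract base class plus a registry that maps provider names to classes.

It is a hypothetical illustration, not majordomo-llm's actual implementation. `BaseLLM`, `make_llm`, and the `*Adapter` classes are invented names, and `complete()` returns a placeholder string instead of calling a real API.

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Hypothetical shared interface that every provider adapter implements."""

    name = "base"

    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send `prompt` to the provider and return the reply text."""


class OpenAIAdapter(BaseLLM):
    name = "openai"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}:{self.model}] {prompt}"  # placeholder for a real API call


class AnthropicAdapter(BaseLLM):
    name = "anthropic"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}:{self.model}] {prompt}"  # placeholder for a real API call


class GeminiAdapter(BaseLLM):
    name = "gemini"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}:{self.model}] {prompt}"  # placeholder for a real API call


# Provider name -> adapter class; supporting a new provider means adding one entry
_REGISTRY: dict[str, type[BaseLLM]] = {
    "openai": OpenAIAdapter,
    "anthropic": AnthropicAdapter,
    "gemini": GeminiAdapter,
}


def make_llm(provider: str, model: str) -> BaseLLM:
    """Build the adapter for `provider`; callers only depend on BaseLLM."""
    try:
        adapter_cls = _REGISTRY[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    return adapter_cls(model)


print(make_llm("gemini", "gemini-2.5-flash").complete("Hi"))  # [gemini:gemini-2.5-flash] Hi
```

Because callers only see `BaseLLM`, swapping the provider string is the only change the calling code needs.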
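### Streaming Responses

Stream text token-by-token as it's generated. Usage and cost metrics are available after the stream completes.

```python
# Inside an async function, with `llm` created by get_llm_instance(...)
stream = await llm.get_response_stream("Explain quantum computing")
async for chunk in stream:
    print(chunk, end="", flush=True)

# Usage is populated only once the stream has been fully consumed
print(f"\nCost: ${stream.usage.total_cost:.6f}")
```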
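Token counts for a streamed reply are generally only final once generation ends. A stream wrapper can therefore fill in usage only after iteration is exhausted.

The sketch below shows that pattern with a plain async iterator. `TextStream`, `Usage`, and `fake_provider` are hypothetical names, not majordomo-llm's classes. It also makes a toy assumption that one chunk equals one output token.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


class TextStream:
    """Async-iterable wrapper: yields text chunks, sets `usage` once exhausted."""

    def __init__(self, chunks: AsyncIterator[str], input_tokens: int) -> None:
        self._chunks = chunks
        self._input_tokens = input_tokens
        self._output_tokens = 0
        self.usage: Optional[Usage] = None  # unknown until the stream finishes

    def __aiter__(self) -> "TextStream":
        return self

    async def __anext__(self) -> str:
        try:
            chunk = await self._chunks.__anext__()
        except StopAsyncIteration:
            # Source is exhausted: finalize usage, then end iteration
            self.usage = Usage(self._input_tokens, self._output_tokens)
            raise
        self._output_tokens += 1  # toy assumption: one chunk == one token
        return chunk


async def fake_provider(text: str) -> AsyncIterator[str]:
    """Stand-in for a provider's streaming API: yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(0)  # yield control, as a network read would
        yield word + " "


async def main() -> None:
    stream = TextStream(fake_provider("Qubits can be in superposition"), input_tokens=4)
    async for chunk in stream:
        print(chunk, end="", flush=True)
    print(f"\n{stream.usage}")  # Usage(input_tokens=4, output_tokens=5)


asyncio.run(main())
```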
### Structured Outputs with Pydantic

Get validated, typed Python objects instead of raw JSON. Provider-specific implementation details are handled internally.

```python
response = await llm.get_structured_json_response(
    response_model=MyPydanticModel,  # any pydantic.BaseModel subclass
    user_prompt="Extract data from this text...",
)
result: MyPydanticModel = response.content  # Fully typed
```
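Under the hood, structured output comes down to two steps: parse the model's JSON reply, then check it against the declared schema before handing back an object. The sketch below is a stdlib-only stand-in for the validation Pydantic performs.

It is not how majordomo-llm is implemented, and `parse_structured` is a hypothetical helper. Its checks are shallow: top-level field presence and type only. Pydantic also validates element types such as `list[str]`, nested models, and type coercion.

```python
import json
from dataclasses import dataclass, fields
from typing import TypeVar

T = TypeVar("T")


@dataclass
class Summary:
    title: str
    key_points: list  # plain `list`: isinstance() cannot check list[str]
    word_count: int


def parse_structured(raw: str, model: type[T]) -> T:
    """Parse a JSON reply and type-check each top-level field of dataclass `model`."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    kwargs = {}
    for f in fields(model):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        if not isinstance(value, f.type):
            raise TypeError(f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}")
        kwargs[f.name] = value
    return model(**kwargs)


reply = '{"title": "Async in Python", "key_points": ["non-blocking I/O"], "word_count": 42}'
summary = parse_structured(reply, Summary)
print(summary.title)  # Async in Python
```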
### Built-in Cost Tracking

Every response includes token counts and calculated costs. No external tracking needed.

```python
print(f"Tokens: {response.input_tokens} in / {response.output_tokens} out")
print(f"Cost: ${response.total_cost:.6f}")
```
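The cost figure is arithmetic over the token counts and the model's per-token prices. The sketch below assumes prices are quoted in US dollars per million tokens, a common convention. Check `llm_config.yaml` for the format majordomo-llm actually uses.

`request_cost` is a hypothetical helper, and the prices in the example are illustrative only.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Total cost in dollars, given prices per one million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# 1,200 input tokens at $3.00/M plus 350 output tokens at $15.00/M
# = (3,600 + 5,250) / 1,000,000 = $0.00885
cost = request_cost(1_200, 350, 3.00, 15.00)
print(f"Cost: ${cost:.6f}")  # Cost: $0.008850
```

Input and output tokens are priced separately because providers typically charge more for output tokens.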
### Cascade Failover

Automatically fall back to alternative providers when one fails.

```python
from majordomo_llm import LLMCascade

# Providers are tried in the order listed
cascade = LLMCascade([
    ("anthropic", "claude-sonnet-4-20250514"),
    ("openai", "gpt-4.1"),
    ("gemini", "gemini-2.5-flash"),
])
response = await cascade.get_response("Hello!")  # Tries each until one succeeds
```
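Failover of this kind is an ordered loop: call each provider in turn, remember its error, and return the first success. Only if every provider fails does the cascade raise.

The sketch below shows that loop under stated assumptions. `cascade_get_response` and `AllProvidersFailed` are hypothetical names, and the providers are fake async functions. It catches every exception; a real cascade might only fall through on errors it considers retryable, such as timeouts or 5xx responses.

```python
import asyncio
from typing import Awaitable, Callable

Provider = tuple[str, Callable[[str], Awaitable[str]]]


class AllProvidersFailed(Exception):
    """Raised when every provider in the cascade raised an error."""


async def cascade_get_response(providers: list[Provider], prompt: str) -> tuple[str, str]:
    """Return (provider_name, reply) from the first provider that succeeds."""
    errors: list[tuple[str, Exception]] = []
    for name, call in providers:
        try:
            return name, await call(prompt)
        except Exception as exc:  # sketch: treat every error as "try the next one"
            errors.append((name, exc))
    raise AllProvidersFailed(errors)


async def down(prompt: str) -> str:
    raise ConnectionError("503 Service Unavailable")


async def up(prompt: str) -> str:
    return f"echo: {prompt}"


name, text = asyncio.run(cascade_get_response([("anthropic", down), ("openai", up)], "Hello!"))
print(name, text)  # openai echo: Hello!
```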
### Custom Endpoints & Proxy Routing

Route requests through any gateway or proxy with custom base URLs and HTTP headers. Pass `api_key` directly or let providers read it from environment variables. Headers can be set at the instance level (`default_headers`) or per request (`extra_headers`).

```python
llm = get_llm_instance(
    "anthropic", "claude-sonnet-4-20250514",
    api_key="sk-ant-...",  # omit to read the key from the environment
    base_url="https://gateway.example.com",
    default_headers={"X-Majordomo-Key": "mdm_key_here"},  # sent with every request
)
response = await llm.get_response("Hello!", extra_headers={"X-Request-Id": "req_123"})
```
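The two configuration rules above can be sketched as small resolution functions:

- an explicit key beats the environment;
- per-request headers layer on top of instance-level headers.

Two things here are assumptions not stated in the source. First, that `extra_headers` override `default_headers` on a name clash. Second, that the Anthropic key lives in `ANTHROPIC_API_KEY`, which is the Anthropic SDK's convention. `resolve_api_key` and `merge_headers` are hypothetical helpers, not majordomo-llm functions.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str], env_var: str) -> str:
    """Prefer an explicitly passed key; otherwise fall back to the environment."""
    key = explicit or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key: pass api_key or set {env_var}")
    return key


def merge_headers(
    default_headers: Optional[Mapping[str, str]],
    extra_headers: Optional[Mapping[str, str]],
) -> dict[str, str]:
    """Instance-level headers first; per-request headers win on conflicts (assumed).

    Keys are compared case-sensitively here, although HTTP header names are not.
    """
    merged = dict(default_headers or {})
    merged.update(extra_headers or {})
    return merged


key = resolve_api_key("sk-ant-...", "ANTHROPIC_API_KEY")  # explicit key wins
headers = merge_headers({"X-Majordomo-Key": "mdm_key_here"}, {"X-Request-Id": "req_123"})
print(headers)  # {'X-Majordomo-Key': 'mdm_key_here', 'X-Request-Id': 'req_123'}
```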
### Optional Request Logging

Use pluggable database and storage adapters to persist every request for analytics, debugging, and compliance.

```python
from majordomo_llm.logging import LoggingLLM, SqliteAdapter, FileStorageAdapter

db = await SqliteAdapter.create("logs.db")  # database adapter
storage = await FileStorageAdapter.create("./request_logs")  # storage adapter
logged_llm = LoggingLLM(llm, db, storage)  # wraps the llm created earlier
```
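Wrapping the client, as `LoggingLLM(llm, ...)` does, is the decorator pattern. The wrapper exposes the same call as the client it wraps and records each request on the way through.

The sketch below shows that pattern with the standard-library `sqlite3` module. `EchoLLM`, `SqliteLogger`, and `LoggedLLM` are hypothetical names, and the single-table schema is an assumption. The sketch also omits the separate storage adapter that majordomo-llm's `LoggingLLM` accepts.

```python
import asyncio
import sqlite3
import time
from dataclasses import dataclass


@dataclass
class Response:
    content: str
    input_tokens: int
    output_tokens: int
    total_cost: float


class EchoLLM:
    """Stand-in client that returns a canned response instead of calling an API."""

    async def get_response(self, prompt: str) -> Response:
        return Response(f"echo: {prompt}", len(prompt.split()), 2, 0.0001)


class SqliteLogger:
    """Hypothetical database adapter: one row per request."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS requests ("
            "ts REAL, prompt TEXT, input_tokens INTEGER, output_tokens INTEGER, cost REAL)"
        )

    def record(self, prompt: str, resp: Response) -> None:
        # sqlite3 is blocking; an async adapter might offload this with asyncio.to_thread
        self.conn.execute(
            "INSERT INTO requests VALUES (?, ?, ?, ?, ?)",
            (time.time(), prompt, resp.input_tokens, resp.output_tokens, resp.total_cost),
        )
        self.conn.commit()


class LoggedLLM:
    """Same get_response() interface as the wrapped client, plus one log row per call."""

    def __init__(self, llm, db: SqliteLogger) -> None:
        self._llm = llm
        self._db = db

    async def get_response(self, prompt: str) -> Response:
        resp = await self._llm.get_response(prompt)
        self._db.record(prompt, resp)
        return resp


db = SqliteLogger()
logged = LoggedLLM(EchoLLM(), db)
asyncio.run(logged.get_response("Hello there!"))
print(db.conn.execute("SELECT prompt, input_tokens FROM requests").fetchall())  # [('Hello there!', 2)]
```

Because the wrapper keeps the client's interface, code that calls `get_response` does not need to know whether logging is enabled.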
## Supported Providers
| Provider | Recent Models |
|---|---|
| OpenAI | gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5, gpt-4.1, o3, o4-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5, claude-sonnet-4 |
| Google Gemini | gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-2.5-flash |
| DeepSeek | deepseek-v4-flash, deepseek-v4-pro, deepseek-chat, deepseek-reasoner |
| Cohere | command-a, command-r-plus, command-r |
All providers support structured outputs and streaming. Additional models are available; see `llm_config.yaml` for the complete list with pricing.
## Next Steps
- Getting Started — Installation and quickstart
- Core Concepts — Understand the key capabilities
- Recipes — Practical examples and patterns
- API Reference — Detailed API documentation