Providers

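AgentKavach supports four providers: OpenAI, Anthropic, Google, and Mistral. The same guard.create() method works for every provider, although the request parameters differ: Google takes contents instead of messages, and Anthropic requires max_tokens.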

OpenAI

Standard call

python

from agentkavach import AgentKavach, Budget

guard = AgentKavach(
    provider="openai",
    api_key="ak_prod_...",
    llm_key="sk-...",
    budget=Budget.daily(50),
)

response = guard.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

Native namespace

python

response = guard.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

The native namespace mirrors the OpenAI client's chat.completions.create() call. Use it when you prefer the call shape of the OpenAI SDK.

Streaming

python

stream = guard.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)

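for chunk in stream:
    # A chunk can arrive with no choices (for example, a usage-only final chunk).
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)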
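
If you need the complete reply as well as the live output, one option is to accumulate the deltas as they arrive. The sketch below uses stand-in chunk objects so it runs without a network call. collect_stream and fake_chunk are hypothetical helpers, not part of AgentKavach, and the sketch assumes chunks follow the choices[0].delta.content shape used in the loop above.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of OpenAI-style streaming chunks into one string."""
    parts = []
    for chunk in chunks:
        # Skip chunks without choices (for example, a usage-only chunk).
        if not chunk.choices:
            continue
        # delta.content can be None on role-only or final chunks.
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute shape (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# A simulated stream: two text deltas, one empty delta, one chunk with no choices.
demo = [
    fake_chunk("Roses "),
    fake_chunk(None),
    fake_chunk("are red"),
    SimpleNamespace(choices=[]),
]
full_text = collect_stream(demo)
print(full_text)
```

With a real stream, you would pass the iterator returned by guard.create(..., stream=True) in place of demo.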

Anthropic

Standard call

python

from agentkavach import AgentKavach, Budget

guard = AgentKavach(
    provider="anthropic",
    api_key="ak_prod_...",
    llm_key="sk-ant-...",
    budget=Budget.daily(50),
)

response = guard.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Summarize this report"}],
    max_tokens=1024,
)

Native namespace

python

response = guard.messages.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Summarize this report"}],
    max_tokens=1024,
)

max_tokens is required

Anthropic requires max_tokens on every call. The provider rejects the request without it.

Streaming

Pass stream=True and iterate the result. Anthropic emits typed events; the SDK reads output tokens from the trailing message_delta so cost stays accurate, and records partial usage if you break out early.

python

stream = guard.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Explain HMAC."}],
    max_tokens=1024,
    stream=True,
)

for event in stream:
    delta = getattr(event, "delta", None)
    text = getattr(delta, "text", None) if delta is not None else None
    if text:
        print(text, end="", flush=True)
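
The getattr checks matter because not every event carries text. The sketch below replays a simplified event sequence with stand-in objects to show which events contribute output. The event type names follow Anthropic's streaming format, but the objects here are illustrative only, and extract_text is a hypothetical helper rather than an AgentKavach function.

```python
from types import SimpleNamespace as NS


def extract_text(events):
    """Yield the text fragments from a sequence of typed streaming events."""
    for event in events:
        delta = getattr(event, "delta", None)
        text = getattr(delta, "text", None) if delta is not None else None
        if text:
            yield text


# Stand-in events: only the content_block_delta events carry delta.text.
events = [
    NS(type="message_start"),
    NS(type="content_block_delta", delta=NS(text="HMAC is ")),
    NS(type="content_block_delta", delta=NS(text="a keyed hash.")),
    NS(type="message_delta", delta=NS(stop_reason="end_turn"),
       usage=NS(output_tokens=7)),
    NS(type="message_stop"),
]

result = "".join(extract_text(events))
print(result)
```

Events without a delta, and deltas without text (such as the trailing message_delta), are skipped without raising AttributeError.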

Google

Standard call

python

from agentkavach import AgentKavach, Budget

guard = AgentKavach(
    provider="google",
    api_key="ak_prod_...",
    llm_key="AIza...",
    budget=Budget.monthly(300),
)

response = guard.create(
    model="gemini-2.5-flash",
    contents="Generate a project outline",
)

Native namespace

python

response = guard.generate_content(
    model="gemini-2.5-flash",
    contents="Generate a project outline",
)

contents, not messages

Google uses contents. Passing messages raises an error.
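
If your code already builds OpenAI-style message lists, one option is to convert them before calling a Google model. The sketch below is a hypothetical helper, not part of AgentKavach. It assumes guard.create() forwards a structured contents list to Gemini unchanged, which this page does not confirm. Gemini's own format uses the role model where other providers use assistant, and takes system prompts separately, so the helper returns them apart.

```python
def messages_to_contents(messages):
    """Convert OpenAI-style messages to Gemini-style contents.

    Returns (system_text, contents). System messages are split out because
    Gemini takes them as a separate system instruction, not as a turn.
    """
    role_map = {"user": "user", "assistant": "model"}
    system_parts = []
    contents = []
    for message in messages:
        role, text = message["role"], message["content"]
        if role == "system":
            system_parts.append(text)
        elif role in role_map:
            contents.append({"role": role_map[role], "parts": [{"text": text}]})
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return "\n".join(system_parts), contents


system_text, contents = messages_to_contents([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Generate a project outline"},
    {"role": "assistant", "content": "1. Scope"},
])
print(system_text)
print(contents)
```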

Streaming

Pass stream=True to get an iterator of chunks, each exposing .text for the incremental output. The final chunk carries the exact token counts the SDK uses for cost.

python

stream = guard.create(
    model="gemini-2.5-flash",
    contents="Write a haiku about TLS.",
    stream=True,
)

for chunk in stream:
    if chunk.text:
        print(chunk.text, end="", flush=True)

Mistral

Standard call

python

from agentkavach import AgentKavach, Budget

guard = AgentKavach(
    provider="mistral",
    api_key="ak_prod_...",
    llm_key="your-mistral-api-key",
    budget=Budget.daily(50),
)

response = guard.create(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello!"}],
)

Native namespace

python

response = guard.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello!"}],
)

Mistral uses an OpenAI-compatible shape. The native namespace maps to client.chat.complete().

Streaming

python

stream = guard.create(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Token counting

AgentKavach counts input tokens before each call so it can estimate cost. OpenAI and Mistral count locally with tiktoken, which adds negligible latency. Anthropic and Google count through a provider API call, which adds a network round trip of roughly 150 ms to each request.

Provider	Method	Typical latency
OpenAI	Local (tiktoken)	~0.1 ms
Anthropic	Provider API call	~150 ms
Google	Provider API call	~150 ms
Mistral	Local (tiktoken)	~0.1 ms
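
To see what the pre-call count means in practice, the sketch below does the arithmetic for a batch of sequential calls, using the typical latencies from the table. The per-token price is a placeholder for illustration, not a real provider rate, and both functions are hypothetical helpers rather than the SDK's own estimator.

```python
# Typical pre-call counting latency in milliseconds, from the table above.
COUNT_LATENCY_MS = {
    "openai": 0.1,
    "anthropic": 150,
    "google": 150,
    "mistral": 0.1,
}


def counting_overhead_seconds(provider, calls):
    """Total time spent counting tokens across sequential calls."""
    return COUNT_LATENCY_MS[provider] * calls / 1000


def estimate_input_cost(input_tokens, usd_per_million_tokens):
    """Estimated input cost in USD for a known token count."""
    return input_tokens * usd_per_million_tokens / 1_000_000


local = counting_overhead_seconds("openai", 1000)
remote = counting_overhead_seconds("anthropic", 1000)
print(f"1,000 calls: {local:.1f} s local vs {remote:.0f} s via provider API")

# Placeholder price: 2.00 USD per million input tokens (not a real rate).
cost = estimate_input_cost(12_000, 2.00)
print(f"12,000 input tokens: ${cost:.3f}")
```

The overhead only accumulates like this when calls run one after another; concurrent calls overlap their round trips.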

Cross-provider comparison

Feature	OpenAI	Anthropic	Google	Mistral
Parameter name	`messages`	`messages`	`contents`	`messages`
Native namespace	`guard.chat.completions.create()`	`guard.messages.create()`	`guard.generate_content()`	`guard.chat.complete()`
Unified API	`guard.create()`	`guard.create()`	`guard.create()`	`guard.create()`
Streaming	`stream=True`	`stream=True`	`stream=True`	`stream=True`
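
The differences in the table come down to which keyword arguments you pass. As a rough sketch, a small helper can build the right arguments for guard.create() from a plain prompt. build_create_kwargs is a hypothetical name, not an AgentKavach function, and the default max_tokens of 1024 is an arbitrary choice.

```python
def build_create_kwargs(provider, model, prompt, max_tokens=1024, stream=False):
    """Build keyword arguments for a unified create() call (illustrative)."""
    if provider == "google":
        # Google takes `contents`; passing `messages` raises an error.
        kwargs = {"model": model, "contents": prompt}
    elif provider in ("openai", "anthropic", "mistral"):
        kwargs = {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
    else:
        raise ValueError(f"unknown provider: {provider!r}")

    if provider == "anthropic":
        # Anthropic requires max_tokens on every call.
        kwargs["max_tokens"] = max_tokens
    if stream:
        kwargs["stream"] = True
    return kwargs


anthropic_kwargs = build_create_kwargs("anthropic", "claude-sonnet-4-6", "Hello!")
google_kwargs = build_create_kwargs("google", "gemini-2.5-flash", "Hello!", stream=True)
print(anthropic_kwargs)
print(google_kwargs)
```

With a configured guard, the call would then look like guard.create(**anthropic_kwargs).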