Models¶
For more detailed usage information, please refer to our cookbook: Models Cookbook
1. Concept¶
The Models module in Sikka Agent provides a unified interface for working with various AI model providers. It abstracts away provider-specific implementation details, enabling consistent API usage across different model platforms while handling token counting, rate limiting, and other model-specific requirements.
Sikka Agent's model system consists of several key components:
- ModelConfigure: The central class that manages model configuration and provides a unified interface
- Model Backend: Implementations for specific providers (OpenAI, AWS Bedrock, Ollama, etc.)
- Token Counter: Utilities for counting tokens in messages and managing context limits
- Audio Models: Specialized models for speech-to-text and text-to-speech operations
2. Get Started¶
2.1 Basic Usage¶
Here's a quick example of how to use the ModelConfigure class:
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# OpenAI model
openai_model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI,
    api_key="your-api-key"  # Or set OPENAI_API_KEY environment variable
)

# Run the model
response = openai_model.run(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ]
)
print(response.choices[0].message.content)
Using Ollama (local inference):
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# Ollama model (local)
ollama_model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    url="http://localhost:11434/v1"  # Default Ollama endpoint
)

# Run the model
response = ollama_model.run(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)
print(response.choices[0].message.content)
3. Core Components¶
3.1 ModelConfigure¶
The primary interface for configuring and using models.
Parameters:¶
model
: (Optional) Model identifier (e.g., "gpt-4o", "llama3.1:8b"). Defaults to "gpt-4o-mini" if not provided.

model_platform
: (Optional) Provider platform enum. Defaults to the standard OpenAI client if not provided.

api_key
: (Optional) API key for authentication. Defaults to environment variable if not provided.

url
: (Optional) API endpoint URL. Defaults to environment variable or provider default if not provided.

config
: (Optional) Model-specific configuration. Defaults to an empty ModelConfig if not provided.

token_counter
: (Optional) Custom token counter. Automatically initialized if not provided.

aws_access_key
: (Optional) AWS access key for Bedrock. Defaults to environment variable if not provided.

aws_secret_key
: (Optional) AWS secret key for Bedrock. Defaults to environment variable if not provided.

aws_region_name
: (Optional) AWS region for Bedrock. Defaults to "us-east-1" if not provided.
Methods:¶
run(messages)
: Run the model with the given messages

token_counter
: Property that returns the token counter for the model
3.2 ModelConfig¶
Configuration for model parameters using Pydantic for validation.
Attributes:¶
temperature
: (Optional) Controls randomness (0.0 to 2.0, default: 0.7)

top_p
: (Optional) Controls diversity via nucleus sampling (0.0 to 1.0, default: 1.0)

frequency_penalty
: (Optional) Penalizes repeated tokens (-2.0 to 2.0, default: 0.0)

presence_penalty
: (Optional) Penalizes repeated topics (-2.0 to 2.0, default: 0.0)

max_tokens
: (Optional) Maximum number of tokens to generate (default: 2048)

n
: (Optional) Number of completions to generate (default: 1)

stream
: (Optional) Whether to stream the response (default: False)

tool_choice
: (Optional) Tool choice configuration (default: None)

tools
: (Optional) List of tools available to the model (default: None)

user
: (Optional) User identifier (default: "")
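These attributes can be supplied through the config parameter of ModelConfigure as a plain dictionary, the same pattern used for stream in section 6.2. A minimal sketch with illustrative values; any omitted attribute keeps its default:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Illustrative values only
model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI,
    config={
        "temperature": 0.2,  # lower randomness for more deterministic output
        "max_tokens": 1024,  # cap the generated length
        "top_p": 0.9         # nucleus sampling cutoff
    }
)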
3.3 BaseModelBackend¶
Abstract base class for different model backends.
Parameters:¶
model
: (Required) Model identifier

config
: (Optional) Configuration dictionary. Defaults to an empty dict.

api_key
: (Optional) API key for authentication

url
: (Optional) API endpoint URL

token_counter
: (Optional) Custom token counter
Methods:¶
run(messages)
: (Required) Run the model with the given messages

token_counter
: (Required) Property that returns the token counter for the model

preprocess_messages(messages)
: (Optional) Preprocess messages before sending to the model
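As a rough sketch of what a custom backend could look like, based only on the parameters and methods listed above. The import path and the internal _token_counter attribute are assumptions, and the real abstract signatures may differ:

# Hypothetical sketch; check the library source for the exact base class API
from sikkaagent.models import BaseModelBackend

class MyBackend(BaseModelBackend):
    def run(self, messages):
        # Call your provider's chat API here and return an
        # OpenAI-style response object (choices[0].message.content)
        raise NotImplementedError

    @property
    def token_counter(self):
        # Return a TokenCounter-compatible object (see section 3.4)
        return self._token_counter

    def preprocess_messages(self, messages):
        # Optional hook: adjust messages before sending them to the model
        return messages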
3.4 TokenCounter¶
Utility for counting tokens in messages.
Parameters:¶
tokenizer
: (Required) The tokenizer to use for counting tokens
Methods:¶
count_tokens_from_messages(messages)
: (Required) Count tokens in a list of messages

count_text(text)
: (Required) Count tokens in a text string

count_image(image_item)
: (Required) Count tokens for an image based on detail level

count_tool_calls(tool_calls)
: (Required) Count tokens for tool calls

count_tool_responses(tool_responses)
: (Required) Count tokens for tool responses

count_content(content)
: (Required) Calculate tokens for message content
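A brief usage sketch that calls only the documented methods, using the counter exposed by a configured model (see section 3.1); model here is any ModelConfigure instance from the earlier examples:

# Obtain the counter from a configured model
counter = model.token_counter

text_tokens = counter.count_text("Hello, how are you?")
msg_tokens = counter.count_tokens_from_messages([
    {"role": "user", "content": "Hello, how are you?"}
])
print(text_tokens, msg_tokens)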
4. Model Implementations¶
4.1 OpenAI¶
The standard OpenAI client is used when model_platform is not specified or is set to ModelPlatformType.OPENAI.
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenAI model
model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI
)
4.2 Ollama¶
Local inference using Ollama.
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# Ollama model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    url="http://localhost:11434/v1"  # Default Ollama endpoint
)
4.3 AWS Bedrock¶
Models hosted on AWS Bedrock. Set up your AWS credentials in your environment file to use AWS Bedrock.
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# AWS Bedrock model
model = ModelConfigure(
    model="us.meta.llama3-1-8b-instruct-v1:0",
    model_platform=ModelPlatformType.AWS_BEDROCK
)
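Alternatively, credentials can be passed explicitly through the aws_access_key, aws_secret_key, and aws_region_name parameters documented in section 3.1 (placeholder values shown):

# Explicit credentials; environment variables are used when these are omitted
model = ModelConfigure(
    model="us.meta.llama3-1-8b-instruct-v1:0",
    model_platform=ModelPlatformType.AWS_BEDROCK,
    aws_access_key="your-access-key",
    aws_secret_key="your-secret-key",
    aws_region_name="us-east-1"  # the default region
)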
4.4 OpenRouter¶
Access to many models through one API.
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# OpenRouter model
model = ModelConfigure(
    model="anthropic/claude-3-opus",
    model_platform=ModelPlatformType.OPENROUTER,
    api_key="your-api-key"  # Or set OPENROUTER_API_KEY environment variable
)
4.5 OpenAI-Compatible¶
Any OpenAI-compatible endpoint.
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# OpenAI-compatible model
model = ModelConfigure(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    model_platform=ModelPlatformType.OPENAI_COMPATIBLE_MODEL,
    url="https://api.together.xyz/v1",
    api_key="your-api-key"
)
5. Integration with Other Modules¶
5.1 With ChatAgent¶
Models are primarily used with the ChatAgent class:
from sikkaagent.agents import ChatAgent
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Create agent with model
agent = ChatAgent(
    model=model,
    system_prompt="You are a helpful assistant."
)

# Use the agent
response = agent.step("Hello, how are you?")
print(response)
5.2 With Memory¶
Models provide token counting for Memory:
from sikkaagent.memories import Memory
from sikkaagent.storages import InMemoryStorage
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# Initialize model for token counting
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Create memory with token limit
memory = Memory(
    storage=InMemoryStorage(),
    token_counter=model.token_counter,
    token_limit=4000  # Maximum tokens for context
)
6. Advanced Topics¶
6.1 Tool Calling¶
Models can use tools to perform actions:
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Run the model with tools
response = model.run(
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    tools=tools
)

# Process tool calls
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    # Handle tool calls
    print(f"Tool: {tool_calls[0].function.name}")
    print(f"Arguments: {tool_calls[0].function.arguments}")
6.2 Streaming Responses¶
Models can stream responses for better user experience:
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# Create model with streaming enabled
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    config={"stream": True}
)

# Run the model with streaming
stream = model.run(
    messages=[
        {"role": "user", "content": "Write a short story about a robot."}
    ]
)

# Process the stream
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
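If the full text is also needed after streaming finishes, the chunks can be accumulated while printing, as a small variation on the loop above:

# Collect the streamed text while printing it incrementally
full_text = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
        full_text.append(delta)
full_response = "".join(full_text)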
6.3 Token Management¶
Managing tokens is important for staying within model context limits:
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType
# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Count tokens in messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
]
token_count = model.token_counter.count_tokens_from_messages(messages)
print(f"Token count: {token_count}")

# Check if within token limit
if token_count < model.token_limit:
    print("Within token limit")
else:
    print("Exceeds token limit")
7. Best Practices¶
- Model Selection: Choose the appropriate model based on your needs:
  - OpenAI models for production applications requiring high reliability
  - AWS Bedrock for organizations with existing AWS infrastructure
  - Ollama for development, privacy-sensitive applications, or cost-sensitive deployments
  - OpenRouter for experimenting with different model providers
- Token Management: Be aware of each model's context window limitations:
  - Use token counting to stay within limits
  - Implement windowing or summarization for long conversations
  - Consider using smaller models for simpler tasks
- Error Handling: Implement robust error handling (see the sketch after this list):
  - Retry logic for API-based models
  - Fallback models for critical applications
  - Graceful degradation when models are unavailable
- Security: Protect API keys and credentials:
  - Use environment variables for API keys
  - Implement proper access controls
  - Consider using AWS IAM roles for Bedrock
- Cost Optimization: Manage costs effectively:
  - Use smaller models for simpler tasks
  - Implement caching for common queries
  - Monitor usage and set up alerts
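As referenced under Error Handling above, here is a minimal retry sketch with exponential backoff around model.run. The attempt count, delay, and broad except clause are illustrative; narrow the exception type to your provider's errors:

import time

def run_with_retries(model, messages, attempts=3, base_delay=1.0):
    # Illustrative retry loop; tune attempts, delays, and exception types
    for attempt in range(attempts):
        try:
            return model.run(messages=messages)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff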