Models

For more detailed usage information, please refer to our cookbook: Models Cookbook

1. Concept

The Models module in Sikka Agent provides a unified interface for working with various AI model providers. It abstracts away provider-specific implementation details, enabling consistent API usage across different model platforms while handling token counting, rate limiting, and other model-specific requirements.

Sikka Agent's model system consists of several key components:

  1. ModelConfigure: The central class that manages model configuration and provides a unified interface
  2. Model Backend: Implementations for specific providers (OpenAI, AWS Bedrock, Ollama, etc.)
  3. Token Counter: Utilities for counting tokens in messages and managing context limits
  4. Audio Models: Specialized models for speech-to-text and text-to-speech operations

2. Get Started

2.1 Basic Usage

Here's a quick example of how to use the ModelConfigure class:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenAI model
openai_model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI,
    api_key="your-api-key"  # Or set OPENAI_API_KEY environment variable
)

# Run the model
response = openai_model.run(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)

Using Ollama (local inference):

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Ollama model (local)
ollama_model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    url="http://localhost:11434/v1"  # Default Ollama endpoint
)

# Run the model
response = ollama_model.run(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

3. Core Components

3.1 ModelConfigure

The primary interface for configuring and using models.

Parameters:

  • model: (Optional) Model identifier (e.g., "gpt-4o", "llama3.1:8b"). Defaults to "gpt-4o-mini" if not provided.
  • model_platform: (Optional) Provider platform enum. Defaults to standard OpenAI client if not provided.
  • api_key: (Optional) API key for authentication. Defaults to environment variable if not provided.
  • url: (Optional) API endpoint URL. Defaults to environment variable or provider default if not provided.
  • config: (Optional) Model-specific configuration. Defaults to empty ModelConfig if not provided.
  • token_counter: (Optional) Custom token counter. Automatically initialized if not provided.
  • aws_access_key: (Optional) AWS access key for Bedrock. Defaults to environment variable if not provided.
  • aws_secret_key: (Optional) AWS secret key for Bedrock. Defaults to environment variable if not provided.
  • aws_region_name: (Optional) AWS region for Bedrock. Defaults to "us-east-1" if not provided.

Methods:

  • run(messages): Run the model with the given messages
  • token_counter: Property that returns the token counter for the model

3.2 ModelConfig

Configuration for model parameters using Pydantic for validation.

Attributes:

  • temperature: (Optional) Controls randomness (0.0-2.0, default: 0.7)
  • top_p: (Optional) Controls diversity via nucleus sampling (0.0-1.0, default: 1.0)
  • frequency_penalty: (Optional) Penalizes repeated tokens (-2.0-2.0, default: 0.0)
  • presence_penalty: (Optional) Penalizes repeated topics (-2.0-2.0, default: 0.0)
  • max_tokens: (Optional) Maximum number of tokens to generate (default: 2048)
  • n: (Optional) Number of completions to generate (default: 1)
  • stream: (Optional) Whether to stream the response (default: False)
  • tool_choice: (Optional) Tool choice configuration (default: None)
  • tools: (Optional) List of tools available to the model (default: None)
  • user: (Optional) User identifier (default: "")
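
For illustration, a minimal sketch that sets a few of these parameters through the config argument of ModelConfigure, passed as a plain dict in the same way as the streaming example in section 6.2:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Configure generation parameters via the config argument
model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI,
    config={
        "temperature": 0.2,   # lower randomness for more deterministic output
        "max_tokens": 512,    # cap the length of the completion
        "top_p": 0.9          # nucleus sampling threshold
    }
)

response = model.run(
    messages=[{"role": "user", "content": "Summarize the benefits of unit testing."}]
)
print(response.choices[0].message.content)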

3.3 BaseModelBackend

Abstract base class for different model backends.

Parameters:

  • model: (Required) Model identifier
  • config: (Optional) Configuration dictionary. Defaults to empty dict.
  • api_key: (Optional) API key for authentication
  • url: (Optional) API endpoint URL
  • token_counter: (Optional) Custom token counter

Methods:

  • run(messages): (Required) Run the model with the given messages
  • token_counter: (Required) Property that returns the token counter for the model
  • preprocess_messages(messages): (Optional) Preprocess messages before sending to the model
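
As a rough sketch only, this is what a custom backend could look like. The import path for BaseModelBackend and the exact method signatures are assumptions based on the parameters and methods listed above, not a documented extension API:

from sikkaagent.models import BaseModelBackend  # assumed import path

class EchoBackend(BaseModelBackend):
    """Toy backend that echoes the last user message (illustration only)."""

    def run(self, messages):
        # A real backend would call its provider's API here and return a
        # chat-completion-style response object.
        last_user = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
        )
        return {"choices": [{"message": {"role": "assistant", "content": last_user}}]}

    @property
    def token_counter(self):
        # A real backend would return a TokenCounter built with the tokenizer
        # appropriate for its model.
        raise NotImplementedError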

3.4 TokenCounter

Utility for counting tokens in messages.

Parameters:

  • tokenizer: (Required) The tokenizer to use for counting tokens

Methods:

  • count_tokens_from_messages(messages): (Required) Count tokens in a list of messages
  • count_text(text): (Required) Count tokens in a text string
  • count_image(image_item): (Required) Count tokens for an image based on detail level
  • count_tool_calls(tool_calls): (Required) Count tokens for tool calls
  • count_tool_responses(tool_responses): (Required) Count tokens for tool responses
  • count_content(content): (Required) Calculate tokens for message content
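
In most cases you will not construct a TokenCounter directly; the counter exposed via ModelConfigure.token_counter (see section 6.3) already provides these methods. A short sketch, assuming count_text takes a plain string and returns an integer:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

model = ModelConfigure(model="gpt-4o", model_platform=ModelPlatformType.OPENAI)
counter = model.token_counter

# Count tokens in a raw string (assumed: count_text(text) -> int)
print(counter.count_text("Hello, how are you?"))

# Count tokens across a full message list
messages = [{"role": "user", "content": "Hello, how are you?"}]
print(counter.count_tokens_from_messages(messages))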

4. Model Implementations

4.1 OpenAI

The standard OpenAI client is used when model_platform is not specified or set to ModelPlatformType.OPENAI.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenAI model
model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI
)

4.2 Ollama

Local inference using Ollama.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Ollama model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    url="http://localhost:11434/v1"  # Default Ollama endpoint
)

4.3 AWS Bedrock

AWS Bedrock models.

Set up your AWS credentials in your environment file to use AWS Bedrock.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# AWS Bedrock model
model = ModelConfigure(
    model="us.meta.llama3-1-8b-instruct-v1:0",
    model_platform=ModelPlatformType.AWS_BEDROCK,
)
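
Credentials can also be passed explicitly through the aws_access_key, aws_secret_key, and aws_region_name parameters described in section 3.1 (placeholder values shown here):

# AWS Bedrock model with credentials passed explicitly instead of via environment
model = ModelConfigure(
    model="us.meta.llama3-1-8b-instruct-v1:0",
    model_platform=ModelPlatformType.AWS_BEDROCK,
    aws_access_key="your-access-key",
    aws_secret_key="your-secret-key",
    aws_region_name="us-east-1"  # default region if not provided
)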

4.4 OpenRouter

Access to many models through one API.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenRouter model
model = ModelConfigure(
    model="anthropic/claude-3-opus",
    model_platform=ModelPlatformType.OPENROUTER,
    api_key="your-api-key"  # Or set OPENROUTER_API_KEY environment variable
)

4.5 OpenAI-Compatible

Any OpenAI-compatible endpoint.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenAI-compatible model
model = ModelConfigure(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    model_platform=ModelPlatformType.OPENAI_COMPATIBLE_MODEL,
    url="https://api.together.xyz/v1",
    api_key="your-api-key"
)

5. Integration with Other Modules

5.1 With ChatAgent

Models are primarily used with the ChatAgent class:

from sikkaagent.agents import ChatAgent
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Create agent with model
agent = ChatAgent(
    model=model,
    system_prompt="You are a helpful assistant."
)

# Use the agent
response = agent.step("Hello, how are you?")
print(response)

5.2 With Memory

Models provide token counting for Memory:

from sikkaagent.memories import Memory
from sikkaagent.storages import InMemoryStorage
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Initialize model for token counting
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Create memory with token limit
memory = Memory(
    storage=InMemoryStorage(),
    token_counter=model.token_counter,
    token_limit=4000  # Maximum tokens for context
)

6. Advanced Topics

6.1 Tool Calling

Models can use tools to perform actions:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Run the model with tools
response = model.run(
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    tools=tools
)

# Process tool calls
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    # Handle tool calls
    print(f"Tool: {tool_calls[0].function.name}")
    print(f"Arguments: {tool_calls[0].function.arguments}")

6.2 Streaming Responses

Models can stream responses for better user experience:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Create model with streaming enabled
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    config={"stream": True}
)

# Run the model with streaming
stream = model.run(
    messages=[
        {"role": "user", "content": "Write a short story about a robot."}
    ]
)

# Process the stream
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

6.3 Token Management

Managing tokens is important for staying within model context limits:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Count tokens in messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
]

token_count = model.token_counter.count_tokens_from_messages(messages)
print(f"Token count: {token_count}")

# Check if within token limit
if token_count < model.token_limit:
    print("Within token limit")
else:
    print("Exceeds token limit")

7. Best Practices

  • Model Selection: Choose the appropriate model based on your needs:
    • OpenAI models for production applications requiring high reliability
    • AWS Bedrock for organizations with existing AWS infrastructure
    • Ollama for development, privacy-sensitive applications, or cost-sensitive deployments
    • OpenRouter for experimenting with different model providers

  • Token Management: Be aware of each model's context window limitations:
    • Use token counting to stay within limits
    • Implement windowing or summarization for long conversations
    • Consider using smaller models for simpler tasks

  • Error Handling: Implement robust error handling (see the sketch after this list):
    • Retry logic for API-based models
    • Fallback models for critical applications
    • Graceful degradation when models are unavailable

  • Security: Protect API keys and credentials:
    • Use environment variables for API keys
    • Implement proper access controls
    • Consider using AWS IAM roles for Bedrock

  • Cost Optimization: Manage costs effectively:
    • Use smaller models for simpler tasks
    • Implement caching for common queries
    • Monitor usage and set up alerts
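
As referenced in the error-handling item above, a minimal sketch of retry-with-fallback logic; the specific exceptions to catch depend on the underlying provider client and are left generic here:

import time

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

primary = ModelConfigure(model="gpt-4o", model_platform=ModelPlatformType.OPENAI)
fallback = ModelConfigure(model="llama3.1:8b", model_platform=ModelPlatformType.OLLAMA)

def run_with_retry(messages, retries=3, delay=2.0):
    """Try the primary model with simple exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return primary.run(messages=messages)
        except Exception:  # in practice, catch the provider's specific errors
            time.sleep(delay * (2 ** attempt))
    # Graceful degradation: switch to the local fallback model
    return fallback.run(messages=messages)

response = run_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)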