Models

For more detailed usage information, please refer to our cookbook: Models Cookbook

1. Concept

The Models module in Sikka Agent provides a unified interface for working with various AI model providers. It abstracts away provider-specific implementation details, enabling consistent API usage across different model platforms while handling token counting, rate limiting, and other model-specific requirements.

Sikka Agent's model system consists of several key components:

  1. ModelConfigure: The central class that manages model configuration and provides a unified interface
  2. Model Backend: Implementations for specific providers (OpenAI, AWS Bedrock, Ollama, etc.)
  3. Token Counter: Utilities for counting tokens in messages and managing context limits
  4. Audio Models: Specialized models for speech-to-text and text-to-speech operations

2. Get Started

2.1 Basic Usage

Here's a quick example of how to use the ModelConfigure class:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenAI model
openai_model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI,
    api_key="your-api-key"  # Or set OPENAI_API_KEY environment variable
)

# Run the model
response = openai_model.run(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"}
    ]
)

print(response.choices[0].message.content)

Using Ollama (local inference):

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Ollama model (local)
ollama_model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    url="http://localhost:11434/v1"  # Default Ollama endpoint
)

# Run the model
response = ollama_model.run(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
)

print(response.choices[0].message.content)

3. Core Components

3.1 ModelConfigure

The primary interface for configuring and using models.

Parameters:

  • model: (Optional) Model identifier (e.g., "gpt-4o", "llama3.1:8b"). Defaults to "gpt-4o-mini" if not provided.
  • model_platform: (Optional) Provider platform enum. Defaults to standard OpenAI client if not provided.
  • api_key: (Optional) API key for authentication. Defaults to environment variable if not provided.
  • url: (Optional) API endpoint URL. Defaults to environment variable or provider default if not provided.
  • config: (Optional) Model-specific configuration. Defaults to empty ModelConfig if not provided.
  • token_counter: (Optional) Custom token counter. Automatically initialized if not provided.
  • aws_access_key: (Optional) AWS access key for Bedrock. Defaults to environment variable if not provided.
  • aws_secret_key: (Optional) AWS secret key for Bedrock. Defaults to environment variable if not provided.
  • aws_region_name: (Optional) AWS region for Bedrock. Defaults to "us-east-1" if not provided.

Methods:

  • run(messages): Run the model with the given messages
  • token_counter: Property that returns the token counter for the model

3.2 ModelConfig

Configuration for model parameters using Pydantic for validation.

Attributes:

  • temperature: (Optional) Controls randomness (0.0-2.0, default: 0.7)
  • top_p: (Optional) Controls diversity via nucleus sampling (0.0-1.0, default: 1.0)
  • frequency_penalty: (Optional) Penalizes repeated tokens (-2.0-2.0, default: 0.0)
  • presence_penalty: (Optional) Penalizes repeated topics (-2.0-2.0, default: 0.0)
  • max_tokens: (Optional) Maximum number of tokens to generate (default: 2048)
  • n: (Optional) Number of completions to generate (default: 1)
  • stream: (Optional) Whether to stream the response (default: False)
  • tool_choice: (Optional) Tool choice configuration (default: None)
  • tools: (Optional) List of tools available to the model (default: None)
  • user: (Optional) User identifier (default: "")
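
For illustration, a minimal sketch that sets a few of these parameters through the config argument of ModelConfigure, passed as a plain dict in the same way as the streaming example in section 6.2:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Configure generation parameters via the config argument
model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI,
    config={
        "temperature": 0.2,   # lower randomness for more deterministic output
        "max_tokens": 512,    # cap the length of the completion
        "top_p": 0.9          # nucleus sampling threshold
    }
)

response = model.run(
    messages=[{"role": "user", "content": "Summarize the benefits of unit testing."}]
)
print(response.choices[0].message.content)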

3.3 BaseModelBackend

Abstract base class for different model backends.

Parameters:

  • model: (Required) Model identifier
  • config: (Optional) Configuration dictionary. Defaults to empty dict.
  • api_key: (Optional) API key for authentication
  • url: (Optional) API endpoint URL
  • token_counter: (Optional) Custom token counter

Methods:

  • run(messages): (Required) Run the model with the given messages
  • token_counter: (Required) Property that returns the token counter for the model
  • preprocess_messages(messages): (Optional) Preprocess messages before sending to the model
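
As a rough sketch only, this is what a custom backend could look like. The import path for BaseModelBackend and the exact method signatures are assumptions based on the parameters and methods listed above, not a documented extension API:

from sikkaagent.models import BaseModelBackend  # assumed import path

class EchoBackend(BaseModelBackend):
    """Toy backend that echoes the last user message (illustration only)."""

    def run(self, messages):
        # A real backend would call its provider's API here and return a
        # chat-completion-style response object.
        last_user = next(
            (m["content"] for m in reversed(messages) if m["role"] == "user"), ""
        )
        return {"choices": [{"message": {"role": "assistant", "content": last_user}}]}

    @property
    def token_counter(self):
        # A real backend would return a TokenCounter built with the tokenizer
        # appropriate for its model.
        raise NotImplementedError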

3.4 TokenCounter

Utility for counting tokens in messages.

Parameters:

  • tokenizer: (Required) The tokenizer to use for counting tokens

Methods:

  • count_tokens_from_messages(messages): (Required) Count tokens in a list of messages
  • count_text(text): (Required) Count tokens in a text string
  • count_image(image_item): (Required) Count tokens for an image based on detail level
  • count_tool_calls(tool_calls): (Required) Count tokens for tool calls
  • count_tool_responses(tool_responses): (Required) Count tokens for tool responses
  • count_content(content): (Required) Calculate tokens for message content
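
In most cases you will not construct a TokenCounter directly; the counter exposed via ModelConfigure.token_counter (see section 6.3) already provides these methods. A short sketch, assuming count_text takes a plain string and returns an integer:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

model = ModelConfigure(model="gpt-4o", model_platform=ModelPlatformType.OPENAI)
counter = model.token_counter

# Count tokens in a raw string (assumed: count_text(text) -> int)
print(counter.count_text("Hello, how are you?"))

# Count tokens across a full message list
messages = [{"role": "user", "content": "Hello, how are you?"}]
print(counter.count_tokens_from_messages(messages))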

4. Model Implementations

4.1 OpenAI

The standard OpenAI client is used when model_platform is not specified or set to ModelPlatformType.OPENAI.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenAI model
model = ModelConfigure(
    model="gpt-4o",
    model_platform=ModelPlatformType.OPENAI
)

4.2 Ollama

Local inference using Ollama.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Ollama model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    url="http://localhost:11434/v1"  # Default Ollama endpoint
)

4.3 AWS Bedrock

AWS Bedrock models.

Set up your AWS credentials in your environment file to use AWS Bedrock.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# AWS Bedrock model
model = ModelConfigure(
    model="us.meta.llama3-1-8b-instruct-v1:0",
    model_platform=ModelPlatformType.AWS_BEDROCK,
)
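
Credentials can also be passed explicitly through the aws_access_key, aws_secret_key, and aws_region_name parameters described in section 3.1 (placeholder values shown here):

# AWS Bedrock model with credentials passed explicitly instead of via environment
model = ModelConfigure(
    model="us.meta.llama3-1-8b-instruct-v1:0",
    model_platform=ModelPlatformType.AWS_BEDROCK,
    aws_access_key="your-access-key",
    aws_secret_key="your-secret-key",
    aws_region_name="us-east-1"  # default region if not provided
)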

4.4 OpenRouter

Access to many models through one API.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenRouter model
model = ModelConfigure(
    model="anthropic/claude-3-opus",
    model_platform=ModelPlatformType.OPENROUTER,
    api_key="your-api-key"  # Or set OPENROUTER_API_KEY environment variable
)

4.5 OpenAI-Compatible

Any OpenAI-compatible endpoint.

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# OpenAI-compatible model
model = ModelConfigure(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    model_platform=ModelPlatformType.OPENAI_COMPATIBLE_MODEL,
    url="https://api.together.xyz/v1",
    api_key="your-api-key"
)

5. Integration with Other Modules

5.1 With ChatAgent

Models are primarily used with the ChatAgent class:

from sikkaagent.agents import ChatAgent
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Create agent with model
agent = ChatAgent(
    model=model,
    system_prompt="You are a helpful assistant."
)

# Use the agent
response = agent.step("Hello, how are you?")
print(response)

5.2 With Memory

Models provide token counting for Memory:

from sikkaagent.memories import Memory
from sikkaagent.storages import InMemoryStorage
from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Initialize model for token counting
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Create memory with token limit
memory = Memory(
    storage=InMemoryStorage(),
    token_counter=model.token_counter,
    token_limit=4000  # Maximum tokens for context
)

6. Advanced Topics

6.1 Tool Calling

Models can use tools to perform actions:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather in a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Run the model with tools
response = model.run(
    messages=[
        {"role": "user", "content": "What's the weather like in Boston?"}
    ],
    tools=tools
)

# Process tool calls
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    # Handle tool calls
    print(f"Tool: {tool_calls[0].function.name}")
    print(f"Arguments: {tool_calls[0].function.arguments}")

6.2 Streaming Responses

Models can stream responses for better user experience:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Create model with streaming enabled
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA,
    config={"stream": True}
)

# Run the model with streaming
stream = model.run(
    messages=[
        {"role": "user", "content": "Write a short story about a robot."}
    ]
)

# Process the stream
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

6.3 Token Management

Managing tokens is important for staying within model context limits:

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

# Create model
model = ModelConfigure(
    model="llama3.1:8b",
    model_platform=ModelPlatformType.OLLAMA
)

# Count tokens in messages
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
]

token_count = model.token_counter.count_tokens_from_messages(messages)
print(f"Token count: {token_count}")

# Check if within token limit
if token_count < model.token_limit:
    print("Within token limit")
else:
    print("Exceeds token limit")

7. Best Practices

  • Model Selection: Choose the appropriate model based on your needs:
    • OpenAI models for production applications requiring high reliability
    • AWS Bedrock for organizations with existing AWS infrastructure
    • Ollama for development, privacy-sensitive applications, or cost-sensitive deployments
    • OpenRouter for experimenting with different model providers

  • Token Management: Be aware of each model's context window limitations:
    • Use token counting to stay within limits
    • Implement windowing or summarization for long conversations
    • Consider using smaller models for simpler tasks

  • Error Handling: Implement robust error handling (see the sketch after this list):
    • Retry logic for API-based models
    • Fallback models for critical applications
    • Graceful degradation when models are unavailable

  • Security: Protect API keys and credentials:
    • Use environment variables for API keys
    • Implement proper access controls
    • Consider using AWS IAM roles for Bedrock

  • Cost Optimization: Manage costs effectively:
    • Use smaller models for simpler tasks
    • Implement caching for common queries
    • Monitor usage and set up alerts
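
As referenced in the error-handling item above, a minimal sketch of retry-with-fallback logic; the specific exceptions to catch depend on the underlying provider client and are left generic here:

import time

from sikkaagent.models import ModelConfigure
from sikkaagent.utils.enums import ModelPlatformType

primary = ModelConfigure(model="gpt-4o", model_platform=ModelPlatformType.OPENAI)
fallback = ModelConfigure(model="llama3.1:8b", model_platform=ModelPlatformType.OLLAMA)

def run_with_retry(messages, retries=3, delay=2.0):
    """Try the primary model with simple exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return primary.run(messages=messages)
        except Exception:  # in practice, catch the provider's specific errors
            time.sleep(delay * (2 ** attempt))
    # Graceful degradation: switch to the local fallback model
    return fallback.run(messages=messages)

response = run_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)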