How to Integrate a New Agent with Braintrust

This guide walks you through integrating a new LLM agent with our Braintrust evaluation framework.

Prerequisites

Before you begin, make sure you have:

  1. A working LLM agent implementation
  2. Access to the Braintrust platform
  3. Your Braintrust API key (see the quick check after this list)
  4. Familiarity with our agent architecture
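
Before pushing prompts or running evals, confirm the API key is actually available. The snippet below assumes the Braintrust SDK reads it from the BRAINTRUST_API_KEY environment variable; adapt it if your setup injects the key differently.

import os

# Assumption: the Braintrust SDK picks up its API key from BRAINTRUST_API_KEY.
if not os.environ.get("BRAINTRUST_API_KEY"):
    raise RuntimeError("BRAINTRUST_API_KEY is not set; export it before running evals.")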

Step 1: Update the BraintrustProjectName Enum

First, add your new agent to the BraintrustProjectName enum in trially_agents/trially_agents/base.py:

from enum import StrEnum
from typing import Literal

class BraintrustProjectName(StrEnum):
    PATIENT_MATCHER = "patient_matcher"
    PROTOCOL_PARSER = "protocol_parser"
    YOUR_NEW_AGENT = "your_new_agent"  # Add your new agent name here

    def from_environment(self, environment: Literal["production", "staging"]) -> str:
        """Return an environment-specific project name by appending the environment as a suffix."""
        return f"{self.value}_{environment}"

Step 2: Create Versioned Prompts

Create versioned prompts for your agent using the Prompt class:

from trially_agents.base import Prompt, BraintrustProjectName, ModelName

# Define your agent prompt
system_message = """You are an AI assistant designed to help with [your agent's specific task].
Your goal is to [describe the objective of your agent]."""

user_message_template = """[Include any placeholders for variables using {{variable_name}}]
Input: {{input}}"""

# Create and push the prompt to Braintrust
your_agent_prompt = Prompt.create_and_push(
    name="Your Agent Prompt",
    version="v1",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message_template}
    ],
    model=ModelName.OPENAI_O3_MINI,  # Choose appropriate model
    project_name=BraintrustProjectName.YOUR_NEW_AGENT,
    description="Initial prompt for your agent"
)

Step 3: Implement Your Agent Class

Create your agent class, integrating with Braintrust:

from typing import Any, Dict
from pydantic import BaseModel
from trially_agents.base import (
    Prompt, BraintrustProjectName, ModelName
)

class YourAgentInputSchema(BaseModel):
    input: str
    # Add any other input fields your agent requires

class YourAgentOutputSchema(BaseModel):
    result: str
    confidence: float
    # Add any other output fields your agent produces

class YourAgent:
    def __init__(self, enable_tracing: bool = False):
        self.enable_tracing = enable_tracing

        # Initialize your prompts
        from trially_agents.prompts.your_new_agent import your_agent_prompt
        self.prompt = your_agent_prompt

    def process(self, input_text: str) -> Dict[str, Any]:
        """Process input and return results."""

        # Prepare input
        input_data = {"input": input_text}

        # Call LLM with tracing if enabled
        if self.enable_tracing:
            response = self.prompt.invoke_traced(input_data)
        else:
            response = self.prompt.invoke(input_data)

        # Process and return results
        return {
            "result": response.get("result", ""),
            "confidence": response.get("confidence", 0.0)
        }
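
With the class in place, a quick usage check looks like this (a minimal sketch; the input string and printed fields mirror the example schema above):

agent = YourAgent(enable_tracing=True)
result = agent.process("Example input for your agent")
print(result["result"], result["confidence"])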

Step 4: Create Scoring Functions

Create custom scoring functions for your agent in trially_agents/trially_agents/evals/scores/your_new_agent.py:

from typing import Any, Dict

from trially_agents.base import PythonScore, BraintrustProjectName

def accuracy_score(input: Dict[str, Any], expected: Dict[str, Any], output: Dict[str, Any], **kwargs) -> float:
    """
    Measures the accuracy of your agent's output.

    Args:
        input: The input given to the agent
        expected: The expected output (ground truth)
        output: The actual output from the agent

    Returns:
        float: Score between 0.0 and 1.0
    """
    expected_result = expected.get("result", "")
    actual_result = output.get("result", "")

    # Implement your scoring logic here
    # For example, a simple exact match:
    if not expected_result or not actual_result:
        return 0.0

    return 1.0 if expected_result == actual_result else 0.0

# Register the scoring function with Braintrust
accuracy_score_fn = PythonScore.create_and_push(
    name="Accuracy Score",
    version="v1",
    description="Measures the accuracy of the agent's output",
    handler=accuracy_score,
    project_name=BraintrustProjectName.YOUR_NEW_AGENT,
)

# Add more scoring functions as needed for your agent
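
For example, when exact match is too strict, a fuzzy similarity scorer can be registered the same way. The sketch below uses the standard library's difflib; the function name and scoring choice are illustrative, not part of the existing framework.

from difflib import SequenceMatcher

def fuzzy_match_score(input: Dict[str, Any], expected: Dict[str, Any], output: Dict[str, Any], **kwargs) -> float:
    """Scores the character-level similarity between the expected and actual results."""
    expected_result = expected.get("result", "")
    actual_result = output.get("result", "")
    if not expected_result or not actual_result:
        return 0.0
    # ratio() returns a float in [0.0, 1.0], matching the expected score range
    return SequenceMatcher(None, expected_result, actual_result).ratio()

fuzzy_match_score_fn = PythonScore.create_and_push(
    name="Fuzzy Match Score",
    version="v1",
    description="Character-level similarity between the expected and actual results",
    handler=fuzzy_match_score,
    project_name=BraintrustProjectName.YOUR_NEW_AGENT,
)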

Step 5: Create Evaluation Script

Create an evaluation script for your agent in trially_agents/trially_agents/evals/your_new_agent/eval.py:
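
The exact shape depends on how the framework wires evals, but a minimal sketch using the Braintrust SDK's Eval entry point might look like the following. The dataset, agent import path, and scorer wiring are assumptions for illustration; adapt them to your project layout.

from braintrust import Eval

from trially_agents.base import BraintrustProjectName
from trially_agents.evals.scores.your_new_agent import accuracy_score
from trially_agents.your_new_agent import YourAgent

agent = YourAgent(enable_tracing=True)

Eval(
    BraintrustProjectName.YOUR_NEW_AGENT.from_environment("staging"),
    # A small hand-written dataset; in practice, load cases from a file or a Braintrust dataset.
    data=lambda: [
        {
            "input": {"input": "Example case for your agent"},
            "expected": {"result": "expected output", "confidence": 1.0},
        },
    ],
    task=lambda input: agent.process(input["input"]),
    scores=[accuracy_score],
)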

Step 6: Update GitHub Workflow

Update your GitHub workflow to include your new agent:

1. Add path filters for your agent in `.github/workflows/llm-evals.yml`:

```yaml
filters: |
  patient_matcher:
    - 'trially_agents/trially_agents/evals/scores/patient_matcher.py'
    - 'trially_agents/trially_agents/prompts/patient_matcher.py'
    - 'trially_agents/trially_agents/patient_matcher.py'
  protocol_parser:
    - 'trially_agents/trially_agents/evals/scores/protocol_parser.py'
    - 'trially_agents/trially_agents/prompts/protocol_parser.py'
    - 'trially_agents/trially_agents/protocol_parser.py'
  your_new_agent:
    - 'trially_agents/trially_agents/evals/scores/your_new_agent.py'
    - 'trially_agents/trially_agents/prompts/your_new_agent.py'
    - 'trially_agents/trially_agents/your_new_agent.py'
```

Best Practices

  1. Version Control: Keep all prompts and scoring functions versioned
  2. Test Cases: Create diverse test cases that cover edge cases
  3. Documentation: Document all scoring functions with clear explanations
  4. Modularity: Keep agent logic separate from evaluation code
  5. Reusability: Reuse common scoring components where possible
  6. Error Handling: Add robust error handling in evaluation scripts

Troubleshooting

Common Issues

  1. Authentication errors: Verify your API key is correctly set
  2. Scoring errors: Make sure you are using a model that supports tools

For additional help, refer to the Braintrust API Reference or the Debugging Guide.