# How to Integrate a New Agent with Braintrust

This guide walks you through integrating a new LLM agent with our Braintrust evaluation framework.
## Prerequisites
Before you begin, make sure you have:
- A working LLM agent implementation
- Access to the Braintrust platform
- Your Braintrust API key
- Familiarity with our agent architecture
## Step 1: Update the BraintrustProjectName Enum

First, add your new agent to the `BraintrustProjectName` enum in `trially_agents/trially_agents/base.py`:
```python
class BraintrustProjectName(StrEnum):
    PATIENT_MATCHER = "patient_matcher"
    PROTOCOL_PARSER = "protocol_parser"
    YOUR_NEW_AGENT = "your_new_agent"  # Add your new agent name here

    def from_environment(self, environment: Literal["production", "staging"]) -> str:
        """Return a project name that is environment-specific by adding the environment as a suffix."""
        return f"{self.value}_{environment}"
```
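Project names are environment-specific: the enum value is combined with the environment at runtime. A quick check of the method above (assuming the enum as shown):

```python
# Environment-specific project names are the enum value plus an environment suffix.
project = BraintrustProjectName.YOUR_NEW_AGENT.from_environment("staging")
print(project)  # your_new_agent_staging
```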
## Step 2: Create Versioned Prompts

Create versioned prompts for your agent using the `Prompt` class:
```python
from trially_agents.base import Prompt, BraintrustProjectName, ModelName

# Define your agent prompt
system_message = """You are an AI assistant designed to help with [your agent's specific task].
Your goal is to [describe the objective of your agent]."""

user_message_template = """[Include any placeholders for variables using {{variable_name}}]
Input: {{input}}"""

# Create and push the prompt to Braintrust
your_agent_prompt = Prompt.create_and_push(
    name="Your Agent Prompt",
    version="v1",
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message_template},
    ],
    model=ModelName.OPENAI_O3_MINI,  # Choose an appropriate model
    project_name=BraintrustProjectName.YOUR_NEW_AGENT,
    description="Initial prompt for your agent",
)
```
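When you iterate on the prompt later, push it under a new version rather than mutating `v1`, so earlier evaluation runs stay comparable. A sketch, assuming the same `create_and_push` signature as above (the revised instruction text is illustrative):

```python
# Hypothetical follow-up revision; only the version string and messages change.
your_agent_prompt_v2 = Prompt.create_and_push(
    name="Your Agent Prompt",
    version="v2",
    messages=[
        {"role": "system", "content": system_message + "\nAlways answer in valid JSON."},
        {"role": "user", "content": user_message_template},
    ],
    model=ModelName.OPENAI_O3_MINI,
    project_name=BraintrustProjectName.YOUR_NEW_AGENT,
    description="v2: enforce JSON output",
)
```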
## Step 3: Implement Your Agent Class

Create your agent class and integrate it with Braintrust:
```python
from typing import Any, Dict

from pydantic import BaseModel

from trially_agents.base import Prompt, BraintrustProjectName, ModelName


class YourAgentInputSchema(BaseModel):
    input: str
    # Add any other input fields your agent requires


class YourAgentOutputSchema(BaseModel):
    result: str
    confidence: float
    # Add any other output fields your agent produces


class YourAgent:
    def __init__(self, enable_tracing: bool = False):
        self.enable_tracing = enable_tracing
        # Initialize your prompts
        from trially_agents.prompts.your_agent import your_agent_prompt
        self.prompt = your_agent_prompt

    def process(self, input_text: str) -> Dict[str, Any]:
        """Process input and return results."""
        # Prepare input
        input_data = {"input": input_text}

        # Call the LLM, with tracing if enabled
        if self.enable_tracing:
            response = self.prompt.invoke_traced(input_data)
        else:
            response = self.prompt.invoke(input_data)

        # Process and return results
        return {
            "result": response.get("result", ""),
            "confidence": response.get("confidence", 0.0),
        }
```
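Once the prompt module from Step 2 has been pushed, the agent can be used directly. A quick usage example (the input string is illustrative, and it assumes `invoke` returns a dict with `result` and `confidence` keys as in the template above):

```python
# Instantiate with tracing enabled so the call is logged to Braintrust.
agent = YourAgent(enable_tracing=True)
output = agent.process("example input text for your agent")
print(output["result"], output["confidence"])
```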
## Step 4: Create Scoring Functions

Create custom scoring functions for your agent in `trially_agents/trially_agents/evals/scores/your_new_agent.py`:
```python
from typing import Any, Dict

from trially_agents.base import PythonScore, BraintrustProjectName


def accuracy_score(input: Dict[str, Any], expected: Dict[str, Any], output: Dict[str, Any], **kwargs) -> float:
    """
    Measures the accuracy of your agent's output.

    Args:
        input: The input given to the agent
        expected: The expected output (ground truth)
        output: The actual output from the agent

    Returns:
        float: Score between 0.0 and 1.0
    """
    expected_result = expected.get("result", "")
    actual_result = output.get("result", "")

    # Implement your scoring logic here.
    # For example, a simple exact match:
    if not expected_result or not actual_result:
        return 0.0
    return 1.0 if expected_result == actual_result else 0.0


# Register the scoring function with Braintrust
accuracy_score_fn = PythonScore.create_and_push(
    name="Accuracy Score",
    version="v1",
    description="Measures the accuracy of the agent's output",
    handler=accuracy_score,
    project_name=BraintrustProjectName.YOUR_NEW_AGENT,
)

# Add more scoring functions as needed for your agent
```
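Exact match is often too strict. The same handler signature supports partial-credit metrics; the function and registration below are an illustrative sketch, not part of the existing codebase:

```python
def token_overlap_score(input: Dict[str, Any], expected: Dict[str, Any], output: Dict[str, Any], **kwargs) -> float:
    """Fraction of expected tokens that appear in the actual result (0.0-1.0)."""
    expected_tokens = set(expected.get("result", "").lower().split())
    actual_tokens = set(output.get("result", "").lower().split())
    if not expected_tokens:
        return 0.0
    return len(expected_tokens & actual_tokens) / len(expected_tokens)


token_overlap_score_fn = PythonScore.create_and_push(
    name="Token Overlap Score",
    version="v1",
    description="Partial-credit overlap between expected and actual results",
    handler=token_overlap_score,
    project_name=BraintrustProjectName.YOUR_NEW_AGENT,
)
```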
## Step 5: Create Evaluation Script

Create an evaluation script for your agent in `trially_agents/trially_agents/evals/your_new_agent/eval.py`:
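The exact contents depend on your dataset and scorers, and your repo may have its own eval wrapper; absent that, here is a minimal sketch using the Braintrust SDK's `Eval` entry point (the module paths, sample data, and environment choice below are illustrative assumptions):

```python
from braintrust import Eval

from trially_agents.base import BraintrustProjectName
from trially_agents.evals.scores.your_new_agent import accuracy_score
from trially_agents.your_new_agent import YourAgent

agent = YourAgent(enable_tracing=True)

Eval(
    BraintrustProjectName.YOUR_NEW_AGENT.from_environment("staging"),
    # Replace with real test cases, or load them from a Braintrust dataset.
    data=lambda: [
        {"input": "example input text", "expected": {"result": "expected result"}},
    ],
    task=lambda input: agent.process(input),
    scores=[accuracy_score],
)
```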
## Step 6: Update GitHub Workflow
Update your GitHub workflow to include your new agent:
1. Add path filters for your agent in `.github/workflows/llm-evals.yml`:
```yaml
filters: |
  patient_matcher:
    - 'trially_agents/trially_agents/evals/scores/patient_matcher.py'
    - 'trially_agents/trially_agents/prompts/patient_matcher.py'
    - 'trially_agents/trially_agents/patient_matcher.py'
  protocol_parser:
    - 'trially_agents/trially_agents/evals/scores/protocol_parser.py'
    - 'trially_agents/trially_agents/prompts/protocol_parser.py'
    - 'trially_agents/trially_agents/protocol_parser.py'
  your_new_agent:
    - 'trially_agents/trially_agents/evals/scores/your_new_agent.py'
    - 'trially_agents/trially_agents/prompts/your_new_agent.py'
    - 'trially_agents/trially_agents/your_new_agent.py'
```
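2. If the workflow gates evaluation jobs on these filters, add a matching step that runs your agent's evals when its paths change. The sketch below assumes the filter step has id `filter` and that evals are invoked by running the Step 5 script directly; adjust to match how the existing agents' evals are run in the workflow:

```yaml
# Illustrative only: adapt the step id, secret name, and run command to the existing workflow.
- name: Run your_new_agent evals
  if: steps.filter.outputs.your_new_agent == 'true'
  env:
    BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
  run: |
    python trially_agents/trially_agents/evals/your_new_agent/eval.py
```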
## Best Practices
- Version Control: Keep all prompts and scoring functions versioned
- Test Cases: Create diverse test cases that cover edge cases
- Documentation: Document all scoring functions with clear explanations
- Modularity: Keep agent logic separate from evaluation code
- Reusability: Reuse common scoring components where possible
- Error Handling: Add robust error handling in evaluation scripts
## Troubleshooting

### Common Issues
- Authentication errors: Verify your API key is correctly set
- Scoring errors: Make sure you are using a model that supports tools
For additional help, refer to the Braintrust API Reference or the Debugging Guide.