How to Debug LLM Agents Using Braintrust
This guide explains how to effectively debug LLM agents using Braintrust's tracing and monitoring capabilities.
When to Use Braintrust for Debugging
Leverage Braintrust for debugging when:
- You notice performance issues in evaluation metrics
- You need to understand why a specific test case is failing
- You want to trace the full lifecycle of an LLM interaction
- You need to analyze token usage patterns or latency issues
- You're comparing behavior between different agent versions
Prerequisites
- Braintrust account with access to your project
- `BRAINTRUST_API_KEY` environment variable set (a quick check is sketched after this list)
- Basic familiarity with the LLM agent architecture
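If you are unsure whether the key is visible to the agent process, here is a quick check you can run first (Python, illustrative only):

```python
# Quick sanity check that the Braintrust API key is available to this process.
import os

if not os.environ.get("BRAINTRUST_API_KEY"):
    raise SystemExit("BRAINTRUST_API_KEY is not set; traces cannot be uploaded.")
```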
Accessing Traces
Step 1: Run Evaluation with Tracing Enabled
To debug an agent, first run an evaluation with tracing enabled:
```bash
# For running a specific evaluation with tracing enabled
export BRAINTRUST_TRACING=true
python -m trially_agents.patient_matcher
```
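If the evaluation itself is defined with Braintrust's Python `Eval` framework, each test case gets its own trace automatically. Here is a minimal sketch, assuming the `braintrust` and `autoevals` packages are installed; the project name, data, and task function are placeholders rather than the actual patient matcher code:

```python
# Minimal sketch of a Braintrust evaluation; each test case is traced.
# The project name, data, and task are placeholders for your agent's own eval.
from braintrust import Eval
from autoevals import Levenshtein


def task(input: str) -> str:
    # Call your agent here; LLM calls made through a traced client
    # show up as child spans under this test case.
    return "matched trial for " + input  # placeholder output


Eval(
    "patient_matcher_development",  # project name shown in the dashboard
    data=lambda: [{"input": "Patient A", "expected": "matched trial for Patient A"}],
    task=task,
    scores=[Levenshtein],
)
```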
Step 2: Navigate to Experiment Results
- Visit the Braintrust dashboard: https://www.braintrust.dev
- Select your project (e.g., `patient_matcher_development`)
- Find your experiment in the list and click on it
Step 3: View Experiment Details
In the experiment view, you'll see:
- Overall metrics and scores
- A list of test cases with inputs, outputs, and scores
- Trace information for each test case
Step 4: Analyze Trace Data
For a specific test case, go to the "Logs" tab:
- Click on the test case row to expand it
- Look for the "Trace" tab in the expanded view
- This shows all spans recorded for the LLM API calls, with detailed information about each (a sketch of how spans are produced follows this list)
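Spans only appear here if the agent code produces them. Below is a sketch of one way to create spans, assuming the `braintrust` Python SDK; the function and argument names are illustrative:

```python
# Sketch: producing spans that appear in the Trace tab.
# The function below is illustrative, not part of this codebase.
from braintrust import traced


@traced  # records this call (arguments, return value, timing) as a span
def rank_candidate_trials(patient_summary: str, trials: list[str]) -> str:
    # Nested @traced functions and traced LLM clients create child spans,
    # which is what the expanded Trace view displays for each test case.
    return trials[0] if trials else ""
```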
Understanding Span Information
Each span contains:
- Input: The input to the LLM
- Output: The output of the LLM
- Expected: The expected output
- Metadata: Context information about the request
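When instrumenting agent code by hand, these fields can be attached to the active span. A sketch assuming the `braintrust` Python SDK; the values are placeholders, and the call must run inside an active span (for example, an `Eval` task or a `@traced` function):

```python
# Sketch: attaching the fields listed above to the active span.
from braintrust import current_span

current_span().log(
    input={"patient_id": "P-123"},                  # what this step received
    output={"matched_trial": "NCT01234567"},        # what it produced
    expected={"matched_trial": "NCT01234567"},      # reference answer, if known
    metadata={"model": "gpt-4o", "agent_version": "v2"},  # request context
)
```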
Troubleshooting Common Issues
Missing Trace Data
If trace data is missing:
- Verify `BRAINTRUST_API_KEY` is correctly set
- Ensure `BRAINTRUST_TRACING=true` is set in the environment where the agent makes LLM calls
- Check that you're using the traced client, as sketched below: