Using the sampled_traced Decorator for Efficient LLM Tracing
This tutorial explains the sampled_traced decorator, which enables selective tracing of LLM agent functions in high-volume workflows.
What You'll Learn
- What the sampled_traced decorator is and why it's necessary
- How the sampling mechanism works
- When and how to use the decorator in your code
- Best practices for setting sampling rates
- How to check if tracing is active within your functions
Prerequisites
- Basic familiarity with Python decorators
- Understanding of the Braintrust integration in our codebase
- Access to the Braintrust dashboard for viewing traces
Why Sampling Is Necessary
LLM observability through tracing is invaluable for debugging and monitoring agent behavior. However, in high-volume production environments, tracing every single function call can lead to:
- Excessive storage costs - Each trace stores detailed information about inputs, outputs, and metadata
- Performance overhead - Tracing introduces a small but cumulative performance impact
- Braintrust dashboard clutter - Too many traces make it difficult to find relevant information
The sampled_traced decorator addresses these issues by tracing only a configurable percentage of function calls, so you still get representative data for analysis.
How sampled_traced Works
The sampled_traced decorator wraps the standard braintrust.traced decorator with a random sampling mechanism:
- For each function call, it generates a random number between 0 and 1
- If the random number is less than the specified sampling rate, the function call is traced
- Otherwise, the function executes normally without tracing
- A context variable (is_being_traced) is set when tracing is active so nested functions can check the status
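Putting those steps together, the sketch below shows how such a decorator can be built. This is a simplified illustration rather than the actual implementation in trially_agents.utils.logging; it assumes that braintrust.traced accepts the same keyword arguments (such as name) and that is_being_traced is a standard contextvars.ContextVar.

```python
import functools
import random
from contextvars import ContextVar

import braintrust  # assumed to provide the underlying @traced decorator

# Flag that nested code can read to learn whether the current call is being traced
is_being_traced: ContextVar[bool] = ContextVar("is_being_traced", default=False)


def sampled_traced(sampling_rate: float = 1.0, **traced_kwargs):
    """Trace roughly `sampling_rate` of calls; run the rest untraced."""

    def decorator(func):
        # Build the traced variant once so each call only pays for one random draw
        traced_func = braintrust.traced(**traced_kwargs)(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < sampling_rate:
                # Sampled: flag the context and run the traced variant
                token = is_being_traced.set(True)
                try:
                    return traced_func(*args, **kwargs)
                finally:
                    is_being_traced.reset(token)
            # Not sampled: execute the original function with no tracing
            return func(*args, **kwargs)

        return wrapper

    return decorator
```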
When to Use sampled_traced
Use the sampled_traced decorator when:
- Your function is called frequently (thousands or millions of times per day)
- You need tracing for debugging but don't need to trace every single call
- You want to reduce observability costs while maintaining visibility
Real-world examples from our codebase:
- The patient matcher agent, which processes hundreds of thousands of requests daily
- Any high-volume inference endpoint that uses LLMs
- Background jobs that run at scale
Implementation Example
Here's how to use the sampled_traced decorator:
```python
from trially_agents.utils.logging import sampled_traced, is_being_traced


class MyHighVolumeAgent:
    # Define a sampling rate as a class attribute for easy configuration
    sampling_rate = 0.01  # Trace 1% of calls
    prompt: Prompt = high_volume_task_prompt

    @sampled_traced(
        sampling_rate=sampling_rate,
        name="my_high_volume_function",  # Name that will appear in Braintrust
    )
    def process_request(self, input_data):
        # You can check if this call is being traced
        return self.prompt.invoke(
            {"input": input_data},
            trace=is_being_traced.get(),
        )
```
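With sampling_rate = 0.01, roughly 1 in 100 calls to process_request produces a trace in Braintrust; the remaining calls run without tracing and pay only the cost of a single random draw.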
Checking Trace Status in Nested Functions
When using nested function calls, you can check if the parent function is being traced:
```python
from trially_agents.utils.logging import is_being_traced


def helper_function(data):
    # Check if this is part of a traced call
    if is_being_traced.get():
        # This code only runs when the parent function is being traced
        print("This helper function is part of a traced call")

    # Continue with normal processing
    return process_data(data)
```
This pattern is especially useful when working with prompt invocations:
```python
# Pass the trace status to the prompt invocation
response = prompt.invoke(
    {"input": user_query},
    trace=is_being_traced.get(),  # Only trace if the parent is being traced
)
```
Setting Appropriate Sampling Rates
Choosing the right sampling rate depends on your use case:
| Call Volume (daily) | Recommended Rate | Rationale |
|---|---|---|
| < 1,000 | 1.0 (100%) | Low volume, trace everything |
| 1,000 - 10,000 | 0.1 - 0.5 (10-50%) | Moderate volume, sample reasonably |
| 10,000 - 100,000 | 0.01 - 0.1 (1-10%) | High volume, be more selective |
| > 100,000 | 0.001 - 0.01 (0.1-1%) | Very high volume, minimal sampling |
Guidelines for setting sampling rates:
- Start low and adjust up - Begin with a lower rate and increase it if needed
- Consider statistical significance - Ensure you're capturing enough samples for meaningful analysis; a quick back-of-the-envelope check is sketched after this list
- Adjust based on stability - Stable, mature services can use lower rates than new features
- Temporary boosting - Temporarily increase sampling during debugging or issue investigation
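As a rough back-of-the-envelope check for the statistical-significance guideline above, you can work backwards from how many traces per day you want to analyze. The helper below is purely illustrative (the function name and target are hypothetical, not part of our codebase); it simply applies expected traces per day = daily call volume × sampling rate.

```python
def suggest_sampling_rate(daily_call_volume: int, target_traces_per_day: int = 1_000) -> float:
    """Hypothetical helper: pick a rate that yields roughly `target_traces_per_day` traces."""
    if daily_call_volume <= 0:
        return 1.0
    return min(1.0, target_traces_per_day / daily_call_volume)


# expected traces/day = volume * rate
print(suggest_sampling_rate(500))        # 1.0    -> ~500 traces/day (trace everything)
print(suggest_sampling_rate(100_000))    # 0.01   -> ~1,000 traces/day
print(suggest_sampling_rate(5_000_000))  # 0.0002 -> ~1,000 traces/day
```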
Best Practices
- Class Constants - Define sampling rates as class constants for easy configuration
- Environment-Specific Rates - Consider using higher sampling rates in staging environments (see the sketch after this list)
- Consistent Naming - Use descriptive, consistent names in the decorator for easy filtering in Braintrust
- Propagate Trace Status - Always pass the trace status to nested function calls
- Periodic Review - Regularly review sampling rates to balance visibility with cost
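For the environment-specific rates practice, one option is to read the rate from an environment variable with a conservative default. The sketch below is an assumption about how you might wire this up; the variable name AGENT_TRACE_SAMPLING_RATE and its defaults are hypothetical, not something our codebase defines.

```python
import os

from trially_agents.utils.logging import sampled_traced

# Hypothetical environment variable; defaults to tracing 1% of calls.
# In staging you could set AGENT_TRACE_SAMPLING_RATE=1.0 to trace everything.
SAMPLING_RATE = float(os.getenv("AGENT_TRACE_SAMPLING_RATE", "0.01"))


class MyHighVolumeAgent:
    sampling_rate = SAMPLING_RATE

    @sampled_traced(
        sampling_rate=sampling_rate,
        name="my_high_volume_function",
    )
    def process_request(self, input_data):
        ...
```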
Real-World Example: Patient Matcher
In our codebase, the CriteriaAnsweringTask class in the patient matcher uses sampled_traced to efficiently trace only 1% of the criteria answering calls:
```python
class CriteriaAnsweringTask:
    prompt: Prompt = criteria_answering_task_prompt
    sampling_rate: float = 0.01  # Only trace 1% of calls

    @sampled_traced(
        sampling_rate=sampling_rate,
        name="criteria_answering_task",
    )
    def invoke(
        self, question: str, documents: List[str], screening_date: str
    ) -> EligibilityAnswer:
        return self.prompt.invoke(
            {
                "question": question,
                "documents": documents,
                "screening_date": screening_date,
            },
            trace=is_being_traced.get(),  # Pass trace status to the prompt
        )
```
This approach ensures we capture enough data for debugging and monitoring while keeping performance impact and costs manageable.
Troubleshooting
If you're encountering issues with sampled_traced:
- No traces appearing - Check your sampling rate; it might be too low
- Too many traces - Lower your sampling rate
- Missing nested spans - Ensure you're passing trace=is_being_traced.get() to nested functions
- Inconsistent tracing - Verify that context variables are being properly propagated through your call stack
Summary
The sampled_traced decorator is a powerful tool for balancing comprehensive observability with performance and cost concerns. By sampling a fraction of LLM function calls for tracing, you can:
- Maintain visibility into system behavior
- Reduce overhead and costs in high-volume systems
- Keep your Braintrust dashboard focused on relevant data
- Still capture enough information for debugging and performance analysis
Use this decorator in any high-volume workflow where tracing every single call would be excessive.