Using the sampled_traced Decorator for Efficient LLM Tracing
This tutorial explains the sampled_traced decorator, which enables selective tracing of LLM agent functions in high-volume workflows.
What You'll Learn
- What the sampled_traced decorator is and why it's necessary
- How the sampling mechanism works
- When and how to use the decorator in your code
- Best practices for setting sampling rates
- How to check if tracing is active within your functions
Prerequisites
- Basic familiarity with Python decorators
- Understanding of the Braintrust integration in our codebase
- Access to the Braintrust dashboard for viewing traces
Why Sampling Is Necessary
LLM observability through tracing is invaluable for debugging and monitoring agent behavior. However, in high-volume production environments, tracing every single function call can lead to:
- Excessive storage costs - Each trace stores detailed information about inputs, outputs, and metadata
- Performance overhead - Tracing introduces a small but cumulative performance impact
- Braintrust dashboard clutter - Too many traces make it difficult to find relevant information
The sampled_traced decorator addresses these issues by tracing only a configurable percentage of function calls, so you still get representative data for analysis.
How sampled_traced Works
The sampled_traced decorator wraps the standard braintrust.traced decorator with a random sampling mechanism:
- For each function call, it generates a random number between 0 and 1
- If the random number is less than the specified sampling rate, the function call is traced
- Otherwise, the function executes normally without tracing
- A context variable (is_being_traced) is set when tracing is active so nested functions can check the status
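Putting those steps together, the sketch below shows how such a decorator can be built. This is a simplified illustration rather than the actual implementation in trially_agents.utils.logging; it assumes that braintrust.traced accepts the same keyword arguments (such as name) and that is_being_traced is a standard contextvars.ContextVar.

```python
import functools
import random
from contextvars import ContextVar

import braintrust  # assumed to provide the underlying @traced decorator

# Flag that nested code can read to learn whether the current call is being traced
is_being_traced: ContextVar[bool] = ContextVar("is_being_traced", default=False)


def sampled_traced(sampling_rate: float = 1.0, **traced_kwargs):
    """Trace roughly `sampling_rate` of calls; run the rest untraced."""

    def decorator(func):
        # Build the traced variant once so each call only pays for one random draw
        traced_func = braintrust.traced(**traced_kwargs)(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < sampling_rate:
                # Sampled: flag the context and run the traced variant
                token = is_being_traced.set(True)
                try:
                    return traced_func(*args, **kwargs)
                finally:
                    is_being_traced.reset(token)
            # Not sampled: execute the original function with no tracing
            return func(*args, **kwargs)

        return wrapper

    return decorator
```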
When to Use sampled_traced
Use the sampled_traced decorator when:
- Your function is called frequently (thousands or millions of times per day)
- You need tracing for debugging but don't need to trace every single call
- You want to reduce observability costs while maintaining visibility
Real-world examples from our codebase:
- The patient matcher agent, which processes hundreds of thousands of requests daily
- Any high-volume inference endpoint that uses LLMs
- Background jobs that run at scale
Implementation Example
Here's how to use the sampled_traced decorator:
```python
from trially_agents.utils.logging import sampled_traced, is_being_traced


class MyHighVolumeAgent:
    # Define a sampling rate as a class attribute for easy configuration
    sampling_rate = 0.01  # Trace 1% of calls
    prompt: Prompt = high_volume_task_prompt

    @sampled_traced(
        sampling_rate=sampling_rate,
        name="my_high_volume_function",  # Name that will appear in Braintrust
    )
    def process_request(self, input_data):
        # You can check if this call is being traced
        return self.prompt.invoke(
            {"input": input_data},
            trace=is_being_traced.get(),
        )
```
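With sampling_rate = 0.01, roughly 1 in 100 calls to process_request produces a trace in Braintrust; the remaining calls run without tracing and pay only the cost of a single random draw.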
Checking Trace Status in Nested Functions
When using nested function calls, you can check if the parent function is being traced:
```python
from trially_agents.utils.logging import is_being_traced


def helper_function(data):
    # Check if this is part of a traced call
    if is_being_traced.get():
        # This code only runs when the parent function is being traced
        print("This helper function is part of a traced call")

    # Continue with normal processing
    return process_data(data)
```
This pattern is especially useful when working with prompt invocations:
```python
# Pass the trace status to the prompt invocation
response = prompt.invoke(
    {"input": user_query},
    trace=is_being_traced.get(),  # Only trace if the parent is being traced
)
```
Setting Appropriate Sampling Rates
Choosing the right sampling rate depends on your use case:
| Call Volume (daily) | Recommended Rate | Rationale |
|---|---|---|
| < 1,000 | 1.0 (100%) | Low volume, trace everything |
| 1,000 - 10,000 | 0.1 - 0.5 (10-50%) | Moderate volume, sample reasonably |
| 10,000 - 100,000 | 0.01 - 0.1 (1-10%) | High volume, be more selective |
| > 100,000 | 0.001 - 0.01 (0.1-1%) | Very high volume, minimal sampling |
Guidelines for setting sampling rates:
- Start low and adjust up - Begin with a lower rate and increase it if needed
- Consider statistical significance - Ensure you're capturing enough samples for meaningful analysis; a quick back-of-the-envelope check is sketched after this list
- Adjust based on stability - Stable, mature services can use lower rates than new features
- Temporary boosting - Temporarily increase sampling during debugging or issue investigation
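As a rough back-of-the-envelope check for the statistical-significance guideline above, you can work backwards from how many traces per day you want to analyze. The helper below is purely illustrative (the function name and target are hypothetical, not part of our codebase); it simply applies expected traces per day = daily call volume × sampling rate.

```python
def suggest_sampling_rate(daily_call_volume: int, target_traces_per_day: int = 1_000) -> float:
    """Hypothetical helper: pick a rate that yields roughly `target_traces_per_day` traces."""
    if daily_call_volume <= 0:
        return 1.0
    return min(1.0, target_traces_per_day / daily_call_volume)


# expected traces/day = volume * rate
print(suggest_sampling_rate(500))        # 1.0    -> ~500 traces/day (trace everything)
print(suggest_sampling_rate(100_000))    # 0.01   -> ~1,000 traces/day
print(suggest_sampling_rate(5_000_000))  # 0.0002 -> ~1,000 traces/day
```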
Best Practices
- Class Constants - Define sampling rates as class constants for easy configuration
- Environment-Specific Rates - Consider using higher sampling rates in staging environments (see the sketch after this list)
- Consistent Naming - Use descriptive, consistent names in the decorator for easy filtering in Braintrust
- Propagate Trace Status - Always pass the trace status to nested function calls
- Periodic Review - Regularly review sampling rates to balance visibility with cost
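For the environment-specific rates practice, one option is to read the rate from an environment variable with a conservative default. The sketch below is an assumption about how you might wire this up; the variable name AGENT_TRACE_SAMPLING_RATE and its defaults are hypothetical, not something our codebase defines.

```python
import os

from trially_agents.utils.logging import sampled_traced

# Hypothetical environment variable; defaults to tracing 1% of calls.
# In staging you could set AGENT_TRACE_SAMPLING_RATE=1.0 to trace everything.
SAMPLING_RATE = float(os.getenv("AGENT_TRACE_SAMPLING_RATE", "0.01"))


class MyHighVolumeAgent:
    sampling_rate = SAMPLING_RATE

    @sampled_traced(
        sampling_rate=sampling_rate,
        name="my_high_volume_function",
    )
    def process_request(self, input_data):
        ...
```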
Real-World Example: Patient Matcher
In our codebase, the CriteriaAnsweringTask class in the patient matcher uses sampled_traced to efficiently trace only 1% of the criteria answering calls:
```python
class CriteriaAnsweringTask:
    prompt: Prompt = criteria_answering_task_prompt
    sampling_rate: float = 0.01  # Only trace 1% of calls

    @sampled_traced(
        sampling_rate=sampling_rate,
        name="criteria_answering_task",
    )
    def invoke(
        self, question: str, documents: List[str], screening_date: str
    ) -> EligibilityAnswer:
        return self.prompt.invoke(
            {
                "question": question,
                "documents": documents,
                "screening_date": screening_date,
            },
            trace=is_being_traced.get(),  # Pass trace status to the prompt
        )
```
This approach ensures we capture enough data for debugging and monitoring while keeping performance impact and costs manageable.
Troubleshooting
If you're encountering issues with sampled_traced:
- No traces appearing - Check your sampling rate; it might be too low
- Too many traces - Lower your sampling rate
- Missing nested spans - Ensure you're passing trace=is_being_traced.get() to nested functions
- Inconsistent tracing - Verify that context variables are being properly propagated through your call stack
Summary
The sampled_traced decorator is a powerful tool for balancing comprehensive observability with performance and cost concerns. By sampling a fraction of LLM function calls for tracing, you can:
- Maintain visibility into system behavior
- Reduce overhead and costs in high-volume systems
- Keep your Braintrust dashboard focused on relevant data
- Still capture enough information for debugging and performance analysis
Use this decorator in any high-volume workflow where tracing every single call would be excessive.