Using the sampled_traced Decorator for Efficient LLM Tracing

This tutorial explains the sampled_traced decorator, which enables selective tracing of LLM agent functions in high-volume workflows.

What You'll Learn

  • What the sampled_traced decorator is and why it's necessary
  • How the sampling mechanism works
  • When and how to use the decorator in your code
  • Best practices for setting sampling rates
  • How to check if tracing is active within your functions

Prerequisites

  • Basic familiarity with Python decorators
  • Understanding of the Braintrust integration in our codebase
  • Access to the Braintrust dashboard for viewing traces

Why Sampling is Necessary

LLM observability through tracing is invaluable for debugging and monitoring agent behavior. However, in high-volume production environments, tracing every single function call can lead to:

  1. Excessive storage costs - Each trace stores detailed information about inputs, outputs, and metadata
  2. Performance overhead - Tracing introduces a small but cumulative performance impact
  3. Braintrust dashboard clutter - Too many traces make it difficult to find relevant information

The sampled_traced decorator solves these issues by only tracing a configurable percentage of function calls while ensuring you still get representative data for analysis.

How sampled_traced Works

The sampled_traced decorator wraps the standard braintrust.traced decorator with a random sampling mechanism (a minimal sketch follows the list below):

  1. For each function call, it generates a random number between 0 and 1
  2. If the random number is less than the specified sampling rate, the function call is traced
  3. Otherwise, the function executes normally without tracing
  4. A context variable (is_being_traced) is set when tracing is active so nested functions can check the status
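
The real decorator is exported from trially_agents.utils.logging; the following is only a minimal sketch of the mechanism described above, not the production implementation, with braintrust.traced standing in as the standard Braintrust decorator:

import random
from contextvars import ContextVar
from functools import wraps

import braintrust

# Context flag that nested code can inspect; the real flag is exported from
# trially_agents.utils.logging as is_being_traced.
is_being_traced: ContextVar[bool] = ContextVar("is_being_traced", default=False)

def sampled_traced(sampling_rate: float = 1.0, **traced_kwargs):
    def decorator(func):
        # Wrap once with the standard Braintrust decorator (name=..., etc.)
        traced_func = braintrust.traced(**traced_kwargs)(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            # Steps 1-2: draw a random number and compare it to the rate
            if random.random() < sampling_rate:
                # Step 4: mark the context so nested calls can check it
                token = is_being_traced.set(True)
                try:
                    return traced_func(*args, **kwargs)
                finally:
                    is_being_traced.reset(token)
            # Step 3: not sampled - run the plain function, no tracing
            return func(*args, **kwargs)

        return wrapper
    return decorator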

When to Use sampled_traced

Use the sampled_traced decorator when:

  • Your function is called frequently (thousands or millions of times per day)
  • You need tracing for debugging but don't need to trace every single call
  • You want to reduce observability costs while maintaining visibility

Real-world examples from our codebase:

  • The patient matcher agent, which processes hundreds of thousands of requests daily
  • Any high-volume inference endpoint that uses LLMs
  • Background jobs that run at scale

Implementation Example

Here's how to use the sampled_traced decorator:

from trially_agents.utils.logging import sampled_traced, is_being_traced

class MyHighVolumeAgent:
    # Define a sampling rate as a class attribute for easy configuration
    sampling_rate = 0.01  # Trace 1% of calls
    prompt: Prompt = high_volume_task_prompt  # placeholder: your agent's Prompt object

    @sampled_traced(
        sampling_rate=sampling_rate,
        name="my_high_volume_function"  # Name that will appear in Braintrust
    )
    def process_request(self, input_data):
        # is_being_traced tells us whether this call was sampled;
        # pass it along so the prompt invocation is traced too
        return self.prompt.invoke(
            {"input": input_data},
            trace=is_being_traced.get(),
        )
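
Calling the decorated method is no different from calling an undecorated one; the sampling happens inside the decorator, so over many calls roughly 1% of them will appear in Braintrust. A quick usage sketch (the request values are placeholders):

agent = MyHighVolumeAgent()

# The return value is identical whether or not a given call was sampled;
# only the tracing side effect differs.
for request in ["request-1", "request-2", "request-3"]:
    result = agent.process_request(request)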

Checking Trace Status in Nested Functions

When using nested function calls, you can check if the parent function is being traced:

from trially_agents.utils.logging import is_being_traced

def helper_function(data):
    # Check if this is part of a traced call
    if is_being_traced.get():
        # This code only runs when the parent function is being traced
        print("This helper function is part of a traced call")

    # Continue with normal processing
    return process_data(data)

This pattern is especially useful when working with prompt invocations:

# Pass the trace status to the prompt invocation
response = prompt.invoke(
    {"input": user_query},
    trace=is_being_traced.get()  # Only trace if parent is being traced
)

Setting Appropriate Sampling Rates

Choosing the right sampling rate depends on your use case:

Call Volume (daily)   Recommended Rate        Rationale
< 1,000               1.0 (100%)              Low volume, trace everything
1,000 - 10,000        0.1 - 0.5 (10-50%)      Moderate volume, sample reasonably
10,000 - 100,000      0.01 - 0.1 (1-10%)      High volume, be more selective
> 100,000             0.001 - 0.01 (0.1-1%)   Very high volume, minimal sampling
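
A quick back-of-the-envelope check helps confirm that a rate still yields enough traces for analysis; this is plain arithmetic, not part of the decorator:

daily_calls = 100_000
sampling_rate = 0.01

# Expected number of traced calls per day at this rate
expected_traces_per_day = daily_calls * sampling_rate  # ~1,000 traces/day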

Guidelines for setting sampling rates:

  1. Start low and adjust up - Begin with a lower rate and increase it if needed
  2. Consider statistical significance - Ensure you're capturing enough samples for meaningful analysis
  3. Adjust based on stability - Stable, mature services can use lower rates than new features
  4. Temporary boosting - Increase sampling temporarily during debugging or issue investigation (one way to wire this up is sketched below)
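
One convenient way to support environment-specific rates and temporary boosting is to read the rate from an environment variable with a sensible default. The variable name below is only an illustration, not something our codebase defines:

import os

# Hypothetical override: export SAMPLED_TRACED_RATE=1.0 while debugging,
# then unset it to fall back to the default 1% rate.
DEFAULT_SAMPLING_RATE = 0.01
sampling_rate = float(os.environ.get("SAMPLED_TRACED_RATE", DEFAULT_SAMPLING_RATE))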

Best Practices

  1. Class Constants - Define sampling rates as class constants for easy configuration
  2. Environment-Specific Rates - Consider using higher sampling rates in staging environments
  3. Consistent Naming - Use descriptive, consistent names in the decorator for easy filtering in Braintrust
  4. Propagate Trace Status - Always pass the trace status to nested function calls
  5. Periodic Review - Regularly review sampling rates to balance visibility with cost

Real-World Example: Patient Matcher

In our codebase, the CriteriaAnsweringTask class in the patient matcher uses sampled_traced to efficiently trace only 1% of the criteria answering calls:

from typing import List

from trially_agents.utils.logging import sampled_traced, is_being_traced

class CriteriaAnsweringTask:
    prompt: Prompt = criteria_answering_task_prompt
    sampling_rate: float = 0.01  # Only trace 1% of calls

    @sampled_traced(
        sampling_rate=sampling_rate,
        name="criteria_answering_task",
    )
    def invoke(
        self, question: str, documents: List[str], screening_date: str
    ) -> EligibilityAnswer:
        return self.prompt.invoke(
            {
                "question": question,
                "documents": documents,
                "screening_date": screening_date,
            },
            trace=is_being_traced.get(),  # Pass trace status to the prompt
        )

This approach ensures we capture enough data for debugging and monitoring while keeping performance impact and costs manageable.

Troubleshooting

If you're encountering issues with sampled_traced:

  1. No traces appearing - Check your sampling rate; it might be too low
  2. Too many traces - Lower your sampling rate
  3. Missing nested spans - Ensure you're passing trace=is_being_traced.get() to nested functions
  4. Inconsistent tracing - Verify that context variables are being properly propagated through your call stack
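
One common cause of inconsistent tracing is offloading work to a thread pool: context variables do not cross thread boundaries automatically, so a helper running in a worker thread sees is_being_traced as False unless the context is copied explicitly. A minimal sketch, reusing the helper_function from earlier:

import contextvars
from concurrent.futures import ThreadPoolExecutor

# Copy the current context (including is_being_traced) and run the helper
# inside it so the trace flag survives the thread hop.
ctx = contextvars.copy_context()
with ThreadPoolExecutor() as pool:
    future = pool.submit(ctx.run, helper_function, data)
    result = future.result()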

Summary

The sampled_traced decorator is a powerful tool for balancing comprehensive observability with performance and cost concerns. By sampling a fraction of LLM function calls for tracing, you can:

  • Maintain visibility into system behavior
  • Reduce overhead and costs in high-volume systems
  • Keep your Braintrust dashboard focused on relevant data
  • Still capture enough information for debugging and performance analysis

Use this decorator in any high-volume workflow where tracing every single call would be excessive.