DEV Community 4h ago

Tracing, Prometheus metrics, and structured logs with two decorators: Fitz vs the OpenTelemetry setup in FastAPI

For full observability in FastAPI you need 7 pip packages, about 60 lines of config, and manual glue between logs, spans, and metrics. In Fitz it's two decorators and an env var, with trace_id auto-correlated between logs and spans and Secret<T> values redacted in logs without you having to think about it.

The stack every "production-ready" app ends up gluing together

Your app grows. The client wants to know which endpoint is slow, how many requests failed in the last hour, and why a specific user saw an error at 3 AM. We're talking about the Sacred Triangle of observability: traces, metrics, logs.

In 2026 the industry answer is OpenTelemetry for all three. In Python with FastAPI:

pip install opentelemetry-distro[otlp] \
  opentelemetry-instrumentation-fastapi \
  opentelemetry-instrumentation-sqlalchemy \
  opentelemetry-instrumentation-requests \
  opentelemetry-exporter-otlp-proto-grpc \
  prometheus-fastapi-instrumentator \
  structlog

observability.py (~60 lines):

import os
import logging
from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
import structlog
from prometheus_fastapi_instrumentator import Instrumentator

SERVICE_NAME = os.environ.get("OTEL_SERVICE_NAME", "myapp")
OTLP_ENDPOINT = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
SAMPLE_RATIO = float(os.environ.get("OTEL_TRACES_SAMPLER_ARG", "1.0"))

def setup_observability(app, engine):
    resource = Resource.create({"service.name": SERVICE_NAME})
    
    if OTLP_ENDPOINT:
        trace_provider = TracerProvider(
            resource=resource,
            sampler=TraceIdRatioBased(SAMPLE_RATIO),
        )
        trace_provider.add_span_processor(
            BatchSpanProcessor(OTLPSpanExporter(endpoint=OTLP_ENDPOINT))
        )
        trace.set_tracer_provider(trace_provider)
        
        metric_reader = PeriodicExportingMetricReader(
            OTLPMetricExporter(endpoint=OTLP_ENDPOINT)
        )
        meter_provider = MeterProvider(
            resource=resource,
            metric_readers=[metric_reader]
        )
        metrics.set_meter_provider(meter_provider)
        
        FastAPIInstrumentor.instrument_app(app)
        SQLAlchemyInstrumentor().instrument(engine=engine)

    # Prometheus is scraped, not pushed, so /metrics does not depend on the OTLP endpoint
    Instrumentator().instrument(app).expose(app, endpoint="/metrics")
    
    # Structured logs with trace_id
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.dict_tracebacks,
            inject_trace_context,  # custom, see below
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        cache_logger_on_first_use=True,
    )

def inject_trace_context(logger, method_name, event_dict):
    span = trace.get_current_span()
    if span and span.get_span_context().is_valid:
        ctx = span.get_span_context()
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict

Plus usage in handlers:

import time

import structlog
from opentelemetry import trace, metrics

log = structlog.get_logger()
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

orders_counter = meter.create_counter("orders_calls_total")
orders_histogram = meter.create_histogram("orders_duration_seconds", unit="s")

@app.post("/orders")
async def process_order(body: OrderIn):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", body.id)
        start = time.time()
        try:
            log.info("order.processing", order_id=body.id, total=body.total)
            receipt = await actually_process(body)
            orders_counter.add(1, {"status": "success"})
            log.info("order.processed", receipt_id=receipt.id)
            return receipt
        except Exception as e:
            orders_counter.add(1, {"status": "error"})
            log.error("order.failed", order_id=body.id, error=str(e))
            raise
        finally:
            orders_histogram.record(time.time() - start)

Seven pip installs. ~60 lines of setup. ~25 lines per handler to trace + meter + log. Manual connection between the three signals. And watch out: if you forget to call FastAPIInstrumentor.instrument_app(app), you get no HTTP spans. If you forget SQLAlchemyInstrumentor, the DB doesn't show up in traces. If structlog doesn't have the inject_trace_context processor, logs won't correlate with spans.

The same thing in Fitz

@server(8080, prometheus=true)
fn main() => 0

@trace(name="process_order")
@metric(name="orders")
async fn process_order(body: OrderIn) -> Result<Receipt> {
    log.info("order.processing", { order_id: body.id, total: body.total })
    let receipt = actually_process(body).await?
    log.info("order.processed", { receipt_id: receipt.id })
    return Ok(receipt)
}

@post("/orders")
async fn create_order(body: OrderIn) -> Result<Receipt> {
    return process_order(body).await
}

Activate OTLP export:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
fitz run main.fitz

That's it.

The raw table

Item	Python (OTel + structlog, 7 packages)	Fitz
Initial setup	~60 LoC + 7 pip installs	`@server(prometheus=true)`
HTTP span per request	`FastAPIInstrumentor.instrument_app(app)`	Auto-instrumented
Custom span over fn	`with tracer.start_as_current_span("X")`	`@trace(name="X")`
Call counter	`meter.create_counter(...)` + `.add(1)`	`@metric(name="X")` (includes duration histogram)
Duration histogram	`meter.create_histogram(...)` + `.record(...)` + try/finally	Same `@metric` (RAII guard)
Prometheus /metrics endpoint	`Instrumentator().instrument(app).expose(app)`	`@server(prometheus=true)`
Structured JSON logs	`structlog.configure(...)` with 6 processors	`log.info("event", { ... })` built-in
Trace_id propagated to logs	Custom `inject_trace_context` processor	Automatic (task-local)
Secret redacted in logs	Manual `.get_secret_value()` with care	`Secret<T>` redacts auto
HTTP access log	Server access log (e.g. Uvicorn's), not correlated by default; `FastAPIInstrumentor` emits spans, not logs	Auto-emitted with trace_id / span_id
Head-based sampling	`TraceIdRatioBased`	`TraceIdRatioBased` (same)

Piece by piece

Auto-instrumented HTTP spans

In Fitz, every @get / @post / @put / @delete / @ws opens an OTel span with:

http.method (GET/POST/...)
http.target (the path template - /users/{id} not /users/42, low-cardinality friendly)
http.status_code when the span closes
duration_ms

And it emits an access log with the same fields + trace_id / span_id. Zero user code.

In Python you have to call FastAPIInstrumentor.instrument_app(app) and hope your version matches. If a client sends headers with non-ASCII characters, older versions of the lib used to crash on them - famous bug.
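To make the http.target point concrete, here is a minimal Python sketch of resolving a concrete path back to its route template before it is used as a span attribute or log field. The helper names (matches, http_target, access_record) and the "<unmatched>" fallback are hypothetical, chosen for illustration; this is not how Fitz or FastAPIInstrumentor implement it.

```python
import json

def matches(template: str, path: str) -> bool:
    """True if a concrete path fits a template such as /users/{id}."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return False
    return all(
        (t.startswith("{") and t.endswith("}")) or t == p
        for t, p in zip(t_parts, p_parts)
    )

def http_target(path: str, templates: list[str]) -> str:
    """Return the first matching template, or a fixed fallback for unknown paths."""
    for template in templates:
        if matches(template, path):
            return template
    # A single fallback value keeps unknown paths from inflating cardinality.
    return "<unmatched>"

def access_record(method, path, status, duration_ms, templates):
    """Build the fields of an access log entry for one request."""
    return {
        "msg": "http.access",
        "http.method": method,
        "http.target": http_target(path, templates),
        "http.status_code": status,
        "duration_ms": duration_ms,
    }

TEMPLATES = ["/users/{id}", "/orders"]
print(json.dumps(access_record("GET", "/users/42", 200, 3.2, TEMPLATES)))
```

With the template as the attribute value, /users/42 and /users/7 land in the same series instead of creating one series per user id.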

Trace_id propagated to custom logs

Inside the request's span:

@authenticated
@post("/orders")
async fn create_order(user: User, body: OrderIn) -> Result<Receipt> {
    log.info("order.received", { order_id: body.id, user_email: user.email })
    let receipt = process(body).await?
    log.info("order.ack", { receipt_id: receipt.id })
    return Ok(receipt)
}

Every log.info(...) inside the handler automatically includes:

{
  "timestamp": "2026-06-16T10:23:01.231Z",
  "level": "info",
  "msg": "order.received",
  "trace_id": "5e4f9b2c8a7d3e1f0b6c9a4d8e2f1a3b",
  "span_id": "a1b2c3d4e5f6a7b8",
  "order_id": 42,
  "user_email": "ada@example.com"
}

The trace_id matches the trace_id of the span in Jaeger/Tempo. As of the 9.x.4 iter2.a close, when OTel is active, the trace_id in the logs is exactly the one on the OTel span, which enables cross-pipeline queries ("give me all logs whose trace_id matches this Jaeger span").

In Python you have to write the inject_trace_context processor by hand (~10 lines), add it to structlog's config, and make sure every handler uses structlog instead of the stdlib logging module (because if anyone does import logging; logging.info(...) directly, those logs will NOT have the trace_id).
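One way to close the stdlib logging gap is a logging.Filter attached to the handler, so records from any logger pick up the trace_id. The sketch below is a hedged illustration using only the standard library: the current_trace_id context variable is a stand-in (an assumption) for reading the active span, which a real setup would get from trace.get_current_span() the way inject_trace_context does.

```python
import contextvars
import io
import json
import logging

# Stand-in for the active span (assumption): a real setup would read
# opentelemetry's trace.get_current_span() here instead.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

class TraceContextFilter(logging.Filter):
    """Attach trace_id to every LogRecord that reaches the handler."""
    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only enrich it

class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
        })

def configure(stream):
    handler = logging.StreamHandler(stream)
    # The filter sits on the handler, so it also sees records from child loggers.
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger

buffer = io.StringIO()
log = configure(buffer)
current_trace_id.set("5e4f9b2c8a7d3e1f0b6c9a4d8e2f1a3b")
log.info("order.received")
print(buffer.getvalue().strip())
```

This only helps for loggers routed through that handler; third-party code that installs its own handlers still bypasses it, which is the consistency burden described above.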

Metrics with one decorator

@metric(name="orders") automatically registers TWO metrics:

orders_calls_total - Counter, incremented at fn return.
orders_duration_seconds - Histogram, recorded at return (even if the fn panics, via RAII guard).

@trace(name="process_order")
@metric(name="orders")
async fn process(order: Order) -> Result<Receipt> {
    // process_order span closes at return
    // orders_calls_total += 1
    // orders_duration_seconds.observe(elapsed)
}

In Python with OTel, for the same effect:

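import time
from opentelemetry import trace, metrics

tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

orders_counter = meter.create_counter("orders_calls_total")
orders_histogram = meter.create_histogram("orders_duration_seconds", unit="s")

@tracer.start_as_current_span("process_order")
async def process(order: Order) -> Receipt:
    start = time.time()
    try:
        result = await actually_process(order)
        orders_counter.add(1)
        return result
    finally:
        # finally runs on success and on exception, so failed calls are timed too
        orders_histogram.record(time.time() - start)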

Roughly 5× more code. And if you forget the finally, the histogram silently misses every call that raises. Fitz's decorators guarantee cleanup via RAII.
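For comparison, the guard behaviour can be approximated in Python by folding the try/finally into a decorator of your own. This is a minimal sketch, not Fitz's implementation: the metric decorator and the CALLS / DURATIONS dictionaries are hypothetical in-memory stand-ins for a real metrics backend, and it counts failed calls as well as successful ones.

```python
import asyncio
import functools
import time
from collections import defaultdict

# In-memory stand-ins for a metrics backend (hypothetical, for illustration).
CALLS = defaultdict(int)
DURATIONS = defaultdict(list)

def metric(name):
    """Count every call and record its duration, whether it returns or raises."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                # Runs on both the success path and the exception path.
                CALLS[f"{name}_calls_total"] += 1
                DURATIONS[f"{name}_duration_seconds"].append(
                    time.perf_counter() - start
                )
        return wrapper
    return decorator

@metric("orders")
async def process(order_id, fail=False):
    if fail:
        raise ValueError("payment declined")
    return {"receipt_id": order_id}

async def main():
    await process(1)
    try:
        await process(2, fail=True)
    except ValueError:
        pass  # the failed call is still counted and timed

asyncio.run(main())
print(CALLS["orders_calls_total"], len(DURATIONS["orders_duration_seconds"]))
```

The difference from the language-level version is that this decorator is one more piece of code your team owns and has to remember to apply.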

Optional Prometheus endpoint

@server(8080, prometheus=true)
fn main() => 0

Auto-mounts GET /metrics on the same port, returning the Prometheus exposition format. No separate library. No defining the endpoint by hand. If the user declared their own @get("/metrics"), theirs wins - same pattern as /openapi.json / /healthz.

In Python: Instrumentator().instrument(app).expose(app) - it's fine, but it's another dep, another responsibility, another version to match with FastAPI.
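For readers who have not looked at what a /metrics endpoint actually returns, here is a small sketch that renders one counter and one histogram in the Prometheus text exposition format. The function names, help strings, and bucket boundaries are assumptions made for the example; real client libraries handle labels, escaping, and default buckets for you.

```python
def render_counter(name, value, help_text):
    """Three lines: help, type, and the sample itself."""
    return [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} counter",
        f"{name} {value}",
    ]

def render_histogram(name, observations, buckets, help_text):
    """Histogram buckets are cumulative: each counts observations <= its bound."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} histogram"]
    for upper in sorted(buckets):
        cumulative = sum(1 for value in observations if value <= upper)
        lines.append(f'{name}_bucket{{le="{upper}"}} {cumulative}')
    # The +Inf bucket always equals the total number of observations.
    lines.append(f'{name}_bucket{{le="+Inf"}} {len(observations)}')
    lines.append(f"{name}_sum {sum(observations)}")
    lines.append(f"{name}_count {len(observations)}")
    return lines

durations = [0.25, 0.5, 2.0]  # seconds, made-up sample data
body = "\n".join(
    render_counter("orders_calls_total", len(durations), "Calls to process_order.")
    + render_histogram(
        "orders_duration_seconds", durations, [0.1, 0.5, 1.0],
        "Duration of process_order.",
    )
) + "\n"
print(body, end="")
```

Prometheus scrapes this text on its own schedule, which is why the endpoint works without any exporter configuration.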

OTLP export with one env var

# For Jaeger
export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318

# For Honeycomb (with headers)
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=abc123

# For Tempo
export OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4318

# Head-based sampling, 10% of traces
export OTEL_TRACES_SAMPLER_ARG=0.1

# Service name
export OTEL_SERVICE_NAME=myapp

Without the env var, zero overhead, zero network calls. The instrumentation runs as a no-op if no endpoint is declared. These env vars are the OpenTelemetry standard, not invented by Fitz. Compatible with any OTel backend (Jaeger, Tempo, Honeycomb, Datadog, NewRelic, Grafana Cloud, etc.).
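The "no endpoint, no exporter" rule is easy to sketch in Python. The OtlpConfig dataclass and the load_config / parse_headers helpers below are hypothetical names for illustration; only the environment variable names and the comma-separated key=value header format come from the OpenTelemetry conventions used above.

```python
import os
from dataclasses import dataclass, field

@dataclass
class OtlpConfig:
    endpoint: str
    service_name: str = "unknown_service"
    sample_ratio: float = 1.0
    headers: dict = field(default_factory=dict)

def parse_headers(raw):
    """Parse 'k1=v1,k2=v2' into a dict, ignoring malformed pairs."""
    headers = {}
    for pair in raw.split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            headers[key.strip()] = value.strip()
    return headers

def load_config(env=os.environ):
    """Return None when no endpoint is set, so the caller installs nothing."""
    endpoint = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not endpoint:
        return None
    return OtlpConfig(
        endpoint=endpoint,
        service_name=env.get("OTEL_SERVICE_NAME", "unknown_service"),
        sample_ratio=float(env.get("OTEL_TRACES_SAMPLER_ARG", "1.0")),
        headers=parse_headers(env.get("OTEL_EXPORTER_OTLP_HEADERS", "")),
    )

# Explicit dicts instead of the real environment keep the example deterministic.
print(load_config({}))
print(load_config({
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://api.honeycomb.io",
    "OTEL_EXPORTER_OTLP_HEADERS": "x-honeycomb-team=abc123",
    "OTEL_TRACES_SAMPLER_ARG": "0.1",
}))
```

Returning None rather than a disabled exporter is a design choice in this sketch: the caller can skip building providers and processors entirely.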

Secret<T> auto-redacted in logs

This is my favorite feature. In Python:

log.info("auth.success", token=user.access_token)  # bug! token goes to Loki

The bug I've seen most in production code: someone logs the API key, the password, the JWT. The secret goes to Loki/Sentry/Datadog. Deleting production logs is a slow and painful operation.

In Fitz, Secret<T> redacts automatically in Display and JSON serialization:

let JWT_SECRET: Secret<Str> = secret("JWT_SECRET")
let user_token: Secret<Str> = generate_token(user)
log.info("auth.success", { token: user_token })
// → {"msg": "auth.success", "token": "***"}
print("JWT_SECRET = {JWT_SECRET}")
// → "JWT_SECRET = ***"

To expose the real value (when signing JWT, when hitting the DB):

let token = jwt.encode(claims, JWT_SECRET.expose())

.expose() is explicit, greppable. Code review can audit each call site in seconds. In Pydantic there's SecretStr but you have to remember to opt in at every place, and people forget. Fitz makes the safe version the default.
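To show the idea outside Fitz, here is a minimal Python sketch of a redact-by-default wrapper. The Secret class and the redact hook are written for this example (they are not Pydantic's SecretStr and not Fitz's implementation), and the token value is made up.

```python
import json
from typing import Generic, TypeVar

T = TypeVar("T")

class Secret(Generic[T]):
    """Wrap a value so that printing or formatting it never reveals it."""
    def __init__(self, value: T):
        self._value = value

    def expose(self) -> T:
        # The only supported way to read the value; easy to grep for in review.
        return self._value

    def __repr__(self) -> str:
        return "***"

    def __str__(self) -> str:
        return "***"

def redact(obj):
    """json.dumps default hook: serialize any Secret as '***'."""
    if isinstance(obj, Secret):
        return "***"
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

token = Secret("s3cr3t-token")
print(f"token = {token}")
print(json.dumps({"msg": "auth.success", "token": token}, default=redact))
```

A library-level wrapper like this still depends on every call site wrapping the value and every serializer using the hook, and the value remains reachable through the private attribute; that is the opt-in gap a language-level type is meant to close.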

Design decisions worth understanding

@trace / @metric only on user fns, not on HTTP/WS

HTTP and WebSocket handlers already have auto-instrumentation with the request span. Stacking @trace on top of @get would be redundant and would create nested spans with no value. The checker rejects it with a clear message.

Auto-emitted access log

Every HTTP handler emits a log.info("http.access", ...) on return with http.method / http.target / http.status_code / duration_ms. No opt-in. If you want to disable it:

@server(8080, observability=false)
fn main() => 0

And the instrumentation wrapper is bypassed entirely.

Span context storage with task-local

SpanContext lives in a tokio::task_local!. Because the value belongs to the task rather than to a thread, it stays correct when the multi-thread runtime moves the task between worker threads. No mutable globals, no race conditions.
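The closest Python analogue to a task-local is contextvars, which asyncio copies into each task. The sketch below is an analogy under that assumption, not a description of Fitz's runtime: three concurrent tasks each set their own trace id and none of them sees another's value.

```python
import asyncio
import contextvars

# Per-task storage: each asyncio task runs in its own copy of the context.
span_context = contextvars.ContextVar("span_context", default=None)

async def handle_request(trace_id, seen):
    span_context.set(trace_id)   # visible only inside this task
    await asyncio.sleep(0)       # yield so the other tasks interleave
    seen[trace_id] = span_context.get()

async def main():
    seen = {}
    await asyncio.gather(
        *(handle_request(t, seen) for t in ("aaa", "bbb", "ccc"))
    )
    return seen

seen = asyncio.run(main())
print(seen)
print(span_context.get())  # the outer context was never modified
```

Compare this with a module-level global, where the last writer would win and every task would log the same trace id.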

What Fitz does NOT give you (yet)

In the interest of honesty, these are the residual debts from Phase 12.3, documented explicitly:

OTel logs bridge: log.X(...) calls go to stderr (with trace_id propagated). For them to ALSO reach the backend's OTel log signal (correlated with spans there), you have to wait for sub-step 12.3.iter2.b - designed but not shipped.
OTel metrics bridge: the metrics that @metric emits dispatch to the Prometheus recorder when @server(prometheus=true). For them to also go to the OTel metrics signal (push to Honeycomb metrics, NewRelic metrics) you need to wait for the upstream crate metrics-exporter-opentelemetry release compatible with opentelemetry_sdk 0.32 (documented debt, not laziness on our part).
Tail-based sampling: only head-based (TraceIdRatioBased). For "export traces that had errors" or "sample by latency" you have to run OTel collector in the middle with that config.
Continuous profiling (pyroscope/pprof) - not integrated.
DB auto-instrumentation: the native ORM emits the executed SQL inside the calling handler's span. For queries via raw db.query(...), today no child span is created (minor debt).
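On the head-based sampling item above: the reason a ratio sampler cannot "keep traces that had errors" is that it decides from the trace_id alone, before anything has happened. The sketch below follows the common approach of comparing the low 64 bits of the trace_id against a bound; the function names are hypothetical, and whether Fitz uses this exact comparison is an assumption.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace_id

def bound_for(ratio: float) -> int:
    """Translate a ratio in [0, 1] into an upper bound on the masked trace_id."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0.0, 1.0]")
    return round(ratio * (TRACE_ID_LIMIT + 1))

def should_sample(trace_id: int, ratio: float) -> bool:
    """Decide at span start, from the trace_id only: no latency, no status."""
    return (trace_id & TRACE_ID_LIMIT) < bound_for(ratio)

rng = random.Random(0)  # fixed seed so the demo is repeatable
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, 0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision is a pure function of the trace_id, every service that sees the same trace makes the same choice, which keeps sampled traces complete across hops.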

Closing

Observability in Python is the area where the "library-first" approach most visibly falls short: each signal lives in a different lib, each lib has its own config, each handler needs boilerplate to use them, and keeping consistency across the three signals is your responsibility.

Fitz puts the three signals in the language, with auto-instrumentation for what every app needs (HTTP spans + access logs + metrics), optional decorators for custom spans and metrics, and zero-cost no-op when no backend is configured.
