Tracing, Prometheus metrics, and structured logs with two decorators: Fitz vs the OpenTelemetry setup in FastAPI
For full observability in FastAPI you need 7 pip packages + 60 lines of config + manual glue between logs/spans/metrics. In Fitz it's two decorators and an env var, with trace_id auto-correlated between logs and spans and Secret<T> values redacted in logs by default.
The stack every "production-ready" app ends up gluing together
Your app grows. The client wants to know which endpoint is slow, how many requests failed in the last hour, and why a specific user saw an error at 3 AM. We're talking about the Sacred Triangle of observability: traces, metrics, logs.
In 2026 the industry answer is OpenTelemetry for all three. In Python with FastAPI:
pip install "opentelemetry-distro[otlp]" \
    opentelemetry-instrumentation-fastapi \
    opentelemetry-instrumentation-sqlalchemy \
    opentelemetry-instrumentation-requests \
    opentelemetry-exporter-otlp-proto-grpc \
    prometheus-fastapi-instrumentator \
    structlog
observability.py (~60 lines):
import os
import logging

from opentelemetry import trace, metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor
from opentelemetry.instrumentation.sqlalchemy import SQLAlchemyInstrumentor
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased
import structlog
from prometheus_fastapi_instrumentator import Instrumentator

SERVICE_NAME = os.environ.get("OTEL_SERVICE_NAME", "myapp")
OTLP_ENDPOINT = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
SAMPLE_RATIO = float(os.environ.get("OTEL_TRACES_SAMPLER_ARG", "1.0"))

def setup_observability(app, engine):
    resource = Resource.create({"service.name": SERVICE_NAME})

    if OTLP_ENDPOINT:
        trace_provider = TracerProvider(
            resource=resource,
            sampler=TraceIdRatioBased(SAMPLE_RATIO),
        )
        trace_provider.add_span_processor(
            BatchSpanProcessor(OTLPSpanExporter(endpoint=OTLP_ENDPOINT))
        )
        trace.set_tracer_provider(trace_provider)

        metric_reader = PeriodicExportingMetricReader(
            OTLPMetricExporter(endpoint=OTLP_ENDPOINT)
        )
        meter_provider = MeterProvider(
            resource=resource,
            metric_readers=[metric_reader]
        )
        metrics.set_meter_provider(meter_provider)

    FastAPIInstrumentor.instrument_app(app)
    SQLAlchemyInstrumentor().instrument(engine=engine)
    Instrumentator().instrument(app).expose(app, endpoint="/metrics")

    # Structured logs with trace_id
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.processors.dict_tracebacks,
            inject_trace_context,  # custom, see below
            structlog.processors.JSONRenderer(),
        ],
        wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        cache_logger_on_first_use=True,
    )

def inject_trace_context(logger, method_name, event_dict):
    span = trace.get_current_span()
    if span and span.get_span_context().is_valid:
        ctx = span.get_span_context()
        event_dict["trace_id"] = format(ctx.trace_id, "032x")
        event_dict["span_id"] = format(ctx.span_id, "016x")
    return event_dict
Plus usage in handlers:
import time

import structlog
from opentelemetry import trace, metrics

log = structlog.get_logger()
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

orders_counter = meter.create_counter("orders_calls_total")
orders_histogram = meter.create_histogram("orders_duration_seconds", unit="s")

@app.post("/orders")
async def process_order(body: OrderIn):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", body.id)
        start = time.time()
        try:
            log.info("order.processing", order_id=body.id, total=body.total)
            receipt = await actually_process(body)
            orders_counter.add(1, {"status": "success"})
            log.info("order.processed", receipt_id=receipt.id)
            return receipt
        except Exception as e:
            orders_counter.add(1, {"status": "error"})
            log.error("order.failed", order_id=body.id, error=str(e))
            raise
        finally:
            # Record the duration on both the success and the error path
            orders_histogram.record(time.time() - start)
Seven pip packages. ~60 lines of setup. ~25 lines per handler to trace + meter + log. Manual connection between the three signals. And watch out: if you forget to call FastAPIInstrumentor.instrument_app(app), you get no HTTP spans. If you forget SQLAlchemyInstrumentor, the DB doesn't show up. If structlog doesn't have the inject_trace_context processor, logs won't correlate with spans.
The same thing in Fitz
@server(8080, prometheus=true)
fn main() => 0

@trace(name="process_order")
@metric(name="orders")
async fn process_order(body: OrderIn) -> Result<Receipt> {
    log.info("order.processing", { order_id: body.id, total: body.total })
    let receipt = actually_process(body).await?
    log.info("order.processed", { receipt_id: receipt.id })
    return Ok(receipt)
}

@post("/orders")
async fn create_order(body: OrderIn) -> Result<Receipt> {
    return process_order(body).await
}
Activate OTLP export:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
fitz run main.fitz
That's it.
The raw table
| Item | Python (OTel + structlog, 7 packages) | Fitz |
|---|---|---|
| Initial setup | ~60 LoC + 7 pip packages | @server(prometheus=true) |
| HTTP span per request | FastAPIInstrumentor.instrument_app(app) | Auto-instrumented |
| Custom span over fn | with tracer.start_as_current_span("X") | @trace(name="X") |
| Call counter | meter.create_counter(...) + .add(1) | @metric(name="X") (includes duration histogram) |
| Duration histogram | meter.create_histogram(...) + .record(...) + try/finally | Same @metric (RAII guard) |
| Prometheus /metrics endpoint | Instrumentator().instrument(app).expose(app) | @server(prometheus=true) |
| Structured JSON logs | structlog.configure(...) with 6 processors | log.info("event", { ... }) built-in |
| Trace_id propagated to logs | Custom inject_trace_context processor | Automatic (task-local) |
| Secret redacted in logs | Manual .get_secret_value() with care | Secret<T> redacts auto |
| HTTP access log | ASGI server access log (e.g. Uvicorn), no trace_id by default | Auto-emitted with trace_id / span_id |
| Head-based sampling | TraceIdRatioBased | TraceIdRatioBased (same) |
Piece by piece
Auto-instrumented HTTP spans
In Fitz, every @get / @post / @put / @delete / @ws opens an OTel span with:
- http.method (GET/POST/...)
- http.target (the path template - /users/{id} not /users/42, low-cardinality friendly)
- http.status_code when the span closes
- duration_ms
And it emits an access log with the same fields + trace_id / span_id. Zero user code.
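To see why the path template matters, here is a minimal Python sketch of the idea. The route table and the http_target helper are made up for illustration and are not how Fitz resolves routes; the point is only that labeling by template keeps the number of distinct values bounded by the number of routes, not by the number of users.

```python
import re
from collections import Counter

# Hypothetical route table: regex for concrete paths -> path template.
ROUTES = [
    (re.compile(r"^/users/\d+$"), "/users/{id}"),
    (re.compile(r"^/orders/\d+/items$"), "/orders/{id}/items"),
]

def http_target(path: str) -> str:
    """Return the route template for a concrete path, or the path itself if unmatched."""
    for pattern, template in ROUTES:
        if pattern.match(path):
            return template
    return path

seen_paths = ["/users/1", "/users/2", "/users/42", "/orders/7/items"]

# Labeling by raw path: one series per distinct URL.
raw_labels = Counter(seen_paths)
# Labeling by template: one series per route.
template_labels = Counter(http_target(p) for p in seen_paths)

print(len(raw_labels), "distinct raw paths vs", len(template_labels), "templates")
```

With raw paths the label set grows with every new ID; with templates it stays at one entry per route.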
In Python you have to call FastAPIInstrumentor.instrument_app(app) and hope the instrumentation version matches your FastAPI version. If your user agent emits headers with non-ASCII characters, the old version of the lib used to crash - famous bug.
Trace_id propagated to custom logs
Inside the request's span:
@authenticated
@post("/orders")
async fn create_order(user: User, body: OrderIn) -> Result<Receipt> {
    log.info("order.received", { order_id: body.id, user_email: user.email })
    let receipt = process(body).await?
    log.info("order.ack", { receipt_id: receipt.id })
    return Ok(receipt)
}
Every log.info(...) inside the handler automatically includes:
{
    "timestamp": "2026-06-16T10:23:01.231Z",
    "level": "info",
    "msg": "order.received",
    "trace_id": "5e4f9b2c8a7d3e1f0b6c9a4d8e2f1a3b",
    "span_id": "a1b2c3d4e5f6a7b8",
    "order_id": 42,
    "user_email": "ada@example.com"
}
As of the 9.x.4 iter2.a close, when OTel is active the trace_id in the logs is exactly the trace_id of the OTel span you see in Jaeger/Tempo, which enables cross-pipeline queries ("give me all logs whose trace_id matches this Jaeger span").
In Python you have to write the inject_trace_context processor by hand (~10 lines), add it to structlog's config, and check that every handler uses structlog instead of the stdlib logging module (because if anyone does import logging; logging.info(...) directly, those logs will NOT have the trace_id).
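One way to cover the stdlib logging gap on the Python side is a logging.Filter attached to the handler, so every record gets the trace ID no matter which logger produced it. The sketch below is stdlib-only and rests on one assumption: the current_trace_id context variable is a stand-in for the active span, where real code would read it from opentelemetry.trace.get_current_span(). The logger name and the JSON fields are illustrative too.

```python
import contextvars
import io
import json
import logging

# Stand-in for the active span's trace ID (assumption for this sketch).
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)

class TraceContextFilter(logging.Filter):
    """Copy the current trace ID onto every record that reaches the handler."""
    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only annotate it

class JsonFormatter(logging.Formatter):
    """Render the record as one JSON object per line."""
    def format(self, record):
        return json.dumps({
            "level": record.levelname.lower(),
            "msg": record.getMessage(),
            "trace_id": record.trace_id,
        })

def build_logger(stream):
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("trace_filter_demo")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger

buffer = io.StringIO()
demo_logger = build_logger(buffer)

# Inside a "request": the trace ID is set, so the log line carries it.
token = current_trace_id.set("5e4f9b2c8a7d3e1f0b6c9a4d8e2f1a3b")
demo_logger.info("order.received")
current_trace_id.reset(token)

# Outside any request: the field is present but null.
demo_logger.info("outside.request")

lines = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

It works, but it is one more piece someone has to remember to install on every handler.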
Metrics with one decorator
@metric(name="orders") automatically registers TWO metrics:
- orders_calls_total - Counter, incremented at fn return.
- orders_duration_seconds - Histogram, recorded at return (even if the fn panics, via RAII guard).
@trace(name="process_order")
@metric(name="orders")
async fn process(order: Order) -> Result<Receipt> {
    // process_order span closes at return
    // orders_calls_total += 1
    // orders_duration_seconds.observe(elapsed)
}
In Python with OTel, for the same effect:
orders_counter = meter.create_counter("orders_calls_total")
orders_histogram = meter.create_histogram("orders_duration_seconds", unit="s")

@tracer.start_as_current_span("process_order")
async def process(order: Order) -> Receipt:
    start = time.time()
    try:
        result = await actually_process(order)
        orders_counter.add(1)
        return result
    finally:
        orders_histogram.record(time.time() - start)
5× more code. And if you forget the finally, the histogram loses cases. Fitz's decorators guarantee cleanup via RAII.
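If you want to approximate the decorator ergonomics in Python, the try/finally can be written once and reused. This is a rough stdlib-only sketch, not Fitz's implementation: the two dictionaries stand in for a metrics backend, and the metric decorator here is a hypothetical helper. It counts failed calls as well as successful ones, which is a choice made for this sketch.

```python
import asyncio
import functools
import time
from collections import defaultdict

# In-memory stand-ins for a metrics backend (assumption for this sketch).
call_counts = defaultdict(int)
durations = defaultdict(list)

def metric(name):
    """Record <name>_calls_total and <name>_duration_seconds around each call."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await fn(*args, **kwargs)
            finally:
                # Runs on normal return and on exceptions alike.
                call_counts[f"{name}_calls_total"] += 1
                durations[f"{name}_duration_seconds"].append(
                    time.perf_counter() - start
                )
        return wrapper
    return decorator

@metric("orders")
async def process(order_id, fail=False):
    if fail:
        raise ValueError("payment declined")
    return {"receipt_id": order_id}

async def demo():
    await process(1)
    try:
        await process(2, fail=True)
    except ValueError:
        pass  # the failing call was still counted and timed

asyncio.run(demo())
print(dict(call_counts))
```

The cleanup lives in one place, so a handler cannot forget it, but you still own and maintain the helper yourself.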
Optional Prometheus endpoint
@server(8080, prometheus=true)
fn main() => 0
Auto-mounts GET /metrics on the same port, returning the Prometheus exposition format. No separate library. No defining the endpoint by hand. If the user declared their own @get("/metrics"), theirs wins - same pattern as /openapi.json / /healthz.
In Python: Instrumentator().instrument(app).expose(app) - it's fine, but it's another dep, another responsibility, another version to match with FastAPI.
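For reference, the exposition format behind GET /metrics is plain text: a TYPE line per metric, then one sample per line, with histograms expanded into cumulative buckets plus _sum and _count. The sketch below renders the two orders metrics by hand. The bucket boundaries and observations are invented for the example, and this is not what either Fitz or prometheus-fastapi-instrumentator runs internally.

```python
def render_counter(name, value):
    """Render a counter as a TYPE line plus one sample."""
    return [f"# TYPE {name} counter", f"{name} {value}"]

def render_histogram(name, observations, buckets):
    """Render a histogram with cumulative buckets, then _sum and _count."""
    lines = [f"# TYPE {name} histogram"]
    for upper in buckets:
        # Buckets are cumulative: each counts every observation <= its bound.
        count = sum(1 for o in observations if o <= upper)
        lines.append(f'{name}_bucket{{le="{upper}"}} {count}')
    lines.append(f'{name}_bucket{{le="+Inf"}} {len(observations)}')
    lines.append(f"{name}_sum {sum(observations)}")
    lines.append(f"{name}_count {len(observations)}")
    return lines

def render_metrics(calls, observations):
    lines = render_counter("orders_calls_total", calls)
    lines += render_histogram(
        "orders_duration_seconds", observations, [0.1, 0.5, 1.0]
    )
    return "\n".join(lines) + "\n"

# Three calls taking 62.5 ms, 250 ms and 750 ms (illustrative values).
body = render_metrics(3, [0.0625, 0.25, 0.75])
print(body)
```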
OTLP export with one env var
# For Jaeger
export OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
# For Honeycomb (with headers)
export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS=x-honeycomb-team=abc123
# For Tempo
export OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4318
# Head-based sampling 10%
export OTEL_TRACES_SAMPLER_ARG=0.1
# Service name
export OTEL_SERVICE_NAME=myapp
Without the env var, zero overhead, zero network calls. The instrumentation runs as a no-op if no endpoint is declared. These env vars are the OpenTelemetry standard, not invented by Fitz. Compatible with any OTel backend (Jaeger, Tempo, Honeycomb, Datadog, NewRelic, Grafana Cloud, etc.).
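The "no endpoint, no export" rule is easy to picture as a tiny config loader. This Python sketch is an assumption about the shape of such a loader, not Fitz's code: OtelConfig and load_otel_config are hypothetical names, and the "myapp" default just mirrors the Python setup earlier in this post. Only the environment variable names come from the OpenTelemetry conventions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class OtelConfig:
    endpoint: Optional[str]
    service_name: str
    sample_ratio: float

    @property
    def export_enabled(self) -> bool:
        # No endpoint means nothing is exported and no network call is made.
        return self.endpoint is not None

def load_otel_config(env: Mapping[str, str] = os.environ) -> OtelConfig:
    """Read the standard OTEL_* variables; an empty endpoint counts as unset."""
    return OtelConfig(
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or None,
        service_name=env.get("OTEL_SERVICE_NAME", "myapp"),
        sample_ratio=float(env.get("OTEL_TRACES_SAMPLER_ARG", "1.0")),
    )

# Passing a dict instead of os.environ keeps the example deterministic.
disabled = load_otel_config({})
enabled = load_otel_config({
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://jaeger:4318",
    "OTEL_TRACES_SAMPLER_ARG": "0.1",
})
print(disabled.export_enabled, enabled.export_enabled)
```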
Secret<T> auto-redacted in logs
This is my favorite feature. In Python:
log.info("auth.success", token=user.access_token) # bug! token goes to Loki
The bug I've seen most in production code: someone logs the API key, the password, the JWT. The secret goes to Loki/Sentry/Datadog. Deleting production logs is a slow and painful operation.
In Fitz, Secret<T> redacts automatically in Display and JSON serialization:
let JWT_SECRET: Secret<Str> = secret("JWT_SECRET")
let user_token: Secret<Str> = generate_token(user)
log.info("auth.success", { token: user_token })
// → {"msg": "auth.success", "token": "***"}
print("JWT_SECRET = {JWT_SECRET}")
// → "JWT_SECRET = ***"
To expose the real value (when signing JWT, when hitting the DB):
let token = jwt.encode(claims, JWT_SECRET.expose())
.expose() is explicit, greppable. Code review can audit each call site in seconds. In Pydantic there's SecretStr but you have to remember to opt in at every place, and people forget. Fitz makes the safe version the default.
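The same idea can be sketched in Python to show the mechanics: a wrapper whose string forms are masked, plus a JSON hook that masks it during serialization. This is an illustrative stand-in, not the Fitz implementation and not Pydantic's SecretStr. Secret, redacting_default and log_json are names invented for the example.

```python
import json
from typing import Generic, TypeVar

T = TypeVar("T")

class Secret(Generic[T]):
    """Wrap a sensitive value so that printing or logging it shows a mask."""

    def __init__(self, value: T) -> None:
        self._value = value

    def expose(self) -> T:
        """The only way to read the real value; easy to grep for in review."""
        return self._value

    def __repr__(self) -> str:
        return "***"

    def __str__(self) -> str:
        return "***"

def redacting_default(obj):
    """json.dumps hook: mask Secret values, reject anything else unknown."""
    if isinstance(obj, Secret):
        return "***"
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

def log_json(event, **fields):
    """Render a structured log line with secrets masked."""
    return json.dumps({"msg": event, **fields}, default=redacting_default)

user_token = Secret("eyJhbGciOi.example")
line = log_json("auth.success", token=user_token)
print(line)
print(f"token = {user_token}")
```

The difference from the Fitz version is that nothing in Python forces a token to be wrapped in the first place.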
Design decisions worth understanding
@trace / @metric only on user fns, not on HTTP/WS
HTTP and WebSocket handlers already have auto-instrumentation with the request span. Stacking @trace on top of @get would be redundant and would create nested spans with no value. The checker rejects it with a clear message.
Auto-emitted access log
Every HTTP handler emits a log.info("http.access", ...) on return with http.method / http.target / http.status_code / duration_ms. No opt-in. If you want to disable it:
@server(8080, observability=false)
fn main() => 0
And the instrumentation wrapper is bypassed entirely.
Span context storage with task-local
SpanContext lives in a tokio::task_local!, so it travels with the task when the multi-thread runtime moves it across worker threads. No mutable globals, no race conditions.
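The closest Python analog to task-local storage is contextvars with asyncio, where each task gets its own copy of the context. The sketch below is only an analogy: asyncio runs on a single thread, so it shows per-task isolation, not the cross-thread behavior of a multi-thread runtime. The current_span_id variable and handle_request are names made up for the example.

```python
import asyncio
import contextvars

# Per-task storage for the "current span" (illustrative stand-in).
current_span_id = contextvars.ContextVar("current_span_id", default=None)

async def handle_request(span_id, delay):
    current_span_id.set(span_id)
    # Yield to the event loop so the other task runs in between.
    await asyncio.sleep(delay)
    # Still sees its own value, not the one the other task set.
    return current_span_id.get()

async def run_demo():
    # gather() wraps each coroutine in a task with its own context copy.
    return await asyncio.gather(
        handle_request("span-a", 0.02),
        handle_request("span-b", 0.01),
    )

results = asyncio.run(run_demo())
print(results)
```

Neither task sees the other's span, and nothing leaks into the caller's context once the tasks finish.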
What Fitz does NOT give you (yet)
In the interest of honesty, these are the residual debts from Phase 12.3, documented explicitly:
- OTel logs bridge: log.X(...) go to stderr (with trace_id propagated). For them to ALSO go to the OTel log signal of the backend (correlated with spans there), you have to wait for sub-step 12.3.iter2.b - designed but not shipped.
- OTel metrics bridge: the metrics that @metric emits dispatch to the Prometheus recorder when @server(prometheus=true). For them to also go to the OTel metrics signal (push to Honeycomb metrics, NewRelic metrics) you need to wait for the upstream crate metrics-exporter-opentelemetry release compatible with opentelemetry_sdk 0.32 (documented debt, not laziness on our part).
- Tail-based sampling: only head-based (TraceIdRatioBased). For "export traces that had errors" or "sample by latency" you have to run OTel collector in the middle with that config.
- Continuous profiling (pyroscope/pprof) - not integrated.
- DB auto-instrumentation: the native ORM emits the executed SQL inside the calling handler's span. For queries via raw db.query(...), today no child span is created (minor debt).
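On the sampling item above: a head-based ratio sampler decides from the trace ID alone, at the moment the trace starts, which is why it cannot know whether the request will later fail or run slow. The sketch below shows that kind of decision in Python. It is modeled on the common approach of comparing the low 64 bits of the trace ID against a threshold, and should be read as an illustration, not as the exact algorithm Fitz or any particular SDK runs. The should_sample helper is a name chosen for this example.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # mask for the low 64 bits of a trace ID

def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic head-based decision: same trace ID, same answer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0.0, 1.0]")
    # Keep the trace if its low 64 bits fall below ratio * 2**64.
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound

# Every service that sees the same trace ID reaches the same verdict,
# so a trace is either kept end to end or dropped end to end.
trace_id = 0x5E4F9B2C8A7D3E1F0B6C9A4D8E2F1A3B
print(should_sample(trace_id, 1.0), should_sample(trace_id, 0.0))
```

Because the verdict is fixed before the handler runs, "keep only the traces that errored" has to happen later, in a collector that sees whole traces.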
Closing
Observability in Python is where the "library-first" approach shows its costs most clearly: each signal lives in a different lib, each lib has its own config, each handler needs boilerplate to use them, and keeping consistency across the three signals is your responsibility.
Fitz puts the three signals in the language, with auto-instrumentation for what every app needs (HTTP spans + access logs + metrics), optional decorators for custom spans and metrics, and zero-cost no-op when no backend is configured.