
Logfire vs LangSmith for AI Agents — The Pragmatic Playbook

 — #Logfire #LangSmith #LangGraph #Evaluation #Observability

Hot take: you don’t need Yet Another Dashboard—you need to ship reliable agents. That means two things: (1) observability that sees beyond the LLM call (HTTP, DB, queues, retries), and (2) evaluation that tells you if your chain/graph/agent is getting better or worse.

Enter Pydantic Logfire and LangSmith. They’re not rivals so much as complementary lenses on the same beast. Below is a field guide tuned for AI agents and LLM applications (LangGraph, LangChain, Pydantic‑AI, OpenAI Agents, bespoke tool stacks)—with copy‑paste snippets and zero fluff.

The elevator pitch (for agent builders)

  • Logfire: an OpenTelemetry‑native observability platform from the Pydantic team. It auto‑instruments Python apps (FastAPI/HTTPX/SQLAlchemy), LLM calls (OpenAI, images, embeddings, streaming), OpenAI Agents, and Pydantic‑AI. You get traces, logs, and metrics in one place—so you can follow the hop from request → tool → DB → LLM → response.
  • LangSmith: an LLM‑native testing + tracing platform by LangChain. It’s where datasets, evaluators (LLM‑as‑judge + rules), annotation queues, run trees, and LangGraph/LangChain traces live. Recently it added OpenTelemetry ingestion, so you can unify system + LLM telemetry when you want to.

TL;DR: Use Logfire to see how your system behaves. Use LangSmith to prove your agent behaves. Use both if you own production.


How they map to an agentic stack

[Figure: mapping Logfire and LangSmith onto an agentic stack]

  • Agent runtime coverage: Both capture LLM calls; Logfire also shows system spans (HTTP/DB/etc.). LangSmith shows chain/graph structure, inputs/outputs, scores, and dataset runs.
  • OTel story: Logfire is built on OTel end‑to‑end. LangSmith can ingest OTel so you can correlate traces across your stack and your evals.

Quick wins you can paste

Instrument OpenAI (and Agents) with Logfire

# pip install logfire openai
import logfire
from openai import OpenAI

client = OpenAI()
logfire.configure(service_name="orders")

# Trace OpenAI SDK calls (chat, responses, embeddings, images, with/without streaming)
logfire.instrument_openai(client)

# If you’re using OpenAI's Agents SDK, add this too
logfire.instrument_openai_agents()

# ...your agent code...

Trace a LangGraph tool/chain with LangSmith

# pip install langsmith openai
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

llm = wrap_openai(OpenAI())  # wrapped client logs the nested LLM call as a child run

@traceable(run_type="tool", name="summarize")
def summarize(x: str) -> str:
    r = llm.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize: {x}"}],
        temperature=0,
    )
    return r.choices[0].message.content

# Use inside your LangGraph node; runs will appear in LangSmith with full inputs/outputs.

Send OpenTelemetry traces into LangSmith (optional, powerful)

# Minimal sketch: export OTel spans from your app to LangSmith’s OTel endpoint
# (see LangSmith docs for the current OTLP endpoint & auth instructions)
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
exporter = OTLPSpanExporter(
    endpoint="<langsmith-otlp-endpoint>",
    headers={"x-api-key": "<token>"},
)
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

The reality check: where each shines for agents

Logfire strengths for agents

  • Whole‑system context: follows the request across HTTP, DB, cache, file I/O, background jobs, plus the LLM call, in one trace.
  • Agent frameworks beyond LangChain: first‑class hooks for OpenAI Agents and Pydantic‑AI—handy if your stack isn’t LangChain‑centric.
  • Operational signals: spans + logs + metrics in one place. Easier incident reviews and SLOs.

LangSmith strengths for agents

  • Evaluation workflow: datasets (goldens), LLM‑as‑judge and heuristic evaluators, annotation queues, experiment reports.
  • Agent/graph introspection: run trees for LangGraph/LangChain, token/latency/cost per step, prompt/version diffs.
  • CI‑friendliness: gate deploys on dataset runs; compare branches/runs; keep human feedback attached to runs.

Rule of thumb: If your question is “why did the agent choose that tool?” → LangSmith. If it’s “why is the request slow/expensive?” → Logfire.


Evals: who does what for LLM apps

  • LangSmith brings the eval loop to you: upload a dataset, run your chain/graph, score with built‑ins (faithfulness/relevancy/criteria) or your own judge, and triage with annotation queues.
  • Logfire is about observability. For quality evals, pair with your favorite evaluator (e.g., Pydantic‑AI’s eval helpers, Phoenix/Ragas). You still correlate the scores with latency/cost via Logfire traces.
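Custom evaluators in LangSmith boil down to scoring functions; the exact registration signature is in the LangSmith docs, but the logic itself is plain Python. A toy token-overlap groundedness check (the function name and the `{"key", "score"}` return shape follow the common evaluator convention; treat it as a sketch, not a real metric):

```python
def groundedness(answer: str, source: str) -> dict:
    """Toy heuristic: fraction of answer tokens that also appear in the source.

    Illustrative only -- a real evaluator would use an LLM judge or a library
    like Ragas; the point is that the score is just a dict with a key + number.
    """
    answer_tokens = {t.lower().strip(".,") for t in answer.split()}
    source_tokens = {t.lower().strip(".,") for t in source.split()}
    if not answer_tokens:
        return {"key": "groundedness", "score": 0.0}
    overlap = len(answer_tokens & source_tokens) / len(answer_tokens)
    return {"key": "groundedness", "score": round(overlap, 3)}


print(groundedness("Refunds allowed within 30 days",
                   "Refunds are allowed within 30 days of purchase"))
```

Because the scoring logic is a pure function, you can unit-test it offline and only hit the LangSmith API when wiring it into a dataset run.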

CI gate example (LangSmith dataset run)

# sketch: fail the build if task success rate (TSR) < 0.8 in the last dataset run
python scripts/run_langsmith_eval.py --dataset agents_regression
python scripts/assert_score.py --metric TSR --gte 0.8
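Behind a script like `assert_score.py`, the gate is simple threshold logic. A hypothetical core (the script name and metric values are illustrative; fetching real scores from LangSmith is left to its SDK docs):

```python
import sys


def gate(scores: dict[str, float], metric: str, gte: float) -> bool:
    """True iff the metric is present and meets the threshold."""
    value = scores.get(metric)
    return value is not None and value >= gte


if __name__ == "__main__":
    # In the real script, pull these from your latest LangSmith experiment results
    latest = {"TSR": 0.83, "groundedness": 0.91}
    if not gate(latest, metric="TSR", gte=0.8):
        sys.exit(1)  # non-zero exit code fails the CI build
    print("eval gate passed")
```

Treating a missing metric as a failure (rather than a pass) is deliberate: a renamed evaluator should break the build, not silently wave deploys through.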

A candid, agent‑focused comparison

When to anchor on Logfire

  • You run FastAPI/HTTPX/SQLAlchemy or other Python infra around your agent, and you want a single pane for everything.
  • You’re using Pydantic‑AI or OpenAI Agents and want native tracing without extra glue.
  • Your ops team speaks OpenTelemetry and wants to keep options open (export anywhere via OTel Collector).

When to anchor on LangSmith

  • Your agents are built with LangGraph/LangChain, and you need datasets + evaluators + human review tightly coupled with traces.
  • You want run‑by‑run scores (task success, groundedness, style, etc.) and easy prompt/version comparisons.
  • You plan to block releases on eval regressions.

When to run both (most teams at scale)

  • Use Logfire for system & cost/latency tracing across your microservices.
  • Use LangSmith for the agent quality loop (datasets, judges, annotation queues) and to peek inside graph steps.
  • Optionally pipe OTel from your app into LangSmith to align run IDs and make incident triage trivial.

Tiny recipes

Pydantic‑AI agent + Logfire

# pip install pydantic-ai logfire
from pydantic_ai import Agent
import logfire

logfire.configure(service_name="policy-agent")
logfire.instrument_pydantic_ai()  # auto-trace Pydantic‑AI agents

policy = Agent(
    "openai:gpt-4o",  # first argument is the model; instructions go in system_prompt
    system_prompt="You extract policy details from PDFs and return JSON.",
)
# ...tool defs...
result = policy.run_sync("What is the refund window in doc.pdf?")

LangGraph supervisor + LangSmith evals

# In tests/: create a small dataset and score with LangSmith evaluators
# Then add a build step that fails if the score drops.
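The “fails if the score drops” step is a comparison against a frozen baseline. A minimal sketch of that regression check (metric names and numbers are made up; in practice you’d load the baseline from a committed file and the current scores from your LangSmith experiment):

```python
def regressions(baseline: dict[str, float], current: dict[str, float],
                tolerance: float = 0.02) -> list[str]:
    """Names of metrics that fell more than `tolerance` below baseline."""
    return [
        metric for metric, base in baseline.items()
        if current.get(metric, 0.0) < base - tolerance
    ]


baseline = {"task_success": 0.85, "groundedness": 0.90}
current = {"task_success": 0.86, "groundedness": 0.80}
print(regressions(baseline, current))  # → ['groundedness']
```

The tolerance keeps single-run judge noise from flapping the build; tighten it as your datasets grow.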

OpenAI Agents + Logfire + OTel to LangSmith

# Trace the agent session in Logfire and also forward OTel spans to LangSmith
logfire.instrument_openai_agents()
# Configure an OTLP exporter (see above) pointing at LangSmith

Gotchas & guardrails

  • Noisy spans: scope your instrumentation; don’t wrap hot loops. Group spans logically (tool, retrieval, synthesis).
  • Eval drift: freeze datasets and record model/params with each run. Re‑run goldens on prompt or tool changes.
  • Privacy: use provider redaction/scrubbing; avoid dropping raw PII into traces.
  • Cost math: emit per‑run token + latency counters; watch p95 and p99, not the average.
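The counters in that last bullet reduce to simple arithmetic once collected. A sketch of the percentile and cost math (the nearest-rank method and all pricing numbers are illustrative, not any provider’s real rates):

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(p/100 * n)."""
    ranked = sorted(values)
    k = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[k - 1]


latencies_ms = [120, 135, 140, 150, 155, 160, 180, 210, 450, 1200]
print(percentile(latencies_ms, 95))  # → 1200 (the tail, where agents hurt)
print(percentile(latencies_ms, 99))  # → 1200

# Cost per run: tokens × price. Rates below are placeholders, not real pricing.
PRICE_IN_PER_TOKEN = 0.15 / 1_000_000
PRICE_OUT_PER_TOKEN = 0.60 / 1_000_000
run_cost = 1800 * PRICE_IN_PER_TOKEN + 400 * PRICE_OUT_PER_TOKEN
```

Note how the average of those latencies would hide the 1200 ms outlier entirely—exactly why the bullet says to watch the tail percentiles.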

Decision cheat‑sheet (bookmark this)

  • Shipping a LangGraph agent to prod? Start with LangSmith for evals/trace. Add Logfire when you wire real tools/infra.
  • Building agents without LangChain (Pydantic‑AI/OpenAI Agents/custom)? Start with Logfire; bring in LangSmith once you formalize evals.
  • SREs asking for one telemetry standard? Keep OTel central. Logfire is OTel‑native; LangSmith ingests OTel—so you can correlate everywhere.

Related reading