LangSmith

Stop Deploying to Fix a Bad Prompt - Use Prompt Hub

Hardcoded prompts have no rollback path. LangSmith Hub gives you push/pull versioning, tag-based promotion, and runtime updates without redeployment.

Simon Budziak
CTO

A client's document analysis pipeline started producing malformed output on a Monday morning. The team had updated the system prompt over the weekend - tightening the output format specification and adding a clause about how to handle redacted sections - and shipped it in a routine deployment. No schema validation test failed. No unit test caught it. The pipeline ran, produced structured output, and the downstream consumer silently started writing garbage to the database because a JSON key had been renamed in the "improved" prompt.

Eight hours later, a scheduled reporting job failed with None values where there should have been obligation lists. An engineer traced it back to the prompt change. Rollback meant reverting the file, opening a PR, and waiting twenty minutes for the deployment pipeline. Twenty minutes to undo a three-word edit that had no audit trail, no diff, and no staged test environment.

In the previous post on prompt caching, we looked at stopping payment for repeated static context. And in the post before that on model.profile, we adapted context management to the active model's actual capabilities. This post closes the loop: treating the prompt itself as a versioned artifact. A hardcoded prompt string has the same structural problem as a hardcoded dependency version — no history, no diff, no rollback path that doesn't require a deployment.

The Problem: Prompts Treated as Config, Not Code

There are three patterns I've seen teams use to manage prompt strings in production, in increasing order of false confidence.

Hardcoded strings are the most common. The system prompt lives in the source file next to the chain invocation. Every change is a code change, every change is a deployment, and there's no signal when the output format breaks — the pipeline still runs, just wrong. Reviewers check logic; nobody reads multi-paragraph system prompts for semantic drift during a code review.

Environment variables feel like a step forward. The string moves to .env or a secrets manager, injected at deploy time. But the version history is the environment's history — no diff, no audit trail, and rollback means coordinating an env-var change and a new deployment in the right order. The prompt has been decoupled from the codebase, but it's been recoupled to the deployment process instead.

A database row goes further. The prompt lives in a table, fetched at runtime — now you can change the prompt without deploying code. This is real progress. But what's still missing is a commit model: immutable snapshots with mutable pointers. A database row can be overwritten. There's no staging/prod concept, no diff between versions, no promotion workflow. You know the current value; you don't know what it was before the last edit or whether anyone reviewed it.

What's missing in all three is what version control gave us for code: an immutable snapshot for every change, mutable pointers (branches, tags) that move between snapshots, and a promotion model where "prod" is a pointer that moves — not a value that gets overwritten.
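To make the commit-model idea concrete, here is a minimal in-memory sketch (a toy, not the Hub's actual implementation): every save produces an immutable, content-addressed snapshot, and names like staging and prod are just pointers that move between hashes.

```python
import hashlib

class PromptStore:
    """Toy commit model: immutable snapshots plus mutable tag pointers."""

    def __init__(self):
        self._commits: dict[str, str] = {}  # hash -> prompt text (never overwritten)
        self._tags: dict[str, str] = {}     # tag name -> commit hash

    def commit(self, prompt_text: str, tag: str) -> str:
        """Snapshot the prompt and move `tag` forward to the new commit."""
        h = hashlib.sha256(prompt_text.encode()).hexdigest()[:12]
        self._commits[h] = prompt_text  # old commits stay intact
        self._tags[tag] = h             # only the pointer moves
        return h

    def promote(self, src: str, dst: str) -> None:
        """Point `dst` at the commit `src` already references (e.g. staging -> prod)."""
        self._tags[dst] = self._tags[src]

    def pull(self, ref: str) -> str:
        """Resolve a tag name or a raw commit hash to prompt text."""
        return self._commits[self._tags.get(ref, ref)]

store = PromptStore()
v1 = store.commit("Extract obligations as JSON.", tag="staging")
store.promote("staging", "prod")
v2 = store.commit("Extract obligations and deadlines as JSON.", tag="staging")
# prod still serves v1; promotion and rollback are both pointer moves
```

Rollback in this model is just pointing prod back at v1: no content changes, and every snapshot stays reachable by its hash.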

LangSmith Hub: The Push/Pull Model

LangSmith Hub treats prompts as first-class versioned artifacts. A prompt is a named artifact — not a string, but a full ChatPromptTemplate with structured message roles, input variables, and format markers. You push it to the Hub; the Hub stores it.

Every push_prompt call creates an immutable commit: a hashed snapshot of the full prompt state. Old commits are never overwritten. You can pull any previous commit by its hash indefinitely.

Tags are mutable pointers to commits. document-analyzer:staging and document-analyzer:prod are two tags that can point to the same or different commits. Promoting a prompt from staging to prod means moving the prod tag pointer to the commit that staging already references — no content change, no deployment, no config file edit.

Key Takeaway: A commit hash is immutable; a tag is not. Pin to a tag in normal operation so your application automatically picks up promotions. Pin to a commit hash in eval pipelines where you need guaranteed immutability across runs — you don't want the prompt to change mid-evaluation.

Pushing and Pulling Prompts with the LangChain SDK

Pushing a prompt after you build the template:

python
from langsmith import Client
from langchain_core.prompts import ChatPromptTemplate
 
client = Client()
 
# Literal braces in the schema are doubled ({{ }}) so the template parser
# doesn't treat them as input variables
prompt: ChatPromptTemplate = ChatPromptTemplate.from_messages([
    ("system", "You are a senior legal analyst. Extract obligations, deadlines, "
               "and risk factors. Output must include: "
               "{{obligations: list[str], deadlines: list[str], risks: list[str]}}"),
    ("human", "{document_text}"),
])
 
client.push_prompt("document-analyzer", object=prompt, commit_tags=["staging"])

Every call creates a new commit. The staging tag moves forward to the new commit; the previous commit is preserved at its hash. No snapshot is ever lost.

If you're running a non-LangChain stack in part of your pipeline — a direct Anthropic call or an OpenAI client in a service that doesn't use LangChain — convert_prompt_to_anthropic_format() and convert_prompt_to_openai_format() deserialize a Hub prompt into the provider's native message format. The Hub becomes provider-agnostic storage; the conversion happens at the call site:

python
import anthropic
from langsmith import Client
from langsmith.client import convert_prompt_to_anthropic_format
 
ls_client = Client()
anthropic_client = anthropic.Anthropic()
 
# Pull the versioned prompt from Hub — works in any Python service
prompt = ls_client.pull_prompt("your-org/document-analyzer:prod")
 
# Convert to Anthropic's native format for direct SDK use
anthropic_payload = convert_prompt_to_anthropic_format(
    prompt,
    inputs={"document_text": contract_text},
)
 
response = anthropic_client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    **anthropic_payload,  # system + messages unpacked
)

Pulling at runtime:

python
from langchain import hub
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
 
model = ChatAnthropic(model="claude-sonnet-4-6")
 
# Fetches the current commit at the 'prod' tag; cached in-process ~5 minutes
prompt: ChatPromptTemplate = hub.pull("your-org/document-analyzer:prod")
 
chain = prompt | model
result = chain.invoke({"document_text": contract_text})

hub.pull() maintains an in-process LRU cache — 100-prompt capacity, 5-minute TTL, with background refresh using stale-while-revalidate. The cached version is served immediately; the background fetch runs without blocking the call. Updates reach every running process within one TTL window, with zero latency penalty on the hot path.

Tag-Based Environment Promotion

The standard workflow: a new prompt is pushed with the staging tag. The team reviews it in the LangSmith UI — diffs between commits are human-readable there — and runs evaluation against the staging version. When it passes, the prod tag is moved to that commit. No code change. No deployment. The next hub.pull("...document-analyzer:prod") in any running process returns the new version within one TTL window.

Rollback is the same operation in reverse. Identify the last stable commit hash, move prod back to it:

python
from langsmith import Client
 
client = Client()
 
# Inspect recent commits to find the stable hash
commits = client.list_prompt_commits("your-org/document-analyzer")
for commit in list(commits)[:5]:
    print(commit.commit_hash, commit.created_at, commit.tags)
 
# Rollback: move 'prod' tag to a previous stable commit
client.update_prompt("document-analyzer", tag="prod", commit_hash="<stable-hash>")

One SDK call. No deployment. No merge. The tag moves, and within one TTL window the fleet is on the previous version.

Warning: Tag movement is immediate in LangSmith, but running instances serve the stale cached version until the TTL expires (default 5 min). If you need an instantaneous rollback, build a cache-invalidation endpoint or accept a process restart. For most incidents a 5-minute window is acceptable. For payment flows or compliance pipelines where a bad prompt means bad audit records, design your rollback SLA around this constraint explicitly.
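If the five-minute window is too long, one option (a sketch of the cache-invalidation idea, not a LangSmith feature) is to route prompt access through a small provider that normally serves a cached pull but exposes a force_refresh() you can wire to an admin endpoint, bypassing the TTL during an incident:

```python
import threading
import time
from typing import Callable

class PromptProvider:
    """Serves a cached prompt; force_refresh() bypasses the TTL for instant cutover."""

    def __init__(self, fetch: Callable[[], object], ttl_seconds: float = 300.0):
        self._fetch = fetch  # e.g. lambda: hub.pull("your-org/document-analyzer:prod")
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        with self._lock:
            if self._value is None or time.monotonic() - self._fetched_at > self._ttl:
                self._value = self._fetch()
                self._fetched_at = time.monotonic()
            return self._value

    def force_refresh(self):
        """Call from an admin-only endpoint right after moving the prod tag."""
        with self._lock:
            self._value = self._fetch()
            self._fetched_at = time.monotonic()
            return self._value
```

An admin-only HTTP route that calls force_refresh() on each pod turns the TTL-bounded rollback into seconds, at the cost of one extra operational endpoint.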

In-Process Caching: What Stale-While-Revalidate Means in Practice

The LRU cache inside hub.pull() is what makes runtime prompt updates practical at production latency budgets. Without it, every inference call would need to check the LangSmith API — a network round-trip on every single prompt fetch. That's not viable.

Stale-while-revalidate means the cache serves the last known version immediately while a background thread fetches the latest commit for the tag. The calling thread never waits on the LangSmith network call after the first warm-up fetch. This is the same pattern CDNs use for content updates: serve now, update for next time.

The practical implication: when you move a tag — promote or rollback — the update propagates to all running instances within roughly 5 minutes, with no added latency during that window. Every in-flight request gets the current cached version; no request blocks waiting for the Hub API.
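The mechanics are easy to misread, so here is a stripped-down stale-while-revalidate cache (my sketch of the pattern, not LangChain's internals): the caller always gets the cached value immediately, and a background thread refreshes the entry once it is older than the TTL.

```python
import threading
import time
from typing import Callable

class SWRCache:
    """Stale-while-revalidate: serve the cached value now, refresh in the background."""

    def __init__(self, fetch: Callable[[], object], ttl_seconds: float):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._value = None
        self._fetched_at = 0.0
        self._refreshing = False

    def get(self):
        with self._lock:
            if self._value is None:
                # Cold start is the only blocking fetch
                self._value = self._fetch()
                self._fetched_at = time.monotonic()
                return self._value
            stale = time.monotonic() - self._fetched_at > self._ttl
            if stale and not self._refreshing:
                self._refreshing = True
                threading.Thread(target=self._refresh, daemon=True).start()
            return self._value  # stale or fresh, the caller never waits

    def _refresh(self):
        value = self._fetch()
        with self._lock:
            self._value = value
            self._fetched_at = time.monotonic()
            self._refreshing = False
```

The trade is explicit: after a tag move, at most one TTL window of requests see the old prompt, and no request ever pays the network round-trip on the hot path.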

The cross-process caveat: different pods in a horizontally scaled deployment may refresh their caches at different offsets within the TTL window, depending on when each pod last fetched. For most workloads, a sub-5-minute staggered rollout across pods is fine. For cases where you need simultaneous cutover — a compliance pipeline where version drift between pods creates inconsistent audit records — pull by commit hash during the transition window, then switch back to tag-based pulling once all pods have converged.
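One lightweight way to implement that transition window (an assumed convention, not a Hub feature; the PROMPT_PIN_COMMIT variable name is hypothetical) is an environment variable that, when set, pins every pod's pull to a specific commit hash and otherwise falls back to the prod tag:

```python
import os

def prompt_ref(repo: str = "your-org/document-analyzer") -> str:
    """Return a Hub identifier: hash-pinned during cutover, tag-based otherwise."""
    pinned = os.environ.get("PROMPT_PIN_COMMIT")  # set on all pods during cutover
    return f"{repo}:{pinned}" if pinned else f"{repo}:prod"

# hub.pull(prompt_ref()) then resolves identically on every pod while the pin is set
```

Clear the variable once all pods have converged, and tag-based pulling resumes.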

Webhooks for CI/CD Automation

LangSmith can fire outbound webhooks on commit events. The POST payload includes the full prompt metadata: repo name, commit hash, tags, timestamp, and the prompt content itself. This is the hook point for closing the loop between prompt changes and your existing CI/CD infrastructure.

The pattern that works well in practice: every new staging commit triggers a webhook, the webhook kicks off a CI job that runs your eval suite against the new commit hash (pinned by hash, not tag, for reproducibility across the run), and a pass rate above a defined threshold automatically promotes the prod tag. The same gate you use for code changes now gates prompt changes.

python
from fastapi import FastAPI, Request
from langsmith import Client
 
app = FastAPI()
client = Client()
 
@app.post("/hooks/langsmith-prompt")
async def handle_prompt_commit(request: Request) -> dict:
    payload = await request.json()
    commit_hash = payload["commit_hash"]
    repo = payload["repo_name"]        # "your-org/document-analyzer"
    tags = payload.get("tags", [])
 
    if "staging" not in tags:
        return {"status": "skipped"}
 
    # Eval runs against pinned hash — not the tag — for reproducibility.
    # run_eval_suite is your own eval harness (not shown here).
    pass_rate = await run_eval_suite(repo=repo, commit_hash=commit_hash)
 
    if pass_rate >= 0.90:
        client.update_prompt(
            repo.split("/")[1], tag="prod", commit_hash=commit_hash
        )
        return {"status": "promoted", "commit": commit_hash, "pass_rate": pass_rate}
 
    return {"status": "blocked", "pass_rate": pass_rate}

Perspective: Most teams don't need CI/CD automation for prompts on day one. Start with manual tag promotion — that alone eliminates most rollback pain and removes the twenty-minute deployment from the rollback path. Add webhook-driven eval gates once you have a stable eval dataset. An eval suite with ten representative cases and a 90% pass threshold will catch output format regressions more reliably than any code review.
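A format-regression gate really can be that small. Here is a hedged sketch of what the core check inside run_eval_suite might look like: given model outputs for a handful of representative documents, assert the JSON contract the downstream consumer depends on, and compute a pass rate to compare against the threshold. The key set matches this post's example schema; your contract will differ.

```python
EXPECTED_KEYS = {"obligations", "deadlines", "risks"}

def check_output(output: dict) -> bool:
    """One eval case: the expected keys exist and each value is a list of strings."""
    if set(output) != EXPECTED_KEYS:
        return False  # a renamed or missing key is exactly the Monday-morning bug
    return all(
        isinstance(v, list) and all(isinstance(item, str) for item in v)
        for v in output.values()
    )

def pass_rate(outputs: list[dict]) -> float:
    """Fraction of eval cases whose output satisfies the contract."""
    return sum(check_output(o) for o in outputs) / len(outputs)

# Gate: move the prod tag only if pass_rate(outputs) >= 0.90
```

Ten cases through check_output would have caught the renamed JSON key from the opening story before it ever reached the prod tag.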

Takeaway

Three things that carry forward:

  • Tags are your deployment primitive. document-analyzer:prod is the only identifier that belongs in production application code. Never hardcode a commit hash in the application — that's for eval pipelines only.
  • Push/pull decouples prompt iteration from deployments. A broken output format can be fixed in minutes without touching the codebase, opening a PR, or waiting on a pipeline. The rollback path is one SDK call.
  • The cache is your friend and your gotcha. Zero latency overhead for prompt updates, but rollbacks are eventual within the TTL window — design your SLAs around that, especially for compliance-sensitive pipelines where version consistency across pods matters.

Dealing with prompt drift in a production agent? Let's talk.

