Your AI Stack is Publicly Searchable

A practical Shodan recon guide for every component in the modern ML serving pipeline from model servers to vector databases to notebook…

Tanmay Bhattacharjee

~6 min read · May 3, 2026 (Updated: May 3, 2026) · Free: No

A practical Shodan recon guide for every component in the modern ML serving pipeline from model servers to vector databases to notebook environments.

The modern AI stack is sprawling. A production ML deployment no longer means one model behind one API it's a constellation of model servers, vector databases, experiment trackers, notebook environments, object stores, and GPU orchestration layers, each running its own HTTP or gRPC service, each with its own default port, and each trivially discoverable by anyone with a Shodan account.

This is not a theoretical concern. Run http.title:"Jupyter" port:8888 -http.html:"token" right now. The results are as bad as you expect.

Every component that ships with a built-in health endpoint, a REST API, or a metrics scraper is also a potential reconnaissance target. The question isn't whether your ML infrastructure is fingerprint-able — it is. The question is whether you know about it before an attacker does.

This guide is organized by service. For each component we cover the default ports, what the unauthenticated attack surface looks like, and the Shodan queries that find exposed instances in the wild. Use this offensively for authorized recon and defensively to audit your own exposure.

Responsible use: These queries are for authorized security assessments, bug bounty recon on in-scope targets, and defensive hardening of infrastructure you own or are contracted to protect. Unauthorized access to computer systems is illegal. If you run these against your own org and find exposure, treat it as a P0.

The Attack Surface at a Glance

Here's every service in the modern ML stack, its default ports, and the exposed endpoints that matter for recon:

Note the critical-to-medium gradient. Model servers and vector databases matter most they're where your IP lives. But don't underestimate Prometheus endpoints: /metrics leaks model names, inference counts, latency percentiles, and sometimes hostnames that enumerate your entire backend.

Model Serving Servers

These are the highest-value targets. An unauthenticated model server doesn't just leak information it lets an attacker invoke your models for free, extract architecture details, and in some cases download weights directly.

NVIDIA Triton — ports 8000, 8001, 8002

Triton ships with no authentication by default. The /v2/health/ready endpoint is a trivial canary if it returns 200, the server is li,ve and everything below it is accessible. The /v2/models endpoint lis all loaded models, including their names, versions on, and input/output tensor shapes.

NVIDIA Triton

http.title:"Triton Inference Server" port:8000

http.html:"/v2/health/ready" port:8000

http.html:"/v2/models" port:8000

port:8001 product:"grpc" http.html:"triton"

port:8002 http.html:"nv_inference_request_success"

The Prometheus endpoint on 8002 is worth noting separately. It leaks nv_inference_request_success, nv_inference_request_duration_us, and per-model GPU memory usage. From a metrics scrape alone you can enumerate every model name and gauge production request volume.

TensorFlow Serving — ports 8500, 8501

TensorFlow Serving exposes a REST API on 8501 and gRPC on 8500. The REST /v1/models/<name> endpoint returns model metadata and version status. If you don't know the model name, /v1/models (without a name) often enumerates them anyway.

TensorFlow Serving

http.html:"/v1/models/" port:8501

http.html:"model_version_status" port:8501

port:8500 product:"grpc"

product:"TensorFlow Serving"

TorchServe — ports 8080, 8081, 8082

TorchServe's architecture is unusually dangerous: the inference port (8080) and the management API (8081) are separate services. The management API allows registering, scaling, and unregistering models with no auth required by default. Port 8082 serves Prometheus metrics.

TorchServe

port:8081 http.html:"/models"

http.html:"defaultWorkers" port:8081

http.html:"Healthy" port:8080 http.html:"torchserve"

port:8082 http.html:"ts_inference_requests_total"

An exposed TorchServe management port isn't just an information disclosure it's remote model deployment. An attacker can push a malicious model archive (.mar file) to a listening management API. Plan accordingly.

LLM Runtimes

Ollama and vLLM are the two dominant local/self-hosted LLM serving runtimes. Both expose OpenAI-compatible APIs with no authentication by default — by design, for ease of local use. This design decision doesn't survive the internet.

Ollama port 11434

Ollama is installed by millions of developers to run models locally. When OLLAMA_HOST is set to 0.0.0.0 — which the docs explicitly show for remote access — the API is exposed on all interfaces. /api/tags lists every downloaded model. /api/generate lets you run them.

Ollama

port:11434 http.html:"ollama"

http.html:"/api/tags" port:11434

http.html:'"models"' port:11434

http.html:"/api/generate" port:11434

product:"Ollama" port:11434

vLLM port 8000

vLLM's OpenAI-compatible API on port 8000 is designed for high-throughput LLM serving. Exposed instances hand over /v1/chat/completions and /v1/completions to anyone who asks. On a powerful GPU box, the financial impact of this is immediate — someone else is running inference on your dime.

vLLM

http.html:"/v1/models" port:8000 http.html:"vllm"

http.html:"/v1/chat/completions" port:8000

http.html:'"object":"list"' port:8000 http.html:"model"

http.html:"/health" port:8000 http.html:"vllm"

Vector Databases

Vector databases are the backbone of every RAG pipeline. Exposed instances leak your embedding collections which often contain reconstructable representations of your proprietary documents, customer data, or training corpus.

Qdrant ports 6333, 6334

Qdrant

http.html:"/collections" port:6333

http.title:"Qdrant" port:6333

http.html:"qdrant" port:6333 http.html:"result"

http.html:'"version":' http.html:"qdrant" port:6333

Weaviate port 8080

Weaviate's /v1/schema endpoint exposes the full schema of every collection, including class names, properties, and vectorizer configuration. This is enough to understand the entire knowledge graph being served.

Weaviate

http.html:"/v1/schema" port:8080 http.html:"weaviate"

http.html:"weaviate" http.html:"graphql" port:8080

http.html:"/v1/meta" port:8080 http.html:"weaviate"

Milvus port 19530

Milvus communicates over gRPC on 19530. While Shodan can't read gRPC payloads, the open port combined with the Attu web UI (often co-deployed) is the reliable fingerprint.

Milvus

port:19530 product:"grpc"

http.title:"Attu" http.html:"milvus"

product:"Milvus" port:19530

ML Infrastructure

MLflow port 5000

MLflow tracks experiments, stores model artifacts, and maintains the model registry. An exposed MLflow instance is a complete audit trail of your ML development process: every training run, every hyperparameter, every metric, and links to every artifact. The model registry endpoint can also be used to download registered model versions directly.

MLflow

http.title:"MLflow" port:5000

http.html:"/api/2.0/mlflow/experiments/search" port:5000

http.html:"/api/2.0/mlflow/registered-models" port:5000

http.html:"mlflow" http.html:"artifact" port:5000

Ray port 8265

The Ray dashboard exposes cluster state, job queues, actor graphs, and resource allocation for the entire distributed compute cluster. The /api/jobs/ endpoint lists running and completed jobs including their entrypoints (which often contain paths, environment variables, and configuration).

Ray

http.title:"Ray Dashboard" port:8265

http.html:"/api/jobs/" port:8265

http.html:"/api/serve/applications/" port:8265

Data Environments

Jupyter Notebook port 8888

Jupyter is the most dangerous entry on this list by impact-to-prevalence ratio. An unauthenticated Jupyter instance is full remote code execution on the host. The /api/kernels endpoint lets you create and attach to Python kernels. The /api/contents endpoint exposes the entire filesystem tree from the working directory. There is no meaningful security boundary here.

Jupyter Notebook

http.title:"Jupyter" port:8888 -http.html:"password"

http.title:"Home Page — Select or create a notebook"

http.html:"/api/kernels" port:8888

http.title:"JupyterLab" port:8888

http.html:"jupyter" port:8888 -http.html:"token"

MinIO ports 9000, 9001

MinIO is the S3-compatible object store most teams use for model artifacts, training data, and checkpoint storage. Anonymous bucket listing is its own data breach — you can enumerate every object path without credentials. The console on 9001 uses default credentials (minioadmin:minioadmin) surprisingly often in dev environments that drifted to production.

MinIO

http.title:"MinIO" port:9001

port:9000 http.html:"<ListBucketResult>"

http.html:"minio" port:9000 http.html:"ListBuckets"

product:"MinIO" port:9000

Compound Dorks for Broad Sweeps

For wide-net recon or internal auditing across cloud ASNs, these compound queries cover the full stack efficiently:

Unauthenticated LLM endpoints on major cloud providers

http.html:"/v1/models" http.html:"object" org:"Amazon" OR org:"Google" OR org:"Microsoft"

Jupyter — no token, no password (RCE exposure)

http.title:"Jupyter" port:8888 -http.html:"token" -http.html:"password"

Prometheus scrape endpoints leaking ML metrics

http.html:"# TYPE" http.html:"inference" http.html:"latency" -org:"monitoring"

Ollama instances with models loaded

port:11434 http.html:'"models"' http.html:'"name"'

Shodan Operator Reference

Combine these modifiers with any dork above to narrow or broaden scope:

Run the dorks. Then fix what you find.

Thanks

#reconnaissance #ai-security #cybersecurity #bugbounty-writeup

< Go to the original