The "You Let It Run Forever" Problem!
AI usage isn't "free." It's compute, memory, tokens, time, and money. Unbounded consumption happens when an AI system is allowed to process requests without strict limits, meaning attackers (or even normal users) can drain your resources, overload your service, spike your bills, or extract far more output than you intended.
This is the cybersecurity version of: "Nobody broke in… we left the meter running."

It happens when AI systems don't enforce limits like these (a minimal policy sketch follows below):
- Rate limits (requests per minute/hour/day)
- Token caps (input/output size)
- Timeouts (how long a task is allowed to run)
- Cost budgets (per user / per tenant / per org)
- Tool boundaries (what the AI can repeatedly call)
And just like that… your GenAI becomes a cost-amplifier with a public "RUN" button.
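To make that concrete, here's a minimal sketch of those limits expressed as one explicit policy object. The names and numbers (`RequestLimits`, `max_input_tokens`, the 8,000-token cap, the $5/day budget) are illustrative assumptions, not any framework's real API:

```python
from dataclasses import dataclass

# Illustrative policy object. Field names and values are assumptions,
# not the API of a specific framework -- treat them as a starting point.
@dataclass(frozen=True)
class RequestLimits:
    max_requests_per_minute: int = 30     # rate limit
    max_input_tokens: int = 8_000         # token cap (input)
    max_output_tokens: int = 1_000        # token cap (output)
    max_runtime_seconds: float = 30.0     # timeout per task
    daily_budget_usd: float = 5.0         # cost budget per user / tenant
    max_tool_calls_per_request: int = 5   # tool boundary

DEFAULT_LIMITS = RequestLimits()
```

If a request can't be expressed within an object like this, it shouldn't reach the model at all.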
Common Unbounded Consumption Scenarios!
1. Uncontrolled Input Size [The "Here's a book… summarize it" attack]
Users send massive prompts, huge documents, giant images, or long multi-file queries that trigger heavy processing.
📦📚 → 🤖🔥 → 💸
Huge input → huge compute → huge bill
Why it works: the system tries to be helpful… until it goes bankrupt.
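A pre-flight check is often enough to stop this one. The sketch below assumes a crude word-count token estimate and an arbitrary 8,000-token ceiling; a real service would use its actual tokenizer and its own limits:

```python
# Reject oversized inputs before they ever reach the model.
MAX_INPUT_TOKENS = 8_000  # assumed policy value

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~1.3 tokens per whitespace-separated word.
    return int(len(text.split()) * 1.3)

def check_input(prompt: str) -> None:
    tokens = estimate_tokens(prompt)
    if tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"Input too large: ~{tokens} tokens (limit {MAX_INPUT_TOKENS}). "
            "Ask for a smaller or split document instead."
        )
```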
2. Repeated Requests [The slow-motion DoS]
High-volume requests overwhelm the system — even if each request is "valid."
🔁🤖✅ → 🧯⚠️ → 🚫
Repeated calls → overload → service degradation
Why it works: it's not one bad request… it's 10,000 normal ones.
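The classic counter is a per-user rate limit. Here's a small in-memory sliding-window sketch; the 30-requests-per-minute quota is an assumption, and in production you'd usually push this into Redis or an API gateway rather than process memory:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 30  # assumed quota

_history = defaultdict(deque)  # user_id -> timestamps of recent requests

def allow_request(user_id: str) -> bool:
    now = time.monotonic()
    window = _history[user_id]
    # Forget anything older than the window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_WINDOW:
        return False  # 10,000 "normal" requests stop right here
    window.append(now)
    return True
```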
3. Resource-Intensive Queries [The "make it think harder" problem]
Carefully crafted prompts force the most expensive behavior: deep reasoning, tool loops, long retrieval chains, and multi-step generation.
🧠⚙️ → 🧨⏳ → 💥
Expensive logic → long runtime → system strain
Why it works: complex tasks scale costs faster than they scale value.
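Two knobs help here: a wall-clock timeout and a hard ceiling on tool/reasoning iterations. The sketch below assumes hypothetical `call_model` / `call_tool` async callables that you would wire to your own stack:

```python
import asyncio

MAX_RUNTIME_SECONDS = 30   # assumed timeout
MAX_TOOL_CALLS = 5         # assumed ceiling on tool loops

async def run_agent(prompt: str, call_model, call_tool) -> str:
    # Assumption: call_model returns a dict with either a "final_answer"
    # or a "tool_request"; call_tool returns text to append to the context.
    async def _loop() -> str:
        context = prompt
        for _ in range(MAX_TOOL_CALLS):      # no unbounded agent loops
            step = await call_model(context)
            if step.get("final_answer"):
                return step["final_answer"]
            context += await call_tool(step["tool_request"])
        return "Stopped: tool-call limit reached."

    # The whole task is also bounded by wall-clock time.
    return await asyncio.wait_for(_loop(), timeout=MAX_RUNTIME_SECONDS)
```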
4. Denial of Wallet (DoW) [The bill is the attack]
Pay-per-token or pay-per-call systems are drained aggressively until budget limits are exceeded.
💳🤖✅ → 📈💸 → 😵
AI runs → costs spike → finance calls you
Why it works: the attacker doesn't need downtime… only your invoice.
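The fix is boring but effective: a hard budget cap that fails closed before the next call, not after the invoice. The prices and the $5/day cap below are placeholders; plug in your provider's real per-token rates:

```python
from collections import defaultdict

DAILY_BUDGET_USD = 5.00          # assumed per-tenant cap
PRICE_PER_1K_INPUT = 0.005       # placeholder rate
PRICE_PER_1K_OUTPUT = 0.015      # placeholder rate

_spend_today = defaultdict(float)  # tenant_id -> USD spent today

def record_usage(tenant_id: str, input_tokens: int, output_tokens: int) -> None:
    cost = (
        (input_tokens / 1000) * PRICE_PER_1K_INPUT
        + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    )
    _spend_today[tenant_id] += cost

def within_budget(tenant_id: str) -> bool:
    # Check this *before* every call so the cap fails closed.
    return _spend_today[tenant_id] < DAILY_BUDGET_USD
```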
5. Functional Model Replication [The "print the model slowly" trick]
Attackers extract enough output over time to approximate model behavior, logic patterns, or proprietary reasoning.
🧠🧪 → 📤📤📤 → 🧬
Many queries → many answers → imitation possible
Why it works: unlimited access becomes unlimited sampling.
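One mitigation is a cumulative output quota per API key, which makes "unlimited sampling" expensive. The 200,000-tokens-per-day figure below is an assumed policy value, not a recommendation:

```python
from collections import defaultdict

DAILY_OUTPUT_TOKEN_QUOTA = 200_000  # assumed quota

_output_tokens_today = defaultdict(int)  # api_key -> output tokens served today

def charge_output(api_key: str, output_tokens: int) -> bool:
    """Return False once a key has drawn its daily output quota."""
    if _output_tokens_today[api_key] + output_tokens > DAILY_OUTPUT_TOKEN_QUOTA:
        return False  # cut off the slow-motion model copy
    _output_tokens_today[api_key] += output_tokens
    return True
```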
6. Side-Channel Attack via Input Filtering Bypass [The "observe the AI's shape" attack]
Attackers use repeated probing to infer internal behaviors — guardrails, thresholds, hidden system prompts, or model constraints.
🕳️🔎 → 🤖🧠 → 📡
Probe repeatedly → observe differences → extract behavior
Why it works: the AI leaks patterns even when it doesn't leak data.
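Detection matters as much as prevention here: flag accounts that replay near-identical prompts at high volume. This sketch only catches exact repeats after whitespace/case normalization; fuzzier probing would need similarity search. The threshold of 50 is an assumption:

```python
import hashlib
from collections import defaultdict

REPEAT_THRESHOLD = 50  # assumed: more than 50 identical probes looks suspicious

_probe_counts = defaultdict(int)  # (user_id, prompt fingerprint) -> count

def looks_like_probing(user_id: str, prompt: str) -> bool:
    # Collapse case and whitespace so trivially reformatted repeats still match.
    normalized = " ".join(prompt.lower().split())
    fingerprint = hashlib.sha256(normalized.encode()).hexdigest()[:16]
    _probe_counts[(user_id, fingerprint)] += 1
    return _probe_counts[(user_id, fingerprint)] > REPEAT_THRESHOLD
```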
And here's how attackers turn your GenAI into a paid subscription… for them (MITRE ATLAS techniques):
- AML.T0034 — Cost Harvesting
- AML.T0000 — ML Model Access
- AML.T0024 — Exfiltration via ML Inference API
- AML.T0025 — Exfiltration via Cyber Means
- AML.T0029 — Denial of ML Service
What Can Go Wrong?
- Service outages
- Severe cost overruns
- Degraded performance and availability
- Intellectual property theft
- Loss of customer trust
Because when AI is unlimited… the blast radius isn't "wrong answers." It's your operations and your budget.
High-Level Mitigations
- Enforce strict input size and complexity limits
- Apply rate limiting and per-user quotas
- Use timeouts and throttling
- Monitor usage and detect anomalies
- Restrict expensive operations (reasoning depth, multi-tool loops)
- Separate high-cost operations from public access
- Design for graceful degradation under load (a small sketch follows this list)
- Set hard budget caps per tenant/org / API key
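For the graceful-degradation point, a minimal sketch: when the expensive path is saturated, fall back to a cheaper one instead of falling over. `full_model` and `cheap_model` are hypothetical async callables, and 20 concurrent "expensive" slots is an assumed number:

```python
import asyncio

MAX_CONCURRENT_FULL = 20                    # assumed capacity for the costly path
_full_slots = asyncio.Semaphore(MAX_CONCURRENT_FULL)

async def answer(prompt: str, full_model, cheap_model) -> str:
    if _full_slots.locked():                # every expensive slot is busy
        return await cheap_model(prompt)    # degraded, but still available
    async with _full_slots:
        return await full_model(prompt)
```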
Golden rule: ✅ The AI can respond. ❌ But it should never be allowed to respond forever. Not for free.
Practice, Test, Learn in public, and Share what actually works… daily and free. Join me :)
Nothing Cyber — A free space for hands-on learning and skill development. If that's useful to you, feel free to follow along and share!