And what modern engineering teams are doing instead
If you manage more than a dozen servers, I'm willing to bet you have an SSH key problem. You might not know it yet — but somewhere in your infrastructure, there's a key that belongs to someone who left the company six months ago, still granting root access to a production database.
This isn't a theoretical concern. It's one of the most common findings in security audits, and it's the root cause of a surprising number of breaches that never make headlines.
Let's talk about why SSH keys rot, what the actual risks are, and what the engineering solution looks like.
The Anatomy of SSH Key Sprawl
SSH key authentication was a massive improvement over password-based access when it was introduced. Public-key cryptography, no shared secrets sent over the wire, no brute-forcing. Elegant.
But SSH keys were designed for a world where:
- Servers were named, not numbered
- Teams were small
- Infrastructure was static
- People stayed at companies for years
None of that is true anymore.
Here's what happens in practice at any company that's been operating for more than 18 months:
```
# A typical authorized_keys file on a production server
ssh-rsa AAAAB3NzaC1y... alice@laptop-2024
ssh-rsa AAAAB3NzaC1y... bob@workstation
ssh-rsa AAAAB3NzaC1y... deploy@ci-runner-old
ssh-rsa AAAAB3NzaC1y... charlie@unknown
ssh-rsa AAAAB3NzaC1y... root@jumpbox-legacy
ssh-ed25519 AAAAC3Nza... eve@personal-macbook
```
Who is charlie@unknown? Is deploy@ci-runner-old still a valid service? Does eve@personal-macbook still work here?
Nobody knows. And nobody wants to remove them because the last time someone cleaned up authorized_keys, three deploys broke and an on-call engineer got locked out at 2 AM.
The Real Risks (Beyond Compliance Checkboxes)
1. Keys Never Expire
Unlike passwords, certificates, or tokens — SSH keys have no built-in expiry. A key generated in 2019 works identically in 2026. There's no mechanism in the SSH protocol to force rotation.
This means your attack surface only grows over time. Every key ever issued is potentially still valid.
2. Keys Are Unattributable at Scale
When Alice uses her key to SSH into a server, the server logs:
```
Accepted publickey for deploy from 10.0.1.42 port 54832 ssh2: RSA SHA256:nThb...
```
That fingerprint tells you which key was used, but mapping fingerprints back to humans requires an external inventory you almost certainly don't maintain. If multiple people share access to a deploy user (common), attribution is impossible.
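If you do keep an inventory, the lookup itself is trivial. Here is a minimal sketch assuming a hand-maintained CSV of `fingerprint,owner` pairs (the file format and paths are illustrative, not a standard):

```shell
# Sketch: resolve the fingerprint from an sshd log line against a
# hand-maintained inventory file with lines like:
#   SHA256:abc123,alice
lookup_owner() {
  # $1 = fingerprint (e.g. SHA256:abc123), $2 = inventory CSV path
  grep -F "$1," "$2" | cut -d, -f2
}
```

The hard part is not the lookup. It is keeping that inventory current as keys are issued, copied, and abandoned, which is exactly the maintenance burden this post is describing.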
3. Private Keys Live in Uncontrolled Locations
SSH private keys live on:
- Developer laptops (encrypted disk? maybe)
- CI/CD runners (as secrets or mounted files)
- Configuration management tools
- Backup systems
- That one S3 bucket someone used for migration scripts in 2022
The private key is the credential. If it's compromised, it's compromised silently — there's no failed login attempt, no alert, no notification.
4. The Offboarding Problem Is Unsolvable at Scale
When someone leaves, you need to remove their public key from every authorized_keys file on every server they ever accessed. In practice:
- You don't have a complete list of which servers they accessed
- Some servers have keys deployed via Ansible/Puppet/Chef, others were added manually
- Some keys were shared across team members
- Some servers are in networks you don't fully control (staging accounts, partner infra)
The realistic offboarding story is: you disable their IdP account, revoke their VPN, and hope they didn't have direct SSH keys anywhere important.
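For the servers you do know about, the mechanical part of the cleanup is easy to script. A per-host sketch (you would run it through your fleet tooling; the key blob and paths are illustrative) that removes a key by exact match on its base64 blob, since comments like `alice@laptop` are free-form and can't be trusted for matching:

```shell
# Sketch: strip one user's key from an authorized_keys file by exact
# match on the base64 key blob, not the comment field.
remove_key() {
  # $1 = key blob to remove, $2 = authorized_keys path
  grep -vF "$1" "$2" > "$2.tmp" && mv "$2.tmp" "$2"
}
```

The script is not the problem; the problem is that you don't have the list of hosts to run it on.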
The Engineering Solutions
There are three paths forward, depending on how much of the problem you want to solve:
Option 1: Certificate-Based SSH (Partial Fix)
OpenSSH has supported certificates since 5.4 (2010). Instead of distributing public keys to servers, you configure servers to trust a Certificate Authority. Users get short-lived certificates signed by the CA.
```
# Sign a user's key with a 4-hour validity
ssh-keygen -s /path/to/ca_key \
  -I "alice@company" \
  -n deploy,readonly \
  -V +4h \
  user_key.pub
```
Pros: Certificates expire. You don't need to manage authorized_keys files. Revocation is centralized.
Cons: You now need to manage a CA. Certificate issuance becomes a critical path: if the signing service is down, nobody can SSH. Principal management (the `-n` flag) becomes its own access control system. And OpenSSH's certificate-handling code has itself had security bugs over the years, so the CA path is not risk-free either.
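For completeness, the server side of Option 1 is only a few `sshd_config` directives. A minimal sketch, with illustrative paths:

```
# /etc/ssh/sshd_config
# Trust user certificates signed by this CA instead of per-user keys
TrustedUserCAKeys /etc/ssh/user_ca.pub

# Central revocation: keys and certs listed here are rejected
RevokedKeys /etc/ssh/revoked_keys
```

Note that `RevokedKeys` is a file you have to distribute and keep current on every server, which is its own small fleet-management problem.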
Option 2: Bastion Hosts + Session Recording (Partial Fix)
Funnel all SSH access through a bastion that records sessions and enforces MFA:
```
Developer → MFA → Bastion → Target Server
                     ↓
            Session Recording
            Audit Log
```
Pros: Single control point. Session recording for compliance. Can enforce MFA.
Cons: The bastion itself becomes a high-value target and a single point of compromise. Lateral movement is still possible once past the bastion. Key management on the bastion is still manual. Scaling bastions across multiple clouds/regions is painful.
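On the client side, funneling traffic through a bastion is typically a `ProxyJump` rule in `~/.ssh/config`. A sketch, with illustrative hostnames:

```
# ~/.ssh/config
Host bastion
    HostName bastion.example.com

# Reach all internal hosts through the bastion
Host *.prod.internal
    ProxyJump bastion
```

This makes the bastion transparent to users, but note that it does nothing by itself to enforce the recording or MFA shown in the diagram; those have to be configured on the bastion.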
Option 3: Identity-Based Ephemeral Access (Complete Fix)
The modern approach eliminates long-lived keys entirely:
- User authenticates via their identity provider (Okta, Google, Azure AD)
- Access is granted based on identity + policy (role, time, approval)
- A short-lived credential is issued — valid for minutes or hours
- The credential is automatically revoked when the session ends
- Every session is recorded and attributed to a real human identity
```
Developer → IdP Auth → Access Policy Engine → Ephemeral Credential → Target
                                                       ↓
                                                 Auto-expires
                                                 Fully attributed
                                                 Session recorded
```
No permanent keys. No authorized_keys files. No offboarding problem. No unattributable access.
What Changes in Practice
Teams that move to ephemeral access report:
- Offboarding drops from hours to instant — disable IdP account, all infrastructure access is immediately revoked
- Audit answers go from "we think…" to "here's the recording" — every session is attributed and recorded
- Key rotation becomes a non-issue — there are no long-lived keys to rotate
- Least privilege becomes enforceable — access is scoped by role and time-limited by default
The tradeoff is operational: you're adding an access layer that sits between engineers and their servers. If it's poorly designed, it becomes friction. If it's well-designed, engineers don't notice it's there — they just run `ssh target-server` and it works, with the auth happening transparently.
Implementation Considerations
If you're evaluating this kind of move, here are the questions that matter:
- Does it work with your IdP? — If it doesn't plug into Okta/Google/Azure AD natively, you'll end up managing another identity silo.
- What happens when the access service is down? — You need a break-glass path that doesn't reintroduce permanent keys.
- Can it handle non-human access? — CI/CD runners, cron jobs, service accounts. These are often the majority of SSH sessions.
- Is the session recording actually useful? — Recording terminal output is easy. Making it searchable, replayable, and alertable is what matters.
- Does it cover more than SSH? — Kubernetes, databases, RDP, cloud consoles. SSH is one protocol. Your access problem spans all of them.
Getting Started
If you're running any significant Linux infrastructure, the SSH key problem is worth solving before it solves you (usually in the form of an audit finding or, worse, a breach).
The minimum viable step: inventory your keys. Run `find / -name authorized_keys 2>/dev/null` across your fleet and count how many keys you find versus how many people you employ. The ratio will be uncomfortable.
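That inventory step can be made slightly more structured. A per-host sketch (run it through your fleet tooling; it matches keys only by their standard type prefixes and doesn't handle paths with spaces):

```shell
# count_keys DIR — total public-key lines across every
# authorized_keys file under DIR (pass / to sweep a whole host).
count_keys() {
  total=0
  for f in $(find "$1" -name authorized_keys 2>/dev/null); do
    # Count lines that start with a standard key-type prefix
    n=$(grep -cE '^(ssh-|ecdsa-|sk-)' "$f")
    total=$((total + n))
  done
  echo "$total"
}
```

Comparing that total per host against your headcount is a crude but honest first metric.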
From there, the path is clear: move to short-lived, identity-based credentials that expire automatically and leave a complete audit trail.
We built OnePAM specifically for this problem — infrastructure access that works through your existing identity provider, grants time-limited sessions, and records everything. If you're evaluating solutions in this space, it's worth a look.
This post is part of our series on modern infrastructure access patterns. Follow for more on zero trust, access management, and security engineering.