Security · Infrastructure

Why Your Secrets Management Is Probably Broken

January 2026 · 12 min read

Working in cybersecurity, I've seen how pervasive the misconceptions around secrets management are across the industry. Most organizations struggle with the same fundamental issue: static, long-lived credentials persist because they're simple to implement initially, even though they create cascading problems later.

This isn't a tooling problem. You can't fix this by buying HashiCorp Vault or switching to AWS Secrets Manager. The problem is deeper. Most engineers (myself included, early in my career) don't actually understand what secrets are or how they should work.

What We Get Wrong About Secrets

If I ask you to list examples of secrets, you'll probably say: API keys, passwords, database credentials, tokens. All correct, but this list doesn't help you make decisions.

Here's a better way to think about it: a secret is anything that grants authority if someone else gets hold of it.

Your database password is a secret because having it means you can read, write, or delete production data. Your TLS private key is a secret because having it means you can impersonate your service. Your CI signing key is a secret because having it means you can push code that looks legitimate.

The common thread isn't the format or where you store it. It's that possession equals power.

This reframe matters because it changes how you handle these things. A database password isn't just a configuration value you happen to keep private. It's delegated authority to access production data. When you store it in a config file or environment variable, you're creating a bearer token that grants permanent, unrevocable access to anyone who finds it.

Most security incidents don't happen because someone broke your encryption. They happen because someone found a secret sitting in a place it shouldn't be, and that secret had way more power than it needed to have.

The Part Nobody Implements

Secrets have a lifecycle. They get created, distributed to the things that need them, used for some period of time, rotated when they get old or compromised, revoked when they're no longer needed, and eventually destroyed.

In practice, most systems only handle two of these steps: creation and use. Everything else is "we should probably do that at some point."
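The lifecycle above can be modeled as a small state machine. This is a sketch to make the stages concrete; the state names are illustrative, not taken from any particular tool:

```python
from enum import Enum, auto

class SecretState(Enum):
    CREATED = auto()
    DISTRIBUTED = auto()
    IN_USE = auto()
    ROTATED = auto()
    REVOKED = auto()
    DESTROYED = auto()

# Legal transitions for a managed secret. Most real systems only ever
# exercise the first two edges (create -> distribute -> use) and stall there.
TRANSITIONS = {
    SecretState.CREATED: {SecretState.DISTRIBUTED},
    SecretState.DISTRIBUTED: {SecretState.IN_USE},
    SecretState.IN_USE: {SecretState.ROTATED, SecretState.REVOKED},
    SecretState.ROTATED: {SecretState.IN_USE, SecretState.REVOKED},
    SecretState.REVOKED: {SecretState.DESTROYED},
    SecretState.DESTROYED: set(),
}

def advance(current: SecretState, target: SecretState) -> SecretState:
    """Move a secret to the next lifecycle stage, rejecting illegal jumps."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Writing the transitions down like this makes the gap visible: if your tooling has no code path that ever calls the rotate, revoke, or destroy edges, those stages don't exist in practice.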

Here's what actually happens in most companies:

Someone creates a database password for a new service. They put it in an environment variable or a secrets manager. The service uses it. Everything works. Ship it.

A year later, you have twelve services using that same password because it was easier than setting up new credentials. One of those services has verbose error logging that occasionally dumps its entire environment to the logs. The password has been sitting in your log aggregation system for eight months.

You discover this during a security audit. You decide to rotate the password. Half your services break because they were caching the old one. You spend four hours tracking down all the places that need updating. You eventually get it working, but it took down two services in the process and someone got paged at 2am.

Next time someone suggests rotating credentials, everyone remembers this disaster and finds reasons to postpone it. The password lives for three more years.

This is not an exaggeration. I have seen this exact sequence play out at multiple companies. The problem isn't that engineers are careless. The problem is that the easy path (static, long-lived, shared credentials) creates systems that are nearly impossible to maintain safely.

Why Static Secrets Are Technical Debt

Static secrets are convenient. You generate a password once, put it somewhere, reference it when needed. It works. The problem is what happens after that initial setup.

Static secrets are long-lived. Once created, they stick around indefinitely unless someone makes a specific effort to rotate them. They're shared across multiple systems because setting up individual credentials for each service is more work. They get reused between environments because "staging doesn't need real security." They're hard to revoke because you can't know who's using them.

When a static secret leaks, you have a permanent compromise. The secret doesn't expire. You can't limit what it can do retroactively. You can't tell what the attacker did with it. Your only option is to rotate it, which means updating every system that uses it, which is error-prone and often breaks things.

So what actually happens is that leaked credentials don't get rotated. They stay active because the operational cost of rotating them is too high. Security becomes a losing battle against entropy.

The correct way to think about static secrets is as technical debt. Every static secret in your system is a future incident waiting for the right trigger. You accumulate them because they're easy in the moment, and you pay interest in the form of increased blast radius and difficult incident response.

You should only use static secrets when you have no alternative, when the blast radius is genuinely small, when the lifetime can be kept short, and when you have a proven rotation process that actually works. Otherwise, you need a different model.

How Dynamic Secrets Change Everything

Dynamic secrets work differently. Instead of creating a credential once and using it forever, you create credentials on demand that expire after a short period.

A service needs database access. It requests credentials. You generate a new database user with a random password that expires in four hours. The service uses those credentials. Four hours later, they stop working. The service requests new credentials. Repeat.

If those credentials leak, the attacker has a four-hour window. After that, the credentials are useless. You don't need to rotate anything. You don't need to track down every place the credentials might have been used. The problem resolves itself.

This approach has other benefits. Each service gets its own credentials, so you can track who accessed what. Credentials are scoped to specific permissions, so a service that only needs read access can't accidentally (or maliciously) delete data. Revocation is straightforward because you just stop issuing new credentials to that service.

The mental shift here is important. With static secrets, your security model is "prevent all leaks." That's impossible. People make mistakes. Systems have bugs. Attackers are persistent.

With dynamic secrets, your security model is "assume leaks will happen, limit the damage." That's achievable. You can't prevent every leak, but you can make sure that when one happens, the blast radius is small and the window of opportunity is narrow.

This doesn't mean dynamic secrets are a silver bullet. They add complexity. Your services need to handle credential renewal. Your infrastructure needs to support generating credentials on demand. You need monitoring to catch when credential requests fail. But this complexity is manageable, and the security benefits are substantial.

Identity vs Secrets

There's a related concept that's worth understanding: the difference between secrets and identity.

A secret proves you have something. A password, a key, a token. If you possess the secret, you get access. This works, but it has the problems we've been discussing. Secrets can be copied, leaked, shared, stolen.

Identity proves who you are. Modern systems can verify identity without relying on shared secrets. Kubernetes service accounts, cloud IAM roles, OIDC tokens, SPIFFE IDs. These systems let you say "I am service X, running in environment Y, and I need to do Z" without passing around a password.

The benefit is that identity can't be copied in the same way. If your identity is based on where you're running and what signed your deployment, an attacker can't just steal a string and use it elsewhere. They would need to compromise your build system or your orchestration layer, which is harder and more detectable.

This is where the industry is moving. Instead of distributing secrets to prove identity, we're building systems that can verify identity directly and then grant access based on that identity. Secrets still exist (you need them to bootstrap trust) but they're not the primary access control mechanism.

What Vault Actually Does

Let's talk about Vault because this is where a lot of teams get confused.

Vault is not a safe where you store passwords. I mean, you can use it that way, but that's missing the point. Vault is an authorization engine that happens to have an API for issuing secrets.

The value of Vault isn't that it encrypts data at rest (your cloud provider can do that). The value is that it can bind secret issuance to identity, enforce time limits, audit access, and reduce blast radius.

Here's how Vault should work: a service authenticates using its identity (Kubernetes service account, AWS IAM role, whatever). Vault verifies that identity, checks its policies, and issues short-lived credentials appropriate for that service. The service uses those credentials until they expire, then requests new ones.

Here's how Vault actually gets used at a lot of companies: someone stores the production database password in Vault. Services authenticate to Vault using long-lived tokens stored in environment variables. They fetch the password from Vault at startup and cache it in memory forever. Vault is now just an API in front of what is effectively a static secret.

The second approach is not more secure than just using environment variables. You've added complexity and a new service dependency without changing the fundamental security model.

The failure mode I see most often: a service authenticates to Vault and gets a token. That token gets written to disk for convenience. The token is long-lived because short-lived tokens are "annoying to manage." The token has broad permissions because "we might need to access other secrets later." That token leaks. Now the attacker has access to Vault with the same permissions as your production services.

Vault didn't fail here. Your design failed. You used Vault as a fancy key-value store instead of as an authorization engine.

Proper Vault usage means short-lived tokens, automatic renewal with proper error handling, least-privilege policies, and explicit revocation paths. It also means designing for the case where Vault is unavailable.

This last point is critical. If Vault goes down for fifteen minutes, what happens? If the answer is "all my services stop working," you've created a single point of failure in the name of security. Your services need some way to handle Vault being unavailable, whether that's cached credentials with TTLs, fallback mechanisms, or graceful degradation.

Why CI/CD Is Where Everything Goes Wrong

CI pipelines are where secrets go to die. Your CI environment needs access to production systems to deploy code. That means it needs credentials. Those credentials end up in environment variables, in secret storage that gets injected at runtime, or in configuration files that get committed to your repository.

Then someone opens a malicious pull request. Or uses a compromised dependency. Or runs a third-party GitHub Action that nobody audited. That code runs in your CI environment with access to all your secrets. The attacker exfiltrates your cloud credentials. Now they have production access.

This is not theoretical. This is how supply chain attacks work. One compromised package in your build, one malicious PR that gets past review, one GitHub Action that turns evil in an update. Your CI environment is a high-value target because it has access to everything.

The old model was to put static credentials in CI and hope nothing bad happened. The new model is to use identity-based authentication. Your CI job proves its identity (using OIDC or a similar mechanism) and uses that identity to request short-lived credentials scoped to exactly what that job needs to do.

GitHub Actions supports this. AWS supports this. GCP supports this. If you're still using static IAM keys in CI, you're creating unnecessary risk.

What Actually Happens When Things Go Wrong

Let me tell you about real incidents, because the theoretical risks only become real when you see the consequences.

GitHub Actions has had multiple waves of secret leaks. Secrets exposed in logs because error messages included environment variables. Secrets abused through pull requests from forks. Compromised workflows that exfiltrated credentials. GitHub has added secret masking, OIDC-based authentication, and reduced the default permissions for workflow tokens. These weren't optional improvements. They were responses to actual attacks.

Cloud credential leaks happen constantly. Someone commits an AWS access key to a public repository. Automated scanners find it within minutes. Attackers spin up cryptocurrency miners in your account. You wake up to a five-figure AWS bill. This happens often enough that AWS, GCP, and Azure all have automated detection for leaked credentials and will sometimes proactively disable them.

The common thread in these incidents isn't sophisticated attacks. It's basic operational failures. Credentials that lived longer than they should have. Credentials with more permissions than they needed. Credentials that got logged, committed, or cached somewhere insecure.

The fix isn't better security tools. The fix is better defaults. Short-lived credentials that expire automatically. Identity-based authentication that doesn't rely on shared secrets. Scoped permissions that limit blast radius. Systems designed to survive credential leaks instead of trying to prevent them entirely.

What You Should Actually Do

If you're running a system with static secrets everywhere, you don't need to fix everything overnight. But you do need a plan.

Start by understanding what secrets you have and how they're used. What credentials exist? How long do they live? Who has access to them? What happens if they leak?

Then start moving toward better models. Use dynamic credentials where your infrastructure supports it. Implement identity-based authentication instead of shared secrets. Reduce the lifetime of credentials that have to be static. Build and test rotation procedures before you need them in an emergency.

Add monitoring for unusual secret usage. If a service that normally makes ten database connections per minute suddenly makes a thousand, something is wrong. If credentials get used from an unexpected location, that's worth investigating.

Most importantly, design for failure. Assume credentials will leak. Assume services will be compromised. Assume attackers will find their way in. What's your blast radius? How quickly can you detect and respond? Can you limit the damage?

Security isn't about perfect protection. Perfect protection doesn't exist. Security is about making attacks harder, detecting them faster, and limiting the damage when they succeed.

Static secrets make all of this harder. They live forever, grant broad access, and leave no audit trail. When they leak (and they will leak), you have a permanent compromise that's difficult to detect and expensive to remediate.

Dynamic secrets and identity-based access make this easier. Credentials expire automatically. Access is scoped to specific needs. Usage is audited. Leaks have limited blast radius and short windows of opportunity.

You can't eliminate risk. But you can shift from a security model that requires perfection to one that survives the inevitable mistakes. That's the difference between secrets management that works and secrets management that's waiting to fail.