I’m working out how to deploy a service that needs SSH access to many important boxes in my infrastructure. Rather than keep a long-lived SSH private key in a key store that the service can read from, I’m considering short-lived SSH certificates to give the service SSH access. The two architectures I’m comparing are as follows (I’m not naming the technologies at play, because I’m more interested in the theory and reasoning):
Long-lived private key:
- Distribute a service account public key to all necessary servers.
- Store the private key in a secure secrets store.
- Run the service in a role that has access to the private key store.

Short-lived certificates:
- Distribute the CA public key to all servers.
- Store the CA key in a secure secrets store.
- The CA service runs in a role that has access to the CA key.
- The service generates a key pair and sends a signing request (CSR) to the CA service.
- The CA signs and returns a certificate with a short life span (~5 minutes, or long enough for the service to authenticate to the servers it needs).
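
To make the certificate flow concrete, here is a minimal Python sketch of the issue-then-verify steps. It is a toy model, not real SSH certificate handling: the names `CA_SECRET`, `issue_cert`, and `server_accepts` are hypothetical, and an HMAC stands in for the CA's signature (a real CA signs with an asymmetric private key, so the servers only ever hold the public half). Time is passed in explicitly so the expiry behaviour is easy to follow.

```python
import hashlib
import hmac
import json

# Hypothetical demo secret. A real CA holds an asymmetric private key and
# the servers hold only the matching public key.
CA_SECRET = b"demo-ca-secret"


def _sign(payload: bytes) -> str:
    # HMAC is used purely as an illustrative stand-in for the CA signature.
    return hmac.new(CA_SECRET, payload, hashlib.sha256).hexdigest()


def issue_cert(service_id: str, public_key: str, now: float,
               ttl_seconds: int = 300) -> dict:
    """CA side: bind a public key to an identity and an expiry time."""
    body = {"id": service_id, "key": public_key,
            "valid_before": now + ttl_seconds}
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "sig": _sign(payload)}


def server_accepts(cert: dict, now: float) -> bool:
    """Server side: accept any CA-signed certificate that has not expired."""
    payload = json.dumps(cert["body"], sort_keys=True).encode()
    if not hmac.compare_digest(cert["sig"], _sign(payload)):
        return False  # body was altered after signing, or not signed by the CA
    return now < cert["body"]["valid_before"]
```

The point of the sketch is that the server needs no per-service configuration: it only checks the signature and the expiry, so a stolen certificate stops working once `valid_before` passes and cannot have its lifetime extended without the CA key.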
The tradeoff I see is that with certificate-based auth, a compromised certificate expires quickly and is therefore less risky. If a service using SSH is compromised, I can revoke its ability to request new certificates without changing any config on the servers and without taking away other such services’ ability to authenticate. However, this architecture is more complex, and in the end the SSHing service still has to authenticate to the CA server somehow to authorize the signing, whether through a provider role permission, a shared secret (hard-coded or fetched from a secrets store), its IP address, or some sort of PKI (the provisioner giving the service a signed cert).
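
As a rough illustration of the revocation point, the following Python sketch models the CA's signing policy as a simple allow-list. `SigningPolicy` and `latest_possible_access` are hypothetical names, and the model assumes the CA has already authenticated the caller by some other means. It only shows where the revocation happens and how long an already-issued certificate can remain usable.

```python
class SigningPolicy:
    """Toy allow-list the CA consults before signing (hypothetical model)."""

    def __init__(self, allowed_services):
        self.allowed = set(allowed_services)

    def revoke(self, service_id: str) -> None:
        # Revocation is a change at the CA only; no server config is touched.
        self.allowed.discard(service_id)

    def may_sign(self, authenticated_id: str) -> bool:
        return authenticated_id in self.allowed


def latest_possible_access(revoked_at: float, ttl_seconds: float) -> float:
    """Worst case: a certificate issued just before revocation stays valid
    until its own expiry, so access can last at most one TTL longer."""
    return revoked_at + ttl_seconds
```

With a 5-minute TTL, revoking a service at time `t` bounds its remaining access at `t + 300` seconds, while every other service on the allow-list keeps working unchanged.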
But whatever the mechanism, does this provide any benefit above and beyond just giving the services access to the private key? If a service is ever compromised, an attacker can just as easily request a valid cert and use it just as well as a private key.
Is there a way to authenticate to the CA server for signing requests that doesn’t require human intervention and is resistant to the service being compromised? Or is there some other benefit to this architecture that would justify the extra complexity?
I don’t want to confuse the discussion too much by bringing specific technologies into it, but to keep this from being too abstract: this would operate in Kubernetes, EC2, or a similar cloud platform where the platform itself lets me grant a service a set of API permissions using RBAC. The SSHing services might be short-lived push-style tasks or long-lived services like Ansible Tower.