Leveraging a Learning Culture for Certificate Management
When you ask engineers what they think about digital certificates, you’re likely to get a groan, an eye roll and a story about their last application outage. Digital certificates are a necessity in today’s world, and engineers spend a lot of time and effort managing certificates.
For those lucky enough never to have installed or rotated a certificate, here’s a crash course: a digital certificate is an electronic document, signed by a certificate authority, that binds a public key to an identity such as a website’s domain name. Websites and applications use certificates to establish trust and enable encryption. In simplest terms, a certificate is a critical part of the “Secure” indicator in your browser, which tells you that your credit card and personal information are encrypted in transit as you place your order on your favorite website.
Digital certificates expire and must be renewed periodically, usually every one to two years. In fact, the CA/Browser Forum (CAB Forum) has recently reinforced this trend by capping the validity of publicly trusted TLS certificates at roughly two years (825 days). But in this age of virtual instances and microservice architecture, one standard doesn’t fit every model. For short-lived machine and compute instances that spin up and down, a certificate valid for only a few days is more appropriate. Long-lived instances, on the other hand, may run for years, so their certificates must be renewed before each expiry, which adds maintenance work and more potential for mistakes and outages.
When a certificate expires, clients can no longer complete a TLS handshake with your application: no encryption means your application stops working. Engineers need to make sure new certificates are installed before this happens. With hundreds of thousands of certificates to manage in an environment the size of Target’s, ensuring this all happens without issue quickly becomes cumbersome, not to mention time-consuming.
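To make the expiry problem concrete, here is a minimal sketch of how an engineer might check how long a certificate has left, using only Python’s standard ssl module. The helper names days_until_expiry and fetch_not_after are hypothetical and not part of Target’s service. Note that if the certificate has already expired, the default context rejects it and the handshake in fetch_not_after raises ssl.SSLCertVerificationError, which is exactly the outage described above.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter timestamp (negative if expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan 15 00:00:00 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Complete a verified TLS handshake with host:port and return the server certificate's notAfter."""
    context = ssl.create_default_context()  # verifies the chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, days_until_expiry(fetch_not_after("example.com")) returns the remaining lifetime of that site’s certificate in days. Checking one host by hand is easy; doing it reliably for hundreds of thousands of certificates is where automation earns its keep.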
Our Digital Certs & Crypto Services team knew how much time and effort engineers were spending on managing certificates, time that could be better spent elsewhere. We understood the challenge, the obstacles and the impact that outages have. So we took advantage of our learning culture and our freedom to innovate to build a service that helps engineers better manage certificates.
We identified a vendor API framework that we believed could support a full lifecycle certificate management service. With an eye toward our internal customers, and thinking about how engineering teams work day to day, we built it as a self-service offering that is easy to consume and fits the spirit of full stack ownership. By doing this, we eliminated manual management and enhanced our service offering without adding members to the Digital Certs & Crypto Services team. Engineers are now empowered to solve certificate needs in the way that best fits their application.
Automating these manual steps has eliminated human error from them and drastically sped up processing: certificate provisioning times have dropped from days to seconds. Providing clear guardrails has reduced the number of security policy exceptions, because engineers are fantastic at working within constraints that are clear and reasonable. Fewer exceptions mean the Digital Certs & Crypto Services team can spend more time on service improvements than on one-off solutions. Improved monitoring and reporting have driven incidents down by more than 70 percent, and when an incident does occur, mean time to recovery is dramatically shorter because a replacement certificate can be provisioned in seconds.
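As an illustration of the kind of monitoring and reporting described above, the sketch below scans a certificate inventory and lists everything expiring within a renewal window, soonest first. The CertRecord fields, the expiring_soon function and the 30-day default window are all hypothetical assumptions for this example; they are not the actual schema or thresholds of Target’s service.

```python
import ssl
from dataclasses import dataclass
from typing import Iterable, List, Tuple

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class CertRecord:
    # Hypothetical inventory record; a real service would load these from its own inventory store.
    common_name: str
    owner_team: str
    not_after: str  # same format as SSLSocket.getpeercert()["notAfter"]


def expiring_soon(
    records: Iterable[CertRecord], now: float, window_days: int = 30
) -> List[Tuple[float, CertRecord]]:
    """Return (days_left, record) for each certificate expiring within window_days, soonest first.

    Certificates that have already expired have negative days_left and sort to the top.
    """
    report = []
    for record in records:
        days_left = (ssl.cert_time_to_seconds(record.not_after) - now) / SECONDS_PER_DAY
        if days_left <= window_days:
            report.append((days_left, record))
    report.sort(key=lambda item: item[0])
    return report
```

A scheduled job could call expiring_soon(inventory, now=time.time()) and route each entry to its owner_team, so renewals are triggered (or at least alerted on) well before a certificate can cause an outage.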
This automation brought other benefits as well, including more consistency in how certificates are managed, which eliminates the need for teams to shoehorn a one-size-fits-all security solution into their applications. It has also inspired teams to rethink how they manage their own certificate consumption.
Whether it’s an InfoSec hackathon that extended certificate delivery to secure vaulting technologies, an intern challenge to bring automation and real-time alerting to team tools or an innovation sprint to enable client/server mutual certificate authentication, embracing a learning culture has helped us create innovative solutions that raise the bar and eliminate operational work and risk. We’ve also rolled out new capabilities, such as geo-location attributes to support projects like Unimatrix, and piloted solutions using the ACME protocol.
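In client/server mutual certificate authentication (mutual TLS), the server also demands and verifies a certificate from the client. As a rough sketch of what that means on the server side, here is a context built with Python’s standard ssl module. The function name and file paths are hypothetical, and this is an illustration of the technique, not the team’s implementation.

```python
import ssl
from typing import Optional


def mutual_tls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side context that requires and verifies client certificates."""
    # PROTOCOL_TLS_SERVER starts with no trusted CAs, so only the CA bundle
    # loaded below is trusted to vouch for clients.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Reject any client that does not present a certificate we can verify.
    # With no client CA loaded, every client is rejected.
    context.verify_mode = ssl.CERT_REQUIRED
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        # The server's own certificate and private key, presented to clients.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # Only client certificates that chain to this CA bundle are accepted.
        context.load_verify_locations(cafile=client_ca_file)
    return context
```

A server would then call context.wrap_socket(sock, server_side=True) on each accepted connection; the client side mirrors this by loading its own certificate with load_cert_chain on a context created by ssl.create_default_context().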
The paradigm shift to cloud computing and microservices architecture demands a matching shift in security services. Engineering teams expect security solutions that keep pace with agile development while maintaining high security standards. Today’s security solutions must be stable, be easy to consume and operate at the speed and scale of the cloud. These three principles are the mantra of the Digital Certs & Crypto Services team.
Now, what about those engineers who gave you the eye roll and story about their application going down? Today, they are more likely to smile and tell a story of how things “used to be” and dive into a tale of stability through automation, of new functionality and achievement that could only occur through a culture of collaboration, learning and innovation.
Aaron Everett is an enterprise security architect and product owner of the Digital Certs & Crypto Services team at Target. Tim Sward is a lead engineer on the Digital Certs & Crypto Services team at Target.