As I write this, I’m on my way home from HashiConf 2024, hosted in the great city of Boston.
This year’s HashiConf was the 10th of its kind and the first to be hosted on the East Coast- also known as the best coast or beast coast. I thought I would take a moment to collect my thoughts on the event as a whole, the announcements made during the keynotes, and the particularly awesome sessions I attended. Since I’m guessing you’re already chomping at the bit- or chopping, I’m not judging what you do with your bits- I’ll go over the big announcements.
HashiConf took place over two days, with Tuesday being focused on Infrastructure Lifecycle Management (ILM) and Wednesday shifting left into Security Lifecycle Management (SLM). I truly appreciate how HashiCorp has focused and simplified its messaging in the last few years. It’s helped to sharpen the product portfolio and direct feature development.
The biggest announcement of day 1 was the public beta of Terraform Stacks on HCP Terraform. Stacks was announced as an upcoming feature at HashiConf 2023, and over the last 12 months the Stacks team has been working tirelessly to get the feature ready for public testing.
I was part of the private testing, and due to the restrictions involved, I was absolutely not allowed to talk about it. Which is sad, because I think Stacks is a game changer for anyone using Terraform at scale and I really wanted to talk about it. But now, I can! What are these mysterious Stacks? That might take a moment to explain.
HCP Terraform has the concept of workspaces to separately manage environments. Within a workspace, a Terraform configuration describes the intended state of the world and an instance of state data records the results of the most recent apply. A workspace is essentially a logical unit of work and an administrative boundary on HCP Terraform.
There is a tension when it comes to workspace design. On the one hand, you’re trying to keep related resources together. The contents of a workspace should share a common lifecycle and be tightly coupled. Within a Terraform configuration it’s trivial to express dependencies through implicit references or explicit dependencies.
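To make that concrete, here’s what a dependency looks like inside a single configuration- a minimal sketch with hypothetical AWS resources:

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

# The reference to aws_vpc.main.id creates an implicit dependency, so
# Terraform knows to create the VPC before the subnet.
resource "aws_subnet" "app" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}
```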
On the other end of the tension is a desire to minimize the size of state, include only resources that share a common administrative boundary, and limit the blast radius of changes. For instance, shared networking resources probably shouldn’t be in the same workspace as the applications that consume them. There’s probably a different team managing the shared network; the lifecycle of the network is not tightly coupled to the applications; and a change in an application workspace should not blow everything up for other apps or the network as a whole.
Thus it makes sense to separate resources into different workspaces based on blast radius, admin boundary, and separation of duties. However, there are still dependencies between resources in separate workspaces, and it is far more difficult to express those dependencies and references across the workspace boundary. For instance, if the subnet ID of a network resource changes in workspace A, how does workspace B know it needs to update the network cards that use the subnet? And how does it get the value of the new subnet ID?
You can use remote state or query via data sources to get actual values. And you can use run triggers to watch for changes to the state of one workspace and have that cascade to other workspaces. That feels a little like a reactive kludge though, and orchestrating changes across multiple workspaces is a fraught endeavor.
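For example, a consumer workspace can read the network workspace’s outputs with the terraform_remote_state data source. Here’s a minimal sketch with hypothetical organization, workspace, and output names:

```hcl
data "terraform_remote_state" "network" {
  backend = "remote"

  config = {
    organization = "example-org"
    workspaces = {
      name = "shared-network" # the workspace that owns the subnet
    }
  }
}

# The consumer gets the current subnet ID, but it still has no way to
# know *when* that value changes- that's what run triggers are for.
resource "aws_network_interface" "app" {
  subnet_id = data.terraform_remote_state.network.outputs.subnet_id
}
```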
Compounding these challenges is the need to have multiple instances of the same configuration for different environments, e.g. Dev, QA, and Prod. Each follows the same patterns, but needs to be configured separately.
On top of all that, there are times when a rollout has to be sequenced through glue scripts, because values are not yet known. One obvious example is provisioning a Kubernetes cluster and then trying to bootstrap the cluster. Since the cluster doesn’t exist during the initial deployment, the planning phase will throw an error or time out trying to reach a Kubernetes API endpoint that doesn’t exist.
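Here’s a sketch of that chicken-and-egg problem, assuming a hypothetical ./modules/aks module that creates the cluster and exposes its connection details as outputs:

```hcl
module "cluster" {
  source = "./modules/aks" # hypothetical module that builds the cluster
}

# On the first run these arguments are unknown, because the cluster
# doesn't exist yet. Anything that has to reach the Kubernetes API
# during planning will error out or time out.
provider "kubernetes" {
  host                   = module.cluster.host
  cluster_ca_certificate = base64decode(module.cluster.ca_certificate)
  token                  = module.cluster.token
}

resource "kubernetes_namespace" "bootstrap" {
  metadata {
    name = "platform-bootstrap"
  }
}
```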
Stacks solves all of these problems by introducing a new organizing principle- the titular Stack- and two new core constructs: Components and Deployments. Each component is a module reference with inputs, providers, and dependencies. Components are combined in a declarative way that allows you to express the relationships between the layers of the Stack. Each deployment is an instance of the Stack you want to deploy, with the opportunity to supply unique inputs for that environment.
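To give you a feel for it, here’s a simplified sketch based on the public beta. Components live in a .tfstack.hcl file and deployments in a .tfdeploy.hcl file; the module paths and input names are illustrative, and the syntax may shift as the beta evolves:

```hcl
# components.tfstack.hcl
required_providers {
  aws = {
    source  = "hashicorp/aws"
    version = "~> 5.0"
  }
}

provider "aws" "this" {
  config {
    region = var.region
  }
}

variable "region" {
  type = string
}

variable "cidr" {
  type = string
}

component "network" {
  source = "./network" # hypothetical local module
  inputs = {
    cidr = var.cidr
  }
  providers = {
    aws = provider.aws.this
  }
}

component "cluster" {
  source = "./cluster" # hypothetical local module
  inputs = {
    # Referencing another component expresses the layering of the Stack.
    subnet_ids = component.network.subnet_ids
  }
  providers = {
    aws = provider.aws.this
  }
}
```

Each deployment then instantiates the whole Stack with its own inputs:

```hcl
# deployments.tfdeploy.hcl
deployment "dev" {
  inputs = {
    region = "us-east-1"
    cidr   = "10.0.0.0/16"
  }
}

deployment "prod" {
  inputs = {
    region = "us-east-1"
    cidr   = "10.1.0.0/16"
  }
}
```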
The Stacks engine on HCP Terraform understands that sometimes a full plan for the entire Stack is not possible, since what is in one layer may rely on resources that do not yet exist in another layer. The resulting plan will contain “deferred changes,” which will be planned and applied on subsequent runs once the missing resources are in place. You can also create orchestration rules that automate the application of deferred changes.
That’s a very quick breakdown of Stacks, and if you want a more thorough explanation, check out this video from Sarah Hernandez.
As I said, Stacks is now in public beta, so go kick the tires yourself! It should be available to any organization on one of the current crop of plans (including Free).
Terraform Stacks may have stolen the show on the ILM side, but the SLM side had two products that I think deserve equal airtime: Vault Radar and Vault Secrets.
As a quick side note, HashiCorp Vault has morphed from a single stand-alone product to a suite of products under the Vault umbrella. A similar thing happened to Microsoft’s Azure Stack product. I didn’t like it then, and I don’t think I like it now, but naming is hard and I doubt I could do any better. The original HCP Vault product is now referred to as HCP Vault Dedicated. I guess OG Vault is Vault Server now? It’s unclear.
Before I talk about Vault Radar, allow me to tell a little story about when I did something stupid. As you may already know, I’ve created several courses and demos around using Terraform with AWS. In some of those demos, I show how you can hard-code your AWS credentials into a Terraform configuration, and then I tell you to NEVER DO THAT.
To my great embarrassment, after creating those demos I have then accidentally committed my code to GitHub, including those hard-coded credentials 🤦🤦🤦. Even worse? I’ve done it at least three times.
Fortunately, GitHub constantly scans repository commits for AWS keys and alerts AWS when it happens. You’ll quickly get a nastygram from AWS that your credentials have been compromised and the user involved will have their permissions and roles reduced until you revoke the credentials, chastise the user, and make a proper blood sacrifice under a harvest moon.
Checking access credentials into source control is hardly an uncommon event, and it’s only one of many ways that secrets are placed in vulnerable locations. If you’re a security professional trying to get your arms around secrets protection and lifecycle management, you need something to help you find and identify secrets across your organization wherever they might live.
Basically, that’s what Vault Radar does. It constantly scans the sources you give it access to, finds things that look like secrets, and lets you know. Vault Radar is officially in public beta now, so you can go try it out for yourself.
As part of the public beta, HashiCorp announced that Vault Radar will support agents to scan for secrets behind your firewalls on internal systems. Vault Radar also includes a pre-commit hook that developers can implement to catch potential secrets before they are even committed to git. I sure could have used that a couple years ago!
Once you’ve identified where secrets are being stored, you may wish to remove them or manage them with a more robust solution than asking Developer Rick to change them once a quarter. You know Rick is pretty busy and is probably going to forget.
One option is to roll out Vault Server or HCP Vault Dedicated and force your developers to store all secrets securely in Vault. I’m sure they have nothing more pressing and can dedicate the next couple of months to integrating Vault into all their workflows and application logic. Seems like a piece of cake. I know Rick said he’s busy, but we both know he’s just addicted to Candy Crush.
Okay, so maybe forcing all your developers to drop what they’re doing and implement the Vault API in their code isn’t very feasible. What if you could meet them in the middle? Or even better, what if you could manage and rotate their secrets for them and they didn’t have to change anything? That’s what Vault Secrets is all about.
Vault Secrets is a service on HCP that allows you to manage secrets centrally and synchronize them to targets like AWS Secrets Manager, Azure Key Vault, and Kubernetes. Hopefully your developers are using one of these services to store their secrets and not simply writing them to a random CSV on a file share. Vault Radar can help you figure that out.
Assuming that they are using a service like GitHub Secrets, they will not need to change their workflow or application. Vault Secrets can rotate secrets on demand and synchronize the new value to the service being used by an application or pipeline. Developers can also pull the secret value directly if they aren’t using a secrets service today, which they should be, but Rick is like so close to hitting level 600.
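If you’d rather manage those central secrets as code, the hashicorp/hcp Terraform provider includes Vault Secrets resources. A minimal sketch with hypothetical app and secret names- the sync integrations to target services are configured separately:

```hcl
resource "hcp_vault_secrets_app" "web" {
  app_name    = "web-app"
  description = "Secrets for the web application"
}

resource "hcp_vault_secrets_secret" "api_key" {
  app_name     = hcp_vault_secrets_app.web.app_name
  secret_name  = "api_key"
  secret_value = var.api_key # passed in securely, never hard-coded!
}
```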
HCP Vault Secrets has been GA since last year. At HashiConf 2024, HashiCorp added two new features I think are super cool. The first is the general availability of auto-rotation for certain secret types. While Vault Secrets could already rotate secrets, it was an on-demand affair. Auto-rotate will automatically create a new version of the secret and keep both in force for a period of time you choose, giving applications a chance to refresh to the new secret value. When the period elapses, the older version of the secret is revoked.
The second big feature is dynamic secrets for certain secret types. Just like dynamic secrets in Vault Server, dynamic secrets on Vault Secrets are created on demand for an application and have a limited lifetime. For instance, you may have a pipeline that needs temporary credentials for AWS. Vault Secrets can provision credentials on demand that are good for the next 30 minutes, after which they expire. If another pipeline process requests credentials, it will get a separate, unique set of credentials that are also only good for 30 minutes. This limits the blast radius of leaked credentials and simplifies tracking down a leak. Dynamic secrets are now in public beta and support AWS and GCP, with Azure support coming later this year.
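Dynamic secrets in Vault Secrets are configured through the HCP portal, but if you want a feel for the model, here’s the long-standing Vault Server equivalent expressed with the vault provider- the role and policy are illustrative:

```hcl
resource "vault_aws_secret_backend" "aws" {
  access_key = var.aws_access_key
  secret_key = var.aws_secret_key

  default_lease_ttl_seconds = 1800 # credentials expire after 30 minutes
}

resource "vault_aws_secret_backend_role" "pipeline" {
  backend         = vault_aws_secret_backend.aws.path
  name            = "pipeline"
  credential_type = "iam_user"

  # Each request mints a unique IAM user with only these permissions.
  policy_document = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject", "s3:ListBucket"]
      Resource = ["*"]
    }]
  })
}
```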
Terraform Stacks, Vault Radar, and Vault Secrets are not the only things announced during the two keynotes. If you want a full rundown, you can check out the official HashiCorp blog posts for ILM and SLM. As far as I’m concerned, though, these three items are the most significant and impactful.
HashiConf is always a very busy time for me. I struggle to walk from one end of the expo to the other without being stopped a half-dozen times. Nevertheless, I did manage to attend more sessions than just the two keynotes. All the sessions I mention below will be posted to the HashiConf site and YouTube in due time, and I will add links to the post once they’re available.
I went into the Microsoft Azure and Terraform session thinking I knew what was going on with Azure and Terraform. And I was pleasantly surprised to discover I was wrong. Turns out Microsoft has been hella busy while I wasn’t looking.
Mark Gray and Steven Ma demoed several features that are going into private preview shortly. There are three I’d like to focus on: Terraform Export, Portal Copilot, and VSCode Extensions.
Terraform Export is a feature being added to the portal to help export an existing resource or set of resources to Terraform. When you’re looking at a resource today, there’s an Automation section in the left-hand menu that includes the Export option. That option will generate an ARM template that matches the deployed resource. But ARM templates are awful and I hate them. The preview feature Mark demoed included tabs for Bicep and Terraform. You could already do something similar with the Azure Export for Terraform command line tool (aztfexport), but now it will be directly in the portal and allegedly give more robust results.
Speaking of better results, Copilot in Azure has now been trained on the AzureRM and AzAPI providers and the Terraform documentation. Generic LLMs can give pretty inconsistent results- read: completely unusable and hopelessly out of date- when it comes to generating Terraform code. That’s because ChatGPT hasn’t been specifically trained to produce valid Terraform and doesn’t have the resource definitions handy to guide it. Copilot in the Azure portal now has that additional context, and should be able to generate better Terraform code that actually includes proper input variables and real resource names.
In a similar vein, the updated VSCode Extension will offer a conversational interface to get the same results without having to leave your IDE. I try to avoid the portal when I can, so this is a welcome update.
If you’re interested in trying out these preview features and more, join the Azure Terraform Community at the link!
Mattias Fjellstrom gave an excellent presentation on how he leverages GitHub Actions and GitHub Issues to enable self-service for Boundary access. In case you didn’t already know, Boundary provides reverse proxy access to remote systems with dynamically injected credentials and scoped permissions.
When a user requests access to a remote system, Boundary verifies they have permission to the host and then works with Vault to dynamically generate and inject credentials into the session. But what if a user doesn’t have access to a system and they need it?
They could raise a ticket and wait for someone to grant access, and then remind that person to revoke the access when they’re done. Or you could set up a self-service system that provides just-in-time access and revokes it automatically after a defined period. That would be pretty cool, right?
That is exactly what Mattias built. When a user needs access to a system, they create a GitHub issue. That kicks off a GitHub Action granting them access via a Terraform run. When the time period expires, another GitHub Action fires to remove access. Alternatively, the user can close the issue, which will trigger access termination as well. Mattias even set up an approval workflow for systems marked as sensitive.
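The Terraform that the Action applies might look something like this hypothetical sketch using the Boundary provider, granting the requester permission to start sessions on the target they asked for:

```hcl
resource "boundary_role" "jit_access" {
  name        = "jit-${var.requester}" # hypothetical naming scheme
  description = "Just-in-time access granted via GitHub issue"
  scope_id    = var.project_scope_id

  principal_ids = [var.boundary_user_id]
  grant_strings = [
    "ids=${var.target_id};actions=authorize-session",
  ]
}
```

When the expiry Action fires (or the issue is closed), a destroy run removes the role and the access disappears with it.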
If you want to check it out for yourself, he’s written a thorough blog post.
When I was walking out of the Speaker Room on Wednesday morning, I held the door for Anthony Ralston, and I could just tell by the look in his eye that he was going to deliver a killer talk. Call it confirmation bias, but I was absolutely right.
Anthony works for Canva- which I use on a daily basis- and he wanted to talk about how they deployed the Vault Secrets Operator on their Kubernetes clusters to leverage dynamic secrets for Datadog.
But he didn’t start with that; instead, he started with a problem statement. Every time a new Kubernetes cluster was built, a Datadog API key had to be generated and added to the cluster so it could collect and send metrics through the Datadog agent. To rotate the API key, they had to generate and replace it manually for each cluster. Both processes were time-consuming and inefficient. Canva was already using Vault heavily for storing and generating secret values, so they thought perhaps Vault could provision and distribute Datadog API keys dynamically.
Anthony walked through their decision process and included some of the dead ends they hit along the way. It was a nuanced talk that not only highlighted the power of Vault, but also the importance of understanding your requirements and limitations ahead of time and working towards a solution that addresses the actual business problem. Deploying tech for tech’s sake is not a valid justification for a project.
I also learned about the differences between the Vault CSI provider, the Vault Agent Injector, and the Vault Secrets Operator. Each solution has its use cases, and it was informative to see them compared in a real-world context. Who won out for Anthony’s team? You’ll have to watch the presentation to find out.
I look forward to HashiConf every year. Honestly, it’s the highlight of conference season for me, and the only conference I’ll be attending this fall. Why? Community and scale.
I’ve been to KubeCon and re:Invent and Microsoft Ignite. And you know what? Conferences with 20k+ attendees kinda suck. Everything is overwhelming: the crowds, the sessions, the venue, the crowds, the lines, the crowds. Are you getting the sense I’m not big on crowds? You would be correct.
There is something about the massive scale of those events that feels necessarily impersonal. You are a badge to be scanned. An attendee to be counted. Sheep to be herded around from session to session. Smaller conferences like HashiConf are nothing like that. People know who you are. You can easily get to every session you want to. You don’t have to show up 15 minutes early to get a seat. You can make real connections with other people.
I’m not trying to yuck anyone’s yum here. If you love going to re:Invent, more power to you. I’ve always been one to prefer a more intimate setting. I went to a small high school and a small college, and worked at small businesses. I prefer a show at a 200-person venue to a stadium concert. I’d rather do a trail race with 100 people than the Broad Street Run in Philly with 30k strangers. Every time I try joining some large-scale organization, it just doesn’t fit me. Hell, I’ve worked for myself for over five years. You cannot work for a smaller organization than that.
My main point is that I enjoy conferences with a personal feel and a vibrant caring community. I value substance over spectacle and quality over quantity. My two favorite events this year were HashiConf and DevOpsDays Philly. Both are built on a foundation of a supportive community that is accepting of all kinds.
As HashiCorp says in their code of conduct: For Everyone, Everywhere.
Sponsored Note: This blog post was sponsored by HashiCorp. The opinions and information in the post are mine alone and were not reviewed or edited by HashiCorp.