I was working on updating some Terraform code as part of a consulting engagement when I came across an EC2 configuration that used the template_cloudinit_config data source to create user_data to send to the instance. Since I know that the template provider has been archived by HashiCorp and the recommendation is to use the templatefile function, I endeavoured to replace the template_cloudinit_config data source with templatefile, and that is where I fell down a rabbit hole of the MIME format, cloud-init peccadilloes, and nested templates.
I thought I would write a post about my little adventure and the eventual workaround. If you don’t care and just want the answer, feel free to skip to the end or simply use the module I wrote to solve this problem.
The template provider has been archived by HashiCorp in favor of the templatefile function. To understand why, you can check out a whole video I did on it, but I can quickly summarize here. The template provider has a data source called template_file which will render text based on a template and variable inputs. Since you’re using a provider, Terraform has to hand off the work to a provider plugin. The templatefile function does exactly the same thing, but because it’s a function, it is included in the Terraform binary.
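To make the swap concrete, here is a minimal sketch of the same template rendered both ways. The file name and variable are borrowed from the example later in this post, and the use of path.module is my own assumption about where the file lives.

```hcl
# Before: the template_file data source from the archived template provider.
data "template_file" "startup" {
  template = file("${path.module}/startup-script.sh")
  vars = {
    name = "Arthur"
  }
}

# After: the built-in templatefile function, no provider plugin involved.
locals {
  startup_script = templatefile("${path.module}/startup-script.sh", {
    name = "Arthur"
  })
}
```

Anything that referenced data.template_file.startup.rendered would point at local.startup_script instead.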
The evaluation and execution time for a function is much faster than for a provider plugin. With the introduction of the templatefile function, the template_file data source was no longer required. However, there is another data source in the template provider that doesn’t have a comparable function: template_cloudinit_config.
To support folks who want to keep using template_cloudinit_config, HashiCorp created the cloudinit provider with a single data source called cloudinit_config. Essentially, it functions exactly like the template_cloudinit_config data source, but it’s in a new provider that is being actively maintained by HashiCorp.
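For reference, a configuration using the cloudinit provider might look roughly like the sketch below. The file names and variables are the ones used later in this post. The path.module prefix is my own assumption, and I've left out a version constraint on purpose.

```hcl
terraform {
  required_providers {
    cloudinit = {
      source = "hashicorp/cloudinit"
    }
  }
}

data "cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content = templatefile("${path.module}/cloud-init.yaml", {
      package_update  = "true"
      package_upgrade = "false"
    })
  }

  part {
    content_type = "text/x-shellscript"
    content = templatefile("${path.module}/startup-script.sh", {
      name = "Arthur"
    })
  }
}
```

The result is available as data.cloudinit_config.config.rendered, the same way it is with the older data source.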
But wait. If the templatefile function is faster than the template_file data source, wouldn’t a native function also be faster than the template_cloudinit_config data source? Unfortunately, there is no templatecloudinit function, so how can I create the same thing using functions? First we need to understand what is being created by the template_cloudinit_config data source, and then recreate it.
The template_cloudinit_config data source creates a multipart MIME configuration for cloud-init. This is the moment I realized I was in for a yak-shaving expedition. What the hell is MIME? And what is multipart about it? And what does it have to do with cloud-init? MIME at least sounds familiar.
MIME is the Multipurpose Internet Mail Extensions standard, created for handling mail messages that use non-ASCII characters and to support attachments. As a former Exchange Admin I remember seeing MIME from time to time in various menus and dropdowns, but I never had to do anything with it.
Even though MIME was originally intended for email messages, it has been adapted for use in HTTP communication and the cloud-init standard. In addition to supporting different media types, MIME allows you to construct a single configuration that contains multiple parts, each with its own content type. So there we have it, multipart MIME. Now, what does that have to do with cloud-init? Further down the rabbit hole we go!
Cloud-init is an industry standard used to initialize compute instances in a cloud by reading in information like cloud metadata, user data, and vendor data. User data is provided by the client to initialize the system after the cloud metadata portion is complete.
When you need to combine several types of content, user data is supplied in a multipart MIME format, optionally gzipped to keep the user-data content under the 16KB limit that EC2 places on user data. Cloud-init supports multiple content types in the MIME configuration including cloud-config, jinja2, and x-shellscript. If you don’t know what any of those are, don’t worry, the official cloud-init docs have you covered.
To sum up, multipart MIME is a format originally intended for email messages, but adapted for use by cloud-init to assist with configuring compute instances on first boot. The template_cloudinit_config data source creates a multipart MIME configuration. We need to understand the format to recreate it without the data source.
If I want to natively produce multipart MIME content using Terraform functions, I will need to know what the resulting content looks like. The general format is something like this:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=MIMEBOUNDARY

This is the beginning of the cloud-init, followed by a boundary delimiter for the next part.

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/cloud-config
Mime-Version: 1.0

YAML for the cloud-config

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/x-shellscript
Mime-Version: 1.0

Bash script to run

--MIMEBOUNDARY--
That should be pretty easy to replicate. I can use the templatefile function for each part of the MIME content and build it inline using standard Terraform constructs like for expressions.
The last thing to cover is the encoding. The template_cloudinit_config data source gives the option to compress with gzip and encode with base64. Fortunately, Terraform has a base64gzip function which will take care of that for me.
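As a quick illustration of the encoding step, here is a small sketch. The example_user_data string is a hypothetical stand-in for the assembled MIME text, and the local names are made up for this example.

```hcl
locals {
  # Hypothetical placeholder for the assembled multipart MIME text.
  example_user_data = "#cloud-config\npackage_update: true\n"

  # Roughly what gzip = true plus base64_encode = true gives you.
  example_gzip_b64 = base64gzip(local.example_user_data)

  # Roughly what gzip = false plus base64_encode = true gives you.
  example_b64 = base64encode(local.example_user_data)
}
```

Both base64gzip and base64encode are built into Terraform, so neither one pulls in a provider.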
Here’s the original code that uses the template_cloudinit_config data source:
data "template_file" "cloud_init" {
  template = file("cloud-init.yaml")
  vars = {
    package_update  = "true"
    package_upgrade = "false"
  }
}

data "template_file" "x_shellscript" {
  template = file("startup-script.sh")
  vars = {
    name = "Arthur"
  }
}

data "template_cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content      = data.template_file.cloud_init.rendered
  }

  part {
    content_type = "text/x-shellscript"
    content      = data.template_file.x_shellscript.rendered
  }
}
We’re using the template_file data source twice and the template_cloudinit_config data source once. The goal is to replace all of those with native functions. First we need to build the parts for cloud-config and x-shellscript. Ideally, this should be extensible, so if someone wants to add more parts, it’s pretty easy to do so.
The parts information includes the content type, content from a file, and variables for that file. We can store that as a list of objects in a local value:
locals {
  cloud_init_parts = [
    {
      filepath     = "cloud-init.yaml"
      content-type = "text/cloud-config"
      vars = {
        package_update  = "true"
        package_upgrade = "false"
      }
    },
    {
      filepath     = "startup-script.sh"
      content-type = "text/x-shellscript"
      vars = {
        name = "Arthur"
      }
    }
  ]
}
We can add more parts by adding another object to the cloud_init_parts list. Next we need to render each part into the content used by the multipart MIME format. Using a local value and a for expression, each part can be stored in a list as a string.
locals {
  # The blank line after the headers separates them from the body, as MIME requires.
  cloud_init_parts_rendered = [for part in local.cloud_init_parts : <<EOF
--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: ${part.content-type}
Mime-Version: 1.0

${templatefile(part.filepath, part.vars)}
EOF
  ]
}
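If you want to sanity check the rendered parts before assembling them, a temporary output is one way to do it. This is just a debugging sketch. The output name is my own invention, and it assumes the local.cloud_init_parts_rendered value from the block above.

```hcl
# Temporary debugging aid: shows each rendered MIME part as plain text
# after a plan or apply. Remove it once the parts look right.
output "cloud_init_parts_rendered" {
  description = "Rendered MIME parts prior to assembly and encoding"
  value       = local.cloud_init_parts_rendered
}
```

Keep in mind that outputs end up in state and in the CLI output, so skip this if your templates contain secrets.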
Finally, we need to put it all together with the header and footer of the format. I created a cloud-init.tpl file with a for directive we can pass our rendered parts to:
Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0
%{~ for part in cloud_init_parts ~}
${part}
%{~ endfor ~}
--MIMEBOUNDARY--
Using a combination of the templatefile and base64gzip functions, we have the final product:
locals {
  cloud_init_gzip = base64gzip(templatefile("cloud-init.tpl", { cloud_init_parts = local.cloud_init_parts_rendered }))
}
Voila! The local value cloud_init_gzip can be used in place of the rendered content from the template_cloudinit_config data source. And I didn’t have to use any provider plugins to do it.
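Wiring that into an instance might look like the sketch below. The ami_id variable, the resource name, and the instance type are all placeholders I made up for illustration. The only piece taken from this post is local.cloud_init_gzip.

```hcl
# Hypothetical input; supply whatever AMI you actually use.
variable "ami_id" {
  type = string
}

resource "aws_instance" "example" {
  ami           = var.ami_id
  instance_type = "t3.micro" # placeholder size

  # The value is already gzipped and base64 encoded, so it goes in
  # user_data_base64 rather than user_data.
  user_data_base64 = local.cloud_init_gzip
}
```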
You might be wondering if all this mucking about was worth it. I mean, there is a perfectly good cloudinit provider. Why not just use that and call it a day? That’s fair! In part, I just wanted the challenge of doing it with native functions. But there are two other considerations here. First, we’ve removed a dependency on a plugin. That’s one less codebase we have to trust and pull on each terraform init.
Second, in theory using the native functions should be faster than the provider plugin. So, is it? Yes!
Here’s a run of the original code:
$ time terraform apply -auto-approve
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
real 0m4.598s
user 0m1.267s
sys 0m1.039s
And here’s a run of the updated code:
$ time terraform apply -auto-approve
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
real 0m0.197s
user 0m0.013s
sys 0m0.099s
So um, yeah. It’s a bit faster. Does saving ~4s matter in the grand scheme of things? Not at my scale. But imagine if you’ve got a large configuration that needs to render multiple template provider data sources on every run. And that configuration is baked into a CI/CD pipeline that runs every time someone opens a PR or makes a commit. The time savings could start to stack up!
Replacing the template_file data source with the templatefile function is a slam dunk in terms of simplicity and support. But getting rid of the template_cloudinit_config data source is less straightforward. While you could use the cloudinit provider, there’s an opportunity to save time and remove a dependency if you’re willing to do a little extra work. And I kinda did the extra work for you!
In fact, if you’d like to consume this as a module, you can do exactly that: https://registry.terraform.io/modules/ned1313/native/cloudinit/latest.
Of course that introduces a new dependency on an external module, so that’s entirely up to you. But at the very least, you’ll still get the performance improvements.