Building IaaS in the cloud is becoming more popular, and part of building IaaS is providing some level of disaster recovery. After spending the last few weeks working in AWS, I realized that the toolsets I expected to exist just don’t. So what did I do? I scripted my own, or at least made a start.
Coming from the world of Azure, I am used to the availability of what they call LRS (locally redundant storage) and GRS (geo-redundant storage). When a virtual machine is backed by LRS, there are always three local copies of the data in the datacenter, each in a separate failure domain. GRS takes that a step further by creating three more copies of the data in a companion datacenter in another region. When I started deploying instances in AWS, I mistakenly assumed that a similar storage backing was available to provide disaster recovery to a companion region. Evidently, I was wrong. So what’s the suggested alternative? Create snapshots of the volumes backing your instance, and then copy those snapshots to another region for durability. So naturally I looked for the setting to auto-create snapshots, and… there isn’t one. Instead I rolled my own using AWS PowerShell (since I love PowerShell), running the process as a scheduled task.
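As a minimal sketch of the scheduled-task half of that idea, the snippet below registers a daily task that runs a snapshot script. The script path, its `-InstanceId` and `-Region` parameters, the task name, and the 2:00 AM schedule are all placeholders chosen for illustration, not values from the actual script.

```powershell
# Hypothetical path and arguments; substitute the real script and its parameters.
$scriptPath = 'C:\Scripts\New-InstanceSnapshot.ps1'
$arguments  = "-NoProfile -ExecutionPolicy Bypass -File `"$scriptPath`" " +
              "-InstanceId i-0123456789abcdef0 -Region us-east-1"

# Run Windows PowerShell with the script once a day at 2:00 AM (assumed schedule).
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument $arguments
$trigger = New-ScheduledTaskTrigger -Daily -At '2:00AM'

Register-ScheduledTask -TaskName 'EC2 volume snapshots' `
    -Action $action -Trigger $trigger `
    -Description 'Tags EBS volumes and snapshots them for disaster recovery'
```

The account the task runs under needs AWS credentials available to it, for example through a stored AWS PowerShell credential profile or an instance role.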
The first script in the series is below. It takes the following:
The script does a few things. First it finds the existing volumes for the instance and tags them with new tags. I included the instance name, instance ID, the mount point of the volume, and what region it’s in. All this information can be used later when the snapshot is copied over to another region. If it’s the root volume of the instance, I also add in the VPC ID and the Subnet ID, and set an IsRootVolume tag to true. I’m doing that in order to automate the recovery of the instance in another region. That script is still in development, but I figure if I know the instance ID, VPC, and subnet, I can probably recover a bunch of instances in a mirrored config in another region. Once the volumes are tagged appropriately, the script creates a snapshot of each one. It copies all the tags on the volume to the snapshot and adds a Date and a Name tag. The date is handy for deleting stale snapshots, or for finding a consistent point in time across a bunch of them.
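The flow described above can be sketched roughly as follows. This is a simplified illustration, not the full script: the function names, the exact tag keys, the snapshot description, and the Name and Date formats are assumptions. The AWS calls (`Get-EC2Instance`, `New-EC2Tag`, `New-EC2Snapshot`) come from AWS Tools for PowerShell and need valid credentials to run.

```powershell
# Hypothetical helper: builds the tag set for one volume of an instance.
# $Instance is expected to look like an Amazon.EC2.Model.Instance
# (InstanceId, RootDeviceName, VpcId, SubnetId, Tags).
function Get-VolumeTagSet {
    param(
        [Parameter(Mandatory)] $Instance,
        [Parameter(Mandatory)] [string] $DeviceName,
        [Parameter(Mandatory)] [string] $Region
    )
    # The instance name comes from the instance's own Name tag, if it has one.
    $name = ($Instance.Tags | Where-Object { $_.Key -eq 'Name' } |
        Select-Object -First 1).Value

    $tags = [ordered]@{
        InstanceName = "$name"
        InstanceId   = $Instance.InstanceId
        MountPoint   = $DeviceName
        Region       = $Region
    }
    # Root volumes also record the network placement needed for recovery.
    if ($DeviceName -eq $Instance.RootDeviceName) {
        $tags.VpcId        = $Instance.VpcId
        $tags.SubnetId     = $Instance.SubnetId
        $tags.IsRootVolume = 'true'
    }
    return $tags
}

# Hypothetical helper: converts a dictionary into EC2 tag objects.
function ConvertTo-Ec2Tag {
    param([Parameter(Mandatory)] [System.Collections.IDictionary] $TagSet)
    foreach ($key in $TagSet.Keys) {
        New-Object Amazon.EC2.Model.Tag -ArgumentList $key, ([string]$TagSet[$key])
    }
}

# Tags every EBS volume on the instance, snapshots it, and emits the snapshot IDs.
function New-TaggedSnapshot {
    param(
        [Parameter(Mandatory)] [string] $InstanceId,
        [Parameter(Mandatory)] [string] $Region
    )
    $instance = (Get-EC2Instance -InstanceId $InstanceId -Region $Region).Instances[0]

    foreach ($mapping in $instance.BlockDeviceMappings) {
        $volumeId = $mapping.Ebs.VolumeId
        $tagSet = Get-VolumeTagSet -Instance $instance -DeviceName $mapping.DeviceName -Region $Region
        New-EC2Tag -Resource $volumeId -Tag (ConvertTo-Ec2Tag $tagSet) -Region $Region | Out-Null

        $snapshot = New-EC2Snapshot -VolumeId $volumeId -Region $Region `
            -Description "Snapshot of $volumeId on $InstanceId"

        # The snapshot gets the volume's tags plus Date and Name (formats assumed).
        $tagSet.Date = Get-Date -Format 'yyyy-MM-dd'
        $tagSet.Name = "$($tagSet.InstanceName)-$($mapping.DeviceName)"
        New-EC2Tag -Resource $snapshot.SnapshotId -Tag (ConvertTo-Ec2Tag $tagSet) -Region $Region | Out-Null

        $snapshot.SnapshotId
    }
}
```

Keeping the tag-building logic in its own function, separate from the AWS calls, makes it possible to check the tag set for a given instance without touching any real volumes.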
The script ends by outputting the snapshot IDs, which can be ingested by another script to copy them over to another region.
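To illustrate how a second script might ingest those snapshot IDs, here is a minimal sketch that accepts them from the pipeline and copies each one to another region. The function name and parameter names are hypothetical. It uses `Get-EC2Snapshot`, `Copy-EC2Snapshot`, and `New-EC2Tag` from AWS Tools for PowerShell, and it assumes the source snapshot has already reached the completed state, since a snapshot that is still pending cannot be copied.

```powershell
# Hypothetical function: copies a snapshot to another region and re-applies its tags.
function Copy-SnapshotToRegion {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [string] $SnapshotId,
        [Parameter(Mandatory)] [string] $SourceRegion,
        [Parameter(Mandatory)] [string] $DestinationRegion
    )
    process {
        # Read the source snapshot so its tags can be carried over to the copy.
        $source = Get-EC2Snapshot -SnapshotId $SnapshotId -Region $SourceRegion

        # -Region is the destination; Copy-EC2Snapshot returns the new snapshot ID.
        $copyId = Copy-EC2Snapshot -SourceSnapshotId $SnapshotId `
            -SourceRegion $SourceRegion -Region $DestinationRegion `
            -Description "Copy of $SnapshotId from $SourceRegion"

        # Skip reserved aws:* tags, which cannot be set by users.
        $tags = @($source.Tags | Where-Object { $_ -and $_.Key -notlike 'aws:*' })
        if ($tags.Count -gt 0) {
            New-EC2Tag -Resource $copyId -Tag $tags -Region $DestinationRegion | Out-Null
        }

        $copyId
    }
}
```

With a list of snapshot IDs in `$snapshotIds`, usage might look like `$snapshotIds | Copy-SnapshotToRegion -SourceRegion us-east-1 -DestinationRegion us-west-2`.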
Here is the current version of the script:
In a follow-up post I will show a script for copying the snapshots over, and an orchestration script to perform the operation for an entire set of instances based on tags or VPC ID.