In a previous post, I performed a storage performance benchmark of Azure Managed Disks and Azure Files for Azure Kubernetes Service. The testing included the now generally available Ultra SSD class of Managed Disk. The process for using Ultra SSD with AKS was fraught with peril and caveats, and it took an assist from the AKS product group to get it all working. I thought I would detail how I went about enabling Ultra SSDs with AKS in case someone else is struggling with the same.
The Ultra SSD class of storage from Microsoft Azure is their highest performance tier of managed disks. This is the first time Microsoft has added provisioned IOPS and throughput to a storage offering, something that AWS has had for quite some time. The disks can serve up to 160,000 IOPS and 2,000 MBps throughput to an Azure VM. The solution was in preview until August 15th of this year, when it was released to general availability. Even though it is GA, the Ultra SSD class is not available in all regions and the feature is not enabled by default on all subscriptions. It’s also worth noting that there is no Azure VM that is capable of pushing 160k IOPS or 2,000 MBps. The highest published numbers are 80k IOPS and 1,200 MBps.
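There are several requirements and prerequisites that need to be met before using Ultra SSDs. At a high level, you need a subscription enabled for Ultra SSD, a supported region, an AKS cluster deployed with availability zones and Virtual Machine Scale Sets, a node pool from a supported VM family, and the ultraSSDEnabled property set on the scale set.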
In order to get your subscription enabled for Ultra SSD, you will need to fill out a form and wait. I wish there was a super sexy Azure CLI command that would register the feature, you know, like az feature register --namespace Microsoft.Compute --name UltraSSD. But that will not work. Ultra SSDs are available by request only even though the feature is no longer in preview. Go ahead and fill out the form and wait for your subscription to be enabled. I’ll wait…
All good? Great. Let’s proceed.
Now that you have Ultra SSDs enabled, it’s time to pick a region. The supported regions as of this post are East US 2, North Europe, and Southeast Asia. You can always check the most recent list on the Ultra Disk section of the FAQs. For my testing I chose to go with East US 2.
There are a few other things to know about Ultra SSDs. The disks are provisioned in an availability zone (AZ) and will only attach to a VM in the same AZ. It makes sense that only regions that have AZs could support Ultra SSDs.
The reason to force availability zone matching between disk and VM also makes a lot of sense. The storage backend supporting the Ultra SSD disk feature needs to be in the same data center as the VM attached to the disk in order to meet the target IOPS and bandwidth. Because of this requirement, an AKS cluster that wants to use Ultra SSD disks will need to be provisioned with AZ support, which also requires the use of Virtual Machine Scale Sets and the standard load balancer. These are preview features that must be enabled within your subscription.
You are also going to need to install the aks-preview extension for your Azure CLI instance. I recommend doing all of this in Cloud Shell to simplify matters.
First let’s install the aks-preview extension.
az extension add --name aks-preview
az extension update --name aks-preview
Great, now register the preview features.
az feature register --name AvailabilityZonePreview --namespace Microsoft.ContainerService
az feature register --name AKSAzureStandardLoadBalancer --namespace Microsoft.ContainerService
az feature register --name VMSSPreview --namespace Microsoft.ContainerService
The features may take up to ten minutes to register. You can check on the features by running the following command.
az feature list -o table --query "[?contains(name, 'Microsoft.ContainerService/')].{Name:name,State:properties.state}"
Then refresh the registration by running.
az provider register --namespace Microsoft.ContainerService
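If you would rather not re-run the list command by hand, the waiting step before the provider refresh can be scripted. This is a minimal sketch, assuming az feature show reports the state under properties.state the same way the list query above does. The helper names feature_state and wait_for_features are my own and are not part of the Azure CLI.

```shell
# Sketch only: feature_state and wait_for_features are hypothetical helpers.

feature_state() {
  # Prints the registration state of one feature, e.g. "Registered".
  az feature show --namespace Microsoft.ContainerService --name "$1" \
    --query properties.state -o tsv
}

wait_for_features() {
  # $1 = seconds to sleep between polls, remaining args = feature names.
  local interval="$1"; shift
  local name
  for name in "$@"; do
    until [ "$(feature_state "$name")" = "Registered" ]; do
      echo "Waiting for $name..."
      sleep "$interval"
    done
    echo "$name is registered"
  done
}
```

Calling wait_for_features 30 AvailabilityZonePreview AKSAzureStandardLoadBalancer VMSSPreview blocks until all three report Registered, at which point the az provider register command above can be run.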
You are now ready to deploy your AKS cluster. Run the following commands, substituting the proper values for the placeholders in ALL_CAPS.
#Change these
rg="RESOURCE_GROUP_NAME"
loc="LOCATION"
clus="CLUSTER_NAME"
#Leave this one alone
mcrg="MC_${rg}_${clus}_${loc}"
az group create --name $rg --location $loc
az aks create --resource-group $rg --name $clus --generate-ssh-keys --enable-vmss --load-balancer-sku standard --node-count 1 --node-zones 1 --node-vm-size Standard_D64s_v3
az aks get-credentials --resource-group $rg --name $clus
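The mcrg variable above is built by hand from the naming convention, which only works if the location is written exactly as Azure writes it in the name. As an alternative sketch, the cluster can be asked for its node resource group directly. This assumes az aks show exposes it as nodeResourceGroup; node_resource_group is a hypothetical helper name.

```shell
# Sketch only: node_resource_group is a hypothetical helper.

node_resource_group() {
  # $1 = cluster resource group, $2 = cluster name
  # Prints the name of the MC_* resource group that holds the VMSS.
  az aks show --resource-group "$1" --name "$2" \
    --query nodeResourceGroup -o tsv
}
```

Once the cluster exists, mcrg="$(node_resource_group "$rg" "$clus")" would replace the hand-built value.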
Only the VM families DS and ES V3 support Ultra SSDs. The node pool used with the Ultra SSD must be of that family. Also, to maximize potential performance, I went with the D64s size that has the highest disk throughput available on an Azure VM - 80,000 IOPS and 1,200 MBps throughput for uncached disks. Ultra SSDs only support a setting of None for caching, so the uncached performance is what we’re looking at here. The Ultra SSDs are also deployed in a specific availability zone, so the node pool only needs to have one node in a single AZ. When we create the Ultra SSD, we will create it in the same zone as the single node in the node pool.
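To make the ceiling concrete, here is a small sketch of the arithmetic using the numbers quoted in this post. It assumes the effective limit is simply the lower of the disk's provisioned value and the VM's uncached cap; min_of is a hypothetical helper.

```shell
# Sketch only: min_of is a hypothetical helper.

min_of() {
  # Prints the smaller of two integers.
  if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

disk_iops=160000   # provisioned on the Ultra SSD
vm_iops=80000      # uncached limit of the D64s v3
disk_mbps=2000
vm_mbps=1200

effective_iops="$(min_of "$disk_iops" "$vm_iops")"
effective_mbps="$(min_of "$disk_mbps" "$vm_mbps")"
echo "Effective ceiling: ${effective_iops} IOPS, ${effective_mbps} MBps"
```

In other words, the VM is the bottleneck here, so the benchmark can reach at most half of the IOPS the disk is provisioned for.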
The node pool is created using a VMSS, but it isn’t ready to support Ultra SSD disks just yet. There is an ultraSSDEnabled property setting that needs to be configured on all VMs and VMSS.
{
  "additionalCapabilities": {
    "ultraSSDEnabled": true
  }
}
The Azure CLI provides a switch for adding this property when creating a VM or VMSS directly. Since the node pool creation process abstracts the underlying VMSS creation, there is no opportunity to set this property. The property must be added after creation by deallocating the VMSS, updating the setting, and starting the VMSS back up.
There is a resource group which is created when the AKS cluster is generated with the naming standard MC_resourcegroup_clustername_location. The VMSS is in that resource group and is named something like aks-nodepool1-#######-vmss, where the ####### is some set of integers. Since there is only a single VMSS in the resource group - assuming you only have one node pool - then we can simply show all VMSSs and query the name.
vmss=$(az vmss list --resource-group $mcrg --query "[].name" -o tsv)
az vmss deallocate -g $mcrg -n $vmss
az vmss update -g $mcrg -n $vmss --set additionalCapabilities.ultraSSDEnabled=true
az vmss start -g $mcrg -n $vmss
That az vmss update command is not well documented, or at least I had trouble understanding exactly how the command wanted me to structure the set parameter, or if I should use the add parameter instead. Big thanks to the AKS product team for helping out there!
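Before moving on, it may be worth confirming that the property actually stuck. A minimal sketch, assuming az vmss show returns the same additionalCapabilities.ultraSSDEnabled path that the update command sets; ultra_ssd_enabled is a hypothetical helper, and the value is lowercased because I have not verified how the tsv output cases booleans in every CLI version.

```shell
# Sketch only: ultra_ssd_enabled is a hypothetical helper.

ultra_ssd_enabled() {
  # $1 = node resource group, $2 = scale set name
  # Succeeds (exit 0) only if the scale set reports ultraSSDEnabled=true.
  local state
  state="$(az vmss show -g "$1" -n "$2" \
    --query additionalCapabilities.ultraSSDEnabled -o tsv \
    | tr '[:upper:]' '[:lower:]')"
  [ "$state" = "true" ]
}
```

It can be used as ultra_ssd_enabled "$mcrg" "$vmss" && echo "ready for Ultra SSD".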
My original plan was to create an Ultra SSD storage class and use that to provision volumes for the pods. While I was able to create a storage class, there is no setting in the Azure disk provider to specify the AZ targeted for creation. It appears that the provider will simply create an Ultra SSD disk in a random zone that is included in the cluster configuration. That will work if the node pool supporting the Ultra SSDs covers all of the AZs set in the cluster configuration. If that is not the case, you either have to roll the dice or manually create the disk and attach it.
All Azure managed disks are subject to this limitation with availability zones, and Microsoft makes note of that in the AKS documentation. Other managed disk classes do not have to be deployed in an availability zone, and thus the cluster doesn’t have to use AZs. Due to the unique nature of Ultra SSDs, the cluster must use AZs and thus the problem comes to the forefront.
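If you do roll the dice, you can at least check whether the disk landed in the same zone as your node. The sketch below assumes the node's zone label has the form region-zone, for example eastus2-1, and that the disk zone is a bare number like 1. Both helper names are hypothetical.

```shell
# Sketch only: zone_from_label and same_zone are hypothetical helpers.

zone_from_label() {
  # "eastus2-1" -> "1": strip everything up to the last hyphen.
  printf '%s\n' "${1##*-}"
}

same_zone() {
  # $1 = disk zone (e.g. "1"), $2 = node zone label (e.g. "eastus2-1")
  [ "$1" = "$(zone_from_label "$2")" ]
}
```

On the cluster I tested, the label could be listed with kubectl get nodes -L failure-domain.beta.kubernetes.io/zone; newer Kubernetes versions may use a different label name, so treat that as an assumption. A call such as same_zone 1 eastus2-1 succeeds, while same_zone 2 eastus2-1 fails.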
Here is the storage class I created for testing.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: managed-ultra
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: UltraSSD_LRS
  kind: Managed
  cachingMode: None
  DiskIOPSReadWrite: "160000"
  DiskMBpsReadWrite: "2000"
You can save that to a file and run kubectl apply -f azure-ultra-sc.yaml to create the storage class. If you’d like to see the storage class in action, simply create a new file with the following contents.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: dbench-pv-claim
spec:
  storageClassName: managed-ultra
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1024Gi
Then run kubectl apply -f azure-ultra-pvc.yaml. That will generate an Ultra SSD disk in the cluster’s resource group. You can view the disk by running az disk list --resource-group $mcrg. You can destroy the disk by running kubectl delete -f azure-ultra-pvc.yaml. These disks are pretty expensive, so I recommend deleting them as soon as possible.
If you are in a situation where the storage class will not work because the zonality is random, then it is pretty easy to create an Ultra SSD disk and use it within a configuration. Create the disk by running the following.
az disk create -g $mcrg --name ultraSSD --size-gb 1024 --zone 1 --sku UltraSSD_LRS --disk-iops-read-write 160000 --disk-mbps-read-write 2000
az disk show -g $mcrg -n ultraSSD --query id -o tsv
You are going to need the disk URI in order to attach it to a pod. Here is an example configuration that I used for storage testing.
apiVersion: batch/v1
kind: Job
metadata:
  name: dbench
spec:
  template:
    spec:
      containers:
      - name: dbench
        image: ndrpnt/dbench:1.0.0
        imagePullPolicy: Always
        env:
        - name: DBENCH_MOUNTPOINT
          value: /data
        - name: DBENCH_QUICK
          value: "no"
        - name: FIO_SIZE
          value: 1G
        - name: FIO_OFFSET_INCREMENT
          value: 256M
        - name: FIO_DIRECT
          value: "1"
        volumeMounts:
        - name: dbench-pv
          mountPath: /data
      restartPolicy: Never
      volumes:
      - name: dbench-pv
        azureDisk:
          kind: Managed
          diskName: ultraSSD
          diskURI: DISK_URI
          cachingMode: None
  backoffLimit: 4
Simply update the DISK_URI placeholder with the correct value and save it. Then run kubectl apply to create it. Don’t forget to run az disk delete -g $mcrg --name ultraSSD at the end to clean up the Ultra SSD disk!
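If you would rather not paste the URI by hand, the placeholder swap can be scripted. This is a minimal sketch that assumes the job manifest was saved with the literal DISK_URI placeholder in it; render_job and the file name in the usage note are hypothetical.

```shell
# Sketch only: render_job is a hypothetical helper.

render_job() {
  # $1 = path to the manifest template, $2 = disk URI
  # "|" is used as the sed delimiter because the URI is full of "/".
  # The match is case sensitive, so the diskURI key itself is left alone.
  sed "s|DISK_URI|$2|" "$1"
}
```

With the manifest saved as, say, dbench-job.yaml, the URI from the az disk show command above can be captured into a variable and passed along: render_job dbench-job.yaml "$disk_uri" | kubectl apply -f -. This relies on the URI not containing characters that sed treats specially in a replacement, such as & or a backslash.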
The process of using Ultra SSDs with AKS is a bit more challenging than other types of managed disk. While all of these hurdles can be overcome, the biggest remaining challenge is dealing with the zonality of Ultra SSD disks. Ideally the Azure disk provider should be able to specify an AZ, rather than picking one at random. I’m sure that will be a feature added as availability zones in AKS move towards GA.
I hope this was helpful to some people out there! Let me know if you have additional questions or run into trouble.