Azure Stack 1901 Update to VM Scale Limits

Azure Stack regularly releases updates, typically monthly. Sometimes in those updates we find good feature additions, bug fixes, and other items. Sometimes we see a bit of the ugly of cloud in known issues or workaround. In this particular update there is one point that stands out above the rest as being a little odd:

There is a new consideration for accurately planning Azure Stack capacity. With the 1901 update, there is now a limit on the total number of Virtual Machines that can be created. This limit is intended to be temporary to avoid solution instability. The source of the stability issue at higher numbers of VMs is being addressed but a specific timeline for remediation has not yet been determined. With the 1901 update, there is now a per server limit of 60 VMs with a total solution limit of 700. For example, an 8 server Azure Stack VM limit would be 480 (8 * 60). For a 12 to 16 server Azure Stack solution the limit would be 700. This limit has been created keeping all the compute capacity considerations in mind such as the resiliency reserve and the CPU virtual to physical ratio that an operator would like to maintain on the stamp. For more information, see the new release of the capacity planner.
In the event that the VM scale limit has been reached, the following error codes would be returned as a result: VMsPerScaleUnitLimitExceeded, VMsPerScaleUnitNodeLimitExceeded.
— https://docs.microsoft.com/en-us/azure/azure-stack/azure-stack-update-1901

This is one of those ugly ones, and is caused by stability issues at high VM counts we’re told. So basically they’ve added a new limit on the number of VMs per node (60) and the number of VMs per solution (700, or 11.6 nodes at 60 per node). Now whats weird about this is its a VM limit. Not a vCPU, not a RAM, not a disk or other hard resource limit, but rather a VM. This means any calculations you do for capacity on Azure Stack need to take this new limit into consideration. Don’t simply assume that adding nodes will allow you to add more VM capacity if your already at these limits, even if your not at the 16 node limit yet.

I can certainly attest to some stability issues with high numbers of VMs accross large numbers of nodes, so it makes sense, but it bears repeating: Always read the release notes!