I’ve been meaning for a couple of months to write about how to shut down an Azure Stack integrated system ‘the right way’. Why? Because I had to turn off an instance a couple of months ago, as the location hosting the appliance had planned utility maintenance (it hosts pilot/demo kit only, so no need for generators), and I didn’t want any issues with tenant workloads or S2D.
Anyway, I don’t particularly need to detail the process now, as Microsoft have recently updated their documentation to cover it (get it here).
So, why did I feel the need to make this post?
Primarily, it’s to highlight the importance of regularly checking back on the Azure Stack doc pages; Microsoft are constantly adding to and updating the guidance, especially when new code updates are released. The link provided pertains to version 1712 and above. The Test-AzureStack PowerShell cmdlet has been added in this version, allowing the operator to confirm that all required Azure Stack roles and services are functioning.
Secondly, I wanted to give some more detail on the steps, the output you will see and the time each stage can take.
Here are the high-level steps of what happens when you stop Azure Stack:
Connect to the Privileged Endpoint via PowerShell Remoting
$Pep = 'azs-ercs01'
$cred = Get-Credential -UserName 'azurestack\cloudadmin' -Message 'Enter CloudAdmin Password'
Enter-PSSession -ComputerName $Pep -ConfigurationName PrivilegedEndpoint -Credential $cred
Change the $Pep variable to match the name or IP address of one of the ERCS VMs.
Change the domain name to match that defined when the integrated system was deployed or leave as-is for ASDK deployments.
Start the shutdown procedure
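For reference, a minimal sketch of this step, assuming you are still inside the PEP session opened above. `Stop-AzureStack` is the cmdlet the Microsoft documentation gives for shutting down version 1712 and above; running it with no extra parameters is my assumption here:

```powershell
# Run inside the PEP session (the prompt shows the ERCS VM you connected to).
# Shuts down the infrastructure roles and finally the physical nodes.
Stop-AzureStack
```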
You’ll see some verbose output on the progress:
As you can see from the image above, the following tasks are carried out when you run the command:
- The tenant VMs are shut down (actually saved if you were to check Hyper-V Manager)
  - This includes servers required for PaaS
- ADFS and WAS (portals) are shut down
- Fabric Ring services are shut down (Resource Providers)
- Azure Consistent Storage VMs are shut down
- Azure core infra SQL servers are shut down
- Gateway VMs are shut down
- Software Load Balancer VMs are shut down
- Border Gateway Protocol VM is shut down
- Certificate Authority VMs are shut down
- Network Controller VMs are shut down
- Finally, the physical nodes are shut down
The time it takes to complete depends on the number of tenant workloads and PaaS servers you have running.
If you close down the session that you ran the command from, you can still check on the progress by connecting to the PEP again and running the following command:
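A sketch of that check, assuming `Get-ActionStatus` accepts `Stop-AzureStack` as the function name in the same way it is documented for `Start-AzureStack` (treat this as an assumption and confirm it against the docs for your build):

```powershell
# Run inside a new PEP session while the ERCS VMs are still up.
# Assumption: the stop action is tracked under the name of the cmdlet that started it.
Get-ActionStatus Stop-AzureStack
```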
When you’re ready to power the appliance back on, here’s what you need to do:
- Power on the physical nodes.
  - If you have an integrated system, ensure that you power on all nodes at the same time, otherwise you may find yourself having issues due to S2D re-balancing.
- After a period of time (around 10-15 minutes), log on to the HLH and connect to the PEP via PowerShell Remoting (use the procedure detailed earlier).
  - The only VMs that will be running on the nodes will be the DCs and Emergency Recovery Service computers.
- Run the following command:
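A minimal sketch, assuming the PEP session has been re-established as described earlier and that `Start-AzureStack` is run with no extra parameters:

```powershell
# Run inside the PEP session once the DCs and ERCS VMs are reachable.
# Starts the infrastructure roles in order; this can take a long time to return.
Start-AzureStack
```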
You’ll get to see something that looks similar to this:
For some reason, the logging is not as ‘verbose’ as I would like. You’ll see lots of this:
Rest assured, things are happening, although it’s not clear! I did try running the command again once it had completed and I did see the correct verbose messages. Not sure what happened first time round:
Here’s the process:
- Wait for DCs to start
- Wait for storage to be ready (S2D cluster)
- Start Network Controller VMs
- Start Certificate Authority VMs
- Wait for Certificate Authority Service
- Validate Certificate Authority
- Start BGP VMs
- Start SLB VMs
- Start Gateway VMs
- Start GW service
- Start SQL VMs
- Start SQL Cluster
- Azure Consistent Storage VMs are started
- Fabric Ring services are started (Resource Providers)
- ADFS and WAS (portals) are started
- Wait for WAS (admin) portal start-up
- Wait for WAS (Public) portal start-up
- Finally, the tenant VMs are resumed
If you want to see what the progress is, use the Get-ActionStatus Cmdlet:
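As a sketch, assuming a second PEP session opened alongside the one running the start-up, the Microsoft documentation shows the cmdlet being given the name of the action to report on:

```powershell
# Run inside a PEP session; reports the recorded status of the Start-AzureStack action.
Get-ActionStatus Start-AzureStack
```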
You’ll need to decipher the output, but it’s not entirely unreadable:
As with stopping the instance, times will vary. Expect it to take between 1 and 2 hours, depending on tenant workloads and whether you have any PaaS services installed.
Here’s what I saw when I first ran the command:
The output above indicates it’s still in progress, so at this point I ran the Start-AzureStack command again. This time I got the verbose output I would expect to see:
After another hour, I eventually got the following output:
Just a note on this: Although I ran the Start-AzureStack command again, I also ran the Test-AzureStack command in parallel. It reported that all tests passed, so go figure what was actually happening. I trust the tests, so use those as the gate to release the instance back into production.
All being well, once the Start-AzureStack command has completed, you’ll have a fully operational system. You *could* resume normal operations and trust it’s working. For peace of mind, I prefer to know that everything is working before letting it back into the wild.
The Test-AzureStack cmdlet runs a number of tests that will give you that reassurance.
Run the command from the PEP and after a few minutes you should see output like this:
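A minimal sketch, assuming an open PEP session and the default set of tests (no parameters):

```powershell
# Run inside the PEP session; validates the Azure Stack infrastructure roles and services.
Test-AzureStack
```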
For anything that doesn’t pass, you’re going to need to speak to Microsoft support 🙁
Remember to close the PEP session. One option, which also copies the session transcripts to a file share, is:
Close-PrivilegedEndpoint -TranscriptsPathDestination '\\yourserver\share' -Credential (get-credential)