How to shut down and start an Azure Stack system


I've been intending for a couple of months to write about how to shut down an Azure Stack integrated system 'the right way'. Why? Because I had to turn off an instance a couple of months ago: the location hosting the appliance had planned utility maintenance (it hosts pilot/demo kit only, so no need for generators), and I didn't want any issues with tenant workloads or S2D. Anyway, I don't particularly need to detail the process now, as Microsoft have recently updated their documentation to cover it (get it here).

So, why did I feel the need to make this post?

Primarily, it's to highlight the importance of regularly checking back on the Azure Stack doc pages; Microsoft are constantly adding to and updating the guidance, especially when new code updates are released. The link provided pertains to version 1712 and above. The Test-AzureStack PowerShell cmdlet was added in this version, allowing the operator to confirm that all required Azure Stack roles and services are functioning.

Secondly, I wanted to give some more detail on the steps, the output you will see and the time each stage can take.

Stop-AzureStack

Here are the high-level steps of what happens when you stop Azure Stack:

Connect to the Privileged Endpoint via PowerShell Remoting


$Pep = 'azs-ercs01'
$cred = Get-Credential -UserName 'azurestack\cloudadmin' -Message 'Enter CloudAdmin Password'
Enter-PSSession -ComputerName $Pep -ConfigurationName PrivilegedEndpoint -Credential $cred

Change the $Pep variable to match the name or IP address of one of the ERCS VMs. Change the domain name to match that defined when the integrated system was deployed or leave as-is for ASDK deployments.

Start the shutdown procedure


Stop-AzureStack

The following tasks are carried out when you run the command:

  1. The tenant VMs are shut down (actually saved, if you were to check Hyper-V Manager)

    • This includes servers required for PaaS

  2. ADFS and WAS (portals) are shut down

  3. Fabric Ring services are shut down (Resource Providers)

  4. Azure Consistent Storage VMs are shut down

  5. Azure core infra SQL servers are shut down

  6. Gateway VMs are shut down

  7. Software Load Balancer VMs are shut down

  8. Border Gateway Protocol VM is shut down

  9. Certificate Authority VMs are shut down

  10. Network Controller VMs are shut down

  11. Finally, the physical nodes are shut down

The time it takes to complete depends on the number of tenant workloads and PaaS servers you have running.

If you close down the session that you ran the command from, you can still check on the progress by connecting to the PEP again and running the following command:


Get-ActionStatus Stop-AzureStack
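If you'd rather not sit in an interactive session just to check progress, the same check can be sketched with a temporary remoting session. This is only a minimal sketch, assuming the PEP name and account from the connection step above; the $session variable is my own name, and I haven't shown any output because it will differ per system.

```powershell
# Assumed values: same PEP name and CloudAdmin account as in the connection step.
$Pep  = 'azs-ercs01'
$cred = Get-Credential -UserName 'azurestack\cloudadmin' -Message 'Enter CloudAdmin Password'

# Open a non-interactive session to the Privileged Endpoint.
$session = New-PSSession -ComputerName $Pep -ConfigurationName PrivilegedEndpoint -Credential $cred

try {
    # Run the status check remotely and return whatever the PEP reports.
    Invoke-Command -Session $session -ScriptBlock { Get-ActionStatus Stop-AzureStack }
}
finally {
    # Always tear the session down, even if the status call fails.
    Remove-PSSession -Session $session
}
```

The same pattern should work for the other checks in this post by swapping the command inside the script block, for example Get-ActionStatus Start-AzureStack or Test-AzureStack.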

For some reason, the logging is not as 'verbose' as I would like. 

Rest assured, things are happening, even though it's not obvious from the output! I did try running the command again once it had completed, and that time I did see the correct verbose messages. I'm not sure what happened the first time round.

Here's the start-up process:

  1. Wait for DCs to start

  2. Wait for storage to be ready (S2D cluster)

  3. Start Network Controller VMs

  4. Start Certificate Authority VMs

  5. Wait for Certificate Authority Service

  6. Validate Certificate Authority

  7. Start BGP VMs

  8. Start SLB VMs

  9. Start Gateway VMs

  10. Start GW service

  11. Start SQL VMs

  12. Start SQL Cluster

  13. Azure Consistent Storage VMs are started

  14. Fabric Ring services are started (Resource Providers)

  15. ADFS and WAS (portals) are started

  16. Wait for WAS (admin) portal start-up

  17. Wait for WAS (Public) portal start-up

  18. Finally, the tenant VMs are resumed

If you want to see what the progress is, use the Get-ActionStatus cmdlet:


Get-ActionStatus Start-AzureStack

As with stopping the instance, times will vary. Expect it to take between 1 and 2 hours, depending on tenant workloads and whether you have any PaaS services installed. I had to run the command again as things appeared to have stalled. It didn't have any adverse effect.

Just a note on this: Although I ran the Start-AzureStack command again, I also ran the Test-AzureStack command in parallel.  It reported that all tests passed, so go figure what was actually happening.  I trust the tests, so use those as the gate to release the instance back into production.

Test-AzureStack

All being well, once the Start-AzureStack command has completed, you'll have a fully operational system. You *could* resume normal operations and trust it's working. For peace of mind, I prefer to know that everything is working before letting it back into the wild.

The Test-AzureStack cmdlet runs a number of tests that will give you that reassurance.


Test-AzureStack

Run the command from the PEP and after a few minutes you should see a report on components that have passed or failed (hopefully not!).

For anything that doesn't pass, you're going to need to speak to Microsoft support :(

Remember to close the PEP session. Either:


Close-PrivilegedEndpoint -TranscriptsPathDestination '\\yourserver\share' -Credential (Get-Credential)

Or:


Exit-PSSession