Azure Stack TP1 POC Stable Install notes


I thought about writing yet another detailed step-by-step guide for installing TP1, but figured there are enough of those out there. If you need one, a quick web search will turn up plenty. At Azure Field Notes, we're about sharing things we've learned through our experience in the field, so I decided to hit the high points based on the notes from one of our most stable POC installs to date. Now, this is a fully supported POC, meaning it's running on the supported hardware and we're not modifying any of the install scripts here. We will post other articles soon that cover some of those tricks and tweaks. With that, let's get going:

 

Hardware:

  • Dell PowerEdge R630
  • Dual Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz, 12 cores each (24 cores total)
  • 384GB DDR4 Registered ECC RAM
  • Network card: Emulex OCm14104-U1-D 10Gb rNDC (supported for the POC install, not supported for Storage Spaces Direct)
  • Physical disks: PERC H730 with 1GB cache and 6 300GB SAS disks, configured as 2 disks in a RAID 1 mirror for the OS, with the other 4 disks as pass-through

(Note: this box is supported for the POC, but it likely won't match any of the production-supported hardware, so don't go buy a fleet of these expecting to run this stuff in prod.) Production-supported hardware will only be sold as a pre-configured, pre-integrated system, per this blog post from Mike Neil.

 

Next, the process:

  • Perform a complete firmware update of all components
  • Configure the array with a single RAID 1 mirror and 4 non-RAID disks set to pass-through mode
  • Set the BIOS to boot in UEFI mode
  • Ensure the time zone and time are set correctly in the BIOS
  • Load a servicing OS with boot-from-VHD support (Windows Server 2016 TP5 works well). Ensure you install to the mirror
  • Set up a static IP on the NIC
  • Copy the stack bits locally
  • Copy any drivers needed for the machine locally
  • Copy the TP4 VHD from the Stack bits to the root of the drive
  • Use BCDEdit to change the boot VHD to the TP4 Stack VHD
  • Boot the machine, copy any drivers from the local drive to win\inf and reboot
  • Enable RDP and set the NIC IP
  • Reboot, install Chrome or some other non-Edge browser, and make sure it is the default (Edge doesn't like the built-in admin account by default. You can obviously choose to change its settings, or use IE, but Chrome comes in handy)
  • Install update KB3124262 if needed
  • Disable all except the primary NIC port and ensure you have no more than 4 raw disks in Disk Management (they should match the 4 SAS disks that are passed through)
  • Disable Windows Update
  • Disable Windows Defender
  • Double-check that the system time zone and time are correct. (If you later get an AAD auth error about verifying the message, your time zone is off; check it again.)
  • Install Stack. Use an AAD account, set the NATVM static IP to an unused address on the same subnet as your host, set the static gateway to your gateway, and set the admin password, etc.
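
A few of the host preparation steps above lend themselves to scripting. What follows is a minimal sketch, not part of the supported install. The two function names, the VHD path, and the adapter name are placeholders made up for illustration. Stopping the wuauserv service and turning off Defender real-time monitoring are just one way to read "disable" here.

[powershell]
# Part 1: run from the servicing OS. Copies the current boot entry, points the
# copy at the Stack VHD, and makes it the default.
# $VhdPath uses bcdedit's own form, e.g. '[C:]\YourStackVhd.vhdx' (placeholder name).
function Add-StackBootEntry {
    param([Parameter(Mandatory)][string]$VhdPath)

    $output = bcdedit /copy '{current}' /d 'Azure Stack POC'
    if ("$output" -notmatch '\{[0-9a-fA-F-]+\}') { throw "bcdedit /copy failed: $output" }
    $id = $Matches[0]   # GUID of the new boot entry, braces included

    bcdedit /set $id device "vhd=$VhdPath"
    bcdedit /set $id osdevice "vhd=$VhdPath"
    bcdedit /default $id
}

# Part 2: run on the host after booting into the Stack VHD.
function Invoke-StackHostPrep {
    param([Parameter(Mandatory)][string]$PrimaryNic)   # adapter name to keep enabled

    # Enable RDP and open the firewall for it
    Set-ItemProperty -Path 'HKLM:\System\CurrentControlSet\Control\Terminal Server' -Name fDenyTSConnections -Value 0
    Enable-NetFirewallRule -DisplayGroup 'Remote Desktop'

    # Disable every NIC port except the primary one
    Get-NetAdapter | Where-Object Name -ne $PrimaryNic | Disable-NetAdapter -Confirm:$false

    # Stop and disable Windows Update; turn off Defender real-time monitoring
    Stop-Service -Name wuauserv -Force
    Set-Service -Name wuauserv -StartupType Disabled
    Set-MpPreference -DisableRealtimeMonitoring $true

    # Expect no more than 4 raw disks (the pass-through SAS disks)
    $raw = @(Get-Disk | Where-Object PartitionStyle -eq 'RAW')
    if ($raw.Count -gt 4) { Write-Warning "Found $($raw.Count) raw disks; expected no more than 4" }

    # Print the time zone and clock for a manual check
    tzutil /g
    Get-Date
}

# Example calls (values are placeholders):
# Add-StackBootEntry -VhdPath '[C:]\YourStackVhd.vhdx'
# Invoke-StackHostPrep -PrimaryNic 'NIC1'
[/powershell]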

Install script:

[powershell]
# AAD account password and Stack admin password as secure strings
$secpasswd = ConvertTo-SecureString "urpassword" -AsPlainText -Force
$adminpwd = ConvertTo-SecureString "uradminpassword" -AsPlainText -Force

# Credential for the AAD account the installer will use
$mycreds = New-Object System.Management.Automation.PSCredential ("uraadaccount@yourtenant.com", $secpasswd)

# The NAT VM IP must be an unused address on the host's subnet
.\DeployAzureStack.ps1 -Verbose -NATVMStaticIP 172.20.40.31/24 -NATVMStaticGateway 172.20.40.1 -AdminPassword $adminpwd -AADCredential $mycreds
[/powershell]

    • Wait a bit and check for errors. I had none after I did all of the above. If you do run into errors, I highly recommend blowing away the TP4 VHD and starting over. Rerunning the installer appears to work, but we've had instability later, with random failures, when it doesn't complete in one pass.
    • Log in to the client VM, install Chrome, and set it as the default browser (see above)
    • Go through and disable Windows Update and Defender on all Stack VMs. (Look for a post on how to do this in an unsupported way as part of the install; some boxes aren't domain joined, so a simple GPO won't do it. Update: Matt's post is now live.)
    • Turn off the TIP tests (from the client VM) once you think things are working properly. (If it's around 12am, the TIP tests will be running; wait until they are complete and have cleaned up everything.)

[powershell] Disable-ScheduledTask -TaskName AzureStackSystemvalidationTask [/powershell]
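
If you're not sure whether the TIP tests are mid-run, one option is to check the task state before disabling it. This is a rough sketch using the standard ScheduledTasks cmdlets; the 60-second polling interval is arbitrary. A task that is no longer running doesn't prove the cleanup has finished, so treat it as a hint rather than a guarantee.

[powershell]
$taskName = 'AzureStackSystemvalidationTask'

# Poll until the task is no longer running, then disable it
$task = Get-ScheduledTask -TaskName $taskName
while ($task.State -eq 'Running') {
    Write-Host "$taskName is still running; checking again in 60 seconds"
    Start-Sleep -Seconds 60
    $task = Get-ScheduledTask -TaskName $taskName
}

Disable-ScheduledTask -TaskName $taskName
[/powershell]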

Probably not required, but seems to speed up demos and such:

  • Shut down the environment
  • Up the memory and CPU on all VMs to 16GB min, 100%; update cores to 4, 8 or 12 depending on the original setting.
  • Start it back up, wait about 10 minutes and then run the validation script to make sure all the services are online properly.
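
The resize could be scripted from the host along these lines. This is a sketch under assumptions: it only covers the 16GB memory floor and the core count, and the mapping from original to new core counts is a hypothetical example, so adjust it to whatever your VMs started with.

[powershell]
# Run on the Hyper-V host with the environment shut down.
$minMemory = 16GB
$coreMap = @{ 2 = 4; 4 = 8; 8 = 12 }   # original cores -> new cores (assumed mapping)

foreach ($vm in Get-VM) {
    # Memory and core counts can only be changed here while the VM is off
    if ($vm.State -ne 'Off') {
        Write-Warning "$($vm.Name) is still $($vm.State); skipping"
        continue
    }

    # Raise startup memory to the floor if it is below it
    if ($vm.MemoryStartup -lt $minMemory) {
        Set-VMMemory -VMName $vm.Name -StartupBytes $minMemory
    }

    # Bump the core count if the current value is in the mapping
    $cores = [int]$vm.ProcessorCount
    if ($coreMap.ContainsKey($cores)) {
        Set-VMProcessor -VMName $vm.Name -Count $coreMap[$cores]
    }
}
[/powershell]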

 

Some additional notes:

  • The MUXVM and BGPVM seem to hang occasionally when connected via the Hyper-V console. This appears to be a Hyper-V issue on TP4. During these hangs they also seem to stop responding to the network.
  • Sometimes rebooting a VM causes the host to bluescreen and reboot (also appears to be a TP4 issue)
  • Once you're up and running, screens in Stack that show a user picker seem to sit and spin for a while, especially group pickers. Other things are simply not done (new buttons on some of the resource RPs), or need some time to JIT the first time they're accessed (Quota menus when creating a plan, for example).
  • If doing scripted testing, creating and tearing down an empty resource group through PowerShell seems pretty repeatable. I'd start with deploying a template containing nothing but a resource group if trying to test toolchains. Next least impactful seems to be storage accounts. Compute/Network seems to be the heaviest, and also the most inconsistent as to whether it succeeds on any given attempt.
  • You’ll notice we’re not installing any additional Resource Providers. We’ve had significant stability problems with the current builds and so are waiting until new builds are available before loading them in anything but our most bleeding edge environments.
  • Finally, remember folks, it's TP1, and according to Mike Neil's blog post, we're a year away.
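
For the scripted-testing note above, a create-and-remove loop for an empty resource group might look like this. It's a sketch that assumes the AzureRM module is loaded and the session is already signed in against the Stack endpoint. The resource group name, the 'local' location, and the pass count are assumptions for illustration.

[powershell]
$rgName = 'smoketest-rg'   # hypothetical name
$location = 'local'        # assumed region name for the POC

$results = foreach ($i in 1..5) {
    $start = Get-Date
    try {
        # Create the empty resource group, then tear it straight back down
        New-AzureRmResourceGroup -Name $rgName -Location $location -ErrorAction Stop | Out-Null
        Remove-AzureRmResourceGroup -Name $rgName -Force -ErrorAction Stop | Out-Null
        $ok = $true
    } catch {
        Write-Warning "Pass $i failed: $($_.Exception.Message)"
        $ok = $false
    }

    # One result row per pass
    [pscustomobject]@{
        Pass      = $i
        Succeeded = $ok
        Seconds   = [math]::Round(((Get-Date) - $start).TotalSeconds, 1)
    }
}

$results | Format-Table -AutoSize
[/powershell]

Once this is consistently clean, the same loop shape can be pointed at a storage account, and only then at compute or network resources.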