Simulating an Azure storage account failure

redundancy_banner.jpg

Storage Redundancy in the Cloud Redundancy and failover is always an important factor when designing and deploying applications.  As we start to build out applications in the cloud we have seen major disruptions due to a single Azure storage account being used across and entire application.  Logically this makes sense as a single container for all items when considering redundancy this becomes a single point of failure.  Just because it is in the cloud doesn't mean it's redundant.

While Azure storage is generally something you can consider stable, this is IT and anything that can happen, will happen.  We have seen developers and administrators accidentally deleting accounts and Azure has had outages which include storage account failures and in some cases data loss although far less common.  At this point I should mention deployment and use of RBAC might have prevented some of these accidental deletions, but not in every case.

An Example Application

In this example let's consider you are building an application with two web front end servers using IaaS VMs you would like to ensure is as redundant as possible.  We could use an Azure load balancers and deploy two VMs into an availability set and as shown below.  While the load balancer handles traffic and the availability set handles fault and upgrade domains for the VMs, these VMs are still all present on a single storage account that is outside of the availability set protection. If you lose the storage account both of the VMs will fail and your application will go offline.

Capture1_thumb.jpg

Adding Redundancy

If we take this design and add a second storage account for one of the IaaS VMs we can eliminate several scenarios where the application might go offline.  There are several options for redundancy in the storage which you can evaluate depending on your needs, budget and performance.  As well there are limits to the number of storage accounts you can provision and management could also become more complex.

Capture2.jpg

This recommendation is focusing on a single storage account going 'offline' for whatever reason. As you scale up to larger applications you may want to have two or more storage accounts supporting multiple VMs. To reduce the number of storage accounts you could consider striping storage accounts across multiple load balanced application tiers. It's worth noting, that this will help protect against accidental deletion, but even having two storage accounts may not protect you against Azure failures. There is no garantee that a second storage account wont be on the same hardware, or otherwise within the same failure envelope as the first. Its better then nothing, but if you need a higher RTO/RPO, you need to look at proper active/active configuration in seperate regions.

While LRS is the recommended strategy for VHDs to increase performance and reduce costs you may want to consider more resilient options and use ZRS or GRS storage replication for at least one of the storage accounts. Note, there are limitations to using ZRS and GRS as a VMs, specifically around performance and corruption when disk striping. You may even consider deploying more VMs in another availability set to another region depending on the applications requirements.

Simulating Storage Account Failure 

As an administrator, if you are trying to test a redundant application, there is no ‘offline’ option to allow us to test storage failure.  There are some options to simulate this, if you break the lease to the blob you can simulate a hard stop, but this is potentially destructive to the OS. You can find more information about breaking a lease at this Microsoft Reference. You could stop and start the VMs yourself but as the application scales and grows more complex, again you can introduce human error and you still have to go and start them all again anyway. To help with this administrative task, I have modified a script for stopping and starting VMs.

Script and Source

Using an existing script I made some minor modifications and combined code from Darren Robinson here that utilizes RamblingCookieMonster's invoke-parallel function and put them into a single script.

This script allows the administrator to specify a resource group and a storage account, the script will find all VMs on that storage account in that resource group and shutdown the VMs gracefully.  The invoke-parallel allow for tasks to be run at the same time, saving time.  You can then conduct application testing. Once testing is complete, you can use the same script to start the VMs again.

The script Change-VMStateByStorageAccount will ask for your Azure Credentials if they aren't present in your PowerShell session. The script itself requires three parameters as follows (ResourceGroup, StorageAccount, Power (Stop or Start))

Example to stop all VMs

[powershell] Change-VMStateByStorageAccount -ResourceGroup "MyResourceGroup" -StorageAccount "StorageAccount01" -Power "Stop" [/powershell]

PowerShell Code

[powershell] Param( [Parameter(Mandatory=$true)] [String] $ResourceGroup, [Parameter(Mandatory=$true)] [String] $StorageAccount, [Parameter(Mandatory=$true)] [String] $Power )

$StorageSuffix = "blob.core.windows.net"

if (!$Power){Write-host "No powerstate specified. Use -Power start|stop"} if (!$ResourceGroup){Write-host "No Azure Resource Group specified. Use -ResourceGroup 'ResourceGroupName'"} if (!$StorageAccount){Write-host "No Azure Storage Accout name specified. Use -StorageAccount 'storageaccount'"}

function Invoke-Parallel { [cmdletbinding(DefaultParameterSetName='ScriptBlock')] Param ( [Parameter(Mandatory=$false,position=0,ParameterSetName='ScriptBlock')] [System.Management.Automation.ScriptBlock]$ScriptBlock,

[Parameter(Mandatory=$false,ParameterSetName='ScriptFile')] [ValidateScript({test-path $_ -pathtype leaf})] $ScriptFile,

[Parameter(Mandatory=$true,ValueFromPipeline=$true)] [Alias('CN','__Server','IPAddress','Server','ComputerName')] [PSObject]$InputObject,

[PSObject]$Parameter,

[switch]$ImportVariables,

[switch]$ImportModules,

[int]$Throttle = 20,

[int]$SleepTimer = 200,

[int]$RunspaceTimeout = 0,

[switch]$NoCloseOnTimeout = $false,

[int]$MaxQueue,

[validatescript({Test-Path (Split-Path $_ -parent)})] [string]$LogFile = "C:\temp\log.log",

[switch] $Quiet = $false )

Begin {

#No max queue specified? Estimate one. #We use the script scope to resolve an odd PowerShell 2 issue where MaxQueue isn't seen later in the function if ( -not $PSBoundParameters.ContainsKey('MaxQueue')) { if($RunspaceTimeout -ne 0){ $script:MaxQueue = $Throttle } else{ $script:MaxQueue = $Throttle * 3 } } else { $script:MaxQueue = $MaxQueue }

Write-Verbose "Throttle: '$throttle' SleepTimer '$sleepTimer' runSpaceTimeout '$runspaceTimeout' maxQueue '$maxQueue' logFile '$logFile'"

#If they want to import variables or modules, create a clean runspace, get loaded items, use those to exclude items if ($ImportVariables -or $ImportModules) { $StandardUserEnv = [powershell]::Create().addscript({

#Get modules and snapins in this clean runspace $Modules = Get-Module | Select -ExpandProperty Name $Snapins = Get-PSSnapin | Select -ExpandProperty Name

#Get variables in this clean runspace #Called last to get vars like $? into session $Variables = Get-Variable | Select -ExpandProperty Name

#Return a hashtable where we can access each. @{ Variables = $Variables Modules = $Modules Snapins = $Snapins } }).invoke()[0]

if ($ImportVariables) { #Exclude common parameters, bound parameters, and automatic variables Function _temp {[cmdletbinding()] param() } $VariablesToExclude = @( (Get-Command _temp | Select -ExpandProperty parameters).Keys + $PSBoundParameters.Keys + $StandardUserEnv.Variables ) Write-Verbose "Excluding variables $( ($VariablesToExclude | sort ) -join ", ")"

# we don't use 'Get-Variable -Exclude', because it uses regexps. # One of the veriables that we pass is '$?'. # There could be other variables with such problems. # Scope 2 required if we move to a real module $UserVariables = @( Get-Variable | Where { -not ($VariablesToExclude -contains $_.Name) } ) Write-Verbose "Found variables to import: $( ($UserVariables | Select -expandproperty Name | Sort ) -join ", " | Out-String).`n"

}

if ($ImportModules) { $UserModules = @( Get-Module | Where {$StandardUserEnv.Modules -notcontains $_.Name -and (Test-Path $_.Path -ErrorAction SilentlyContinue)} | Select -ExpandProperty Path ) $UserSnapins = @( Get-PSSnapin | Select -ExpandProperty Name | Where {$StandardUserEnv.Snapins -notcontains $_ } ) } }

#region functions

Function Get-RunspaceData { [cmdletbinding()] param( [switch]$Wait )

#loop through runspaces #if $wait is specified, keep looping until all complete Do {

#set more to false for tracking completion $more = $false

#Progress bar if we have inputobject count (bound parameter) if (-not $Quiet) { Write-Progress -Activity "Running Query" -Status "Starting threads"` -CurrentOperation "$startedCount threads defined - $totalCount input objects - $script:completedCount input objects processed"` -PercentComplete $( Try { $script:completedCount / $totalCount * 100 } Catch {0} ) }

#run through each runspace. Foreach($runspace in $runspaces) {

#get the duration - inaccurate $currentdate = Get-Date $runtime = $currentdate - $runspace.startTime $runMin = [math]::Round( $runtime.totalminutes ,2 )

#set up log object $log = "" | select Date, Action, Runtime, Status, Details $log.Action = "Removing:'$($runspace.object)'" $log.Date = $currentdate $log.Runtime = "$runMin minutes"

#If runspace completed, end invoke, dispose, recycle, counter++ If ($runspace.Runspace.isCompleted) {

$script:completedCount++

#check if there were errors if($runspace.powershell.Streams.Error.Count -gt 0) {

#set the logging info and move the file to completed $log.status = "CompletedWithErrors" Write-Verbose ($log | ConvertTo-Csv -Delimiter ";" -NoTypeInformation)[1] foreach($ErrorRecord in $runspace.powershell.Streams.Error) { Write-Error -ErrorRecord $ErrorRecord } } else {

#add logging details and cleanup $log.status = "Completed" Write-Verbose ($log | ConvertTo-Csv -Delimiter ";" -NoTypeInformation)[1] }

#everything is logged, clean up the runspace $runspace.powershell.EndInvoke($runspace.Runspace) $runspace.powershell.dispose() $runspace.Runspace = $null $runspace.powershell = $null

}

#If runtime exceeds max, dispose the runspace ElseIf ( $runspaceTimeout -ne 0 -and $runtime.totalseconds -gt $runspaceTimeout) {

$script:completedCount++ $timedOutTasks = $true

#add logging details and cleanup $log.status = "TimedOut" Write-Verbose ($log | ConvertTo-Csv -Delimiter ";" -NoTypeInformation)[1] Write-Error "Runspace timed out at $($runtime.totalseconds) seconds for the object:`n$($runspace.object | out-string)"

#Depending on how it hangs, we could still get stuck here as dispose calls a synchronous method on the powershell instance if (!$noCloseOnTimeout) { $runspace.powershell.dispose() } $runspace.Runspace = $null $runspace.powershell = $null $completedCount++

}

#If runspace isn't null set more to true ElseIf ($runspace.Runspace -ne $null ) { $log = $null $more = $true }

#log the results if a log file was indicated if($logFile -and $log){ ($log | ConvertTo-Csv -Delimiter ";" -NoTypeInformation)[1] | out-file $LogFile -append } }

#Clean out unused runspace jobs $temphash = $runspaces.clone() $temphash | Where { $_.runspace -eq $Null } | ForEach { $Runspaces.remove($_) }

#sleep for a bit if we will loop again if($PSBoundParameters['Wait']){ Start-Sleep -milliseconds $SleepTimer }

#Loop again only if -wait parameter and there are more runspaces to process } while ($more -and $PSBoundParameters['Wait'])

#End of runspace function }

#endregion functions

#region Init

if($PSCmdlet.ParameterSetName -eq 'ScriptFile') { $ScriptBlock = [scriptblock]::Create( $(Get-Content $ScriptFile | out-string) ) } elseif($PSCmdlet.ParameterSetName -eq 'ScriptBlock') { #Start building parameter names for the param block [string[]]$ParamsToAdd = '$_' if( $PSBoundParameters.ContainsKey('Parameter') ) { $ParamsToAdd += '$Parameter' }

$UsingVariableData = $Null

# This code enables $Using support through the AST. # This is entirely from Boe Prox, and his https://github.com/proxb/PoshRSJob module; all credit to Boe!

if($PSVersionTable.PSVersion.Major -gt 2) { #Extract using references $UsingVariables = $ScriptBlock.ast.FindAll({$args[0] -is [System.Management.Automation.Language.UsingExpressionAst]},$True)

If ($UsingVariables) { $List = New-Object 'System.Collections.Generic.List`1[System.Management.Automation.Language.VariableExpressionAst]' ForEach ($Ast in $UsingVariables) { [void]$list.Add($Ast.SubExpression) }

$UsingVar = $UsingVariables | Group SubExpression | ForEach {$_.Group | Select -First 1}

#Extract the name, value, and create replacements for each $UsingVariableData = ForEach ($Var in $UsingVar) { Try { $Value = Get-Variable -Name $Var.SubExpression.VariablePath.UserPath -ErrorAction Stop [pscustomobject]@{ Name = $Var.SubExpression.Extent.Text Value = $Value.Value NewName = ('$__using_{0}' -f $Var.SubExpression.VariablePath.UserPath) NewVarName = ('__using_{0}' -f $Var.SubExpression.VariablePath.UserPath) } } Catch { Write-Error "$($Var.SubExpression.Extent.Text) is not a valid Using: variable!" } } $ParamsToAdd += $UsingVariableData | Select -ExpandProperty NewName -Unique

$NewParams = $UsingVariableData.NewName -join ', ' $Tuple = [Tuple]::Create($list, $NewParams) $bindingFlags = [Reflection.BindingFlags]"Default,NonPublic,Instance" $GetWithInputHandlingForInvokeCommandImpl = ($ScriptBlock.ast.gettype().GetMethod('GetWithInputHandlingForInvokeCommandImpl',$bindingFlags))

$StringScriptBlock = $GetWithInputHandlingForInvokeCommandImpl.Invoke($ScriptBlock.ast,@($Tuple))

$ScriptBlock = [scriptblock]::Create($StringScriptBlock)

Write-Verbose $StringScriptBlock } }

$ScriptBlock = $ExecutionContext.InvokeCommand.NewScriptBlock("param($($ParamsToAdd -Join ", "))`r`n" + $Scriptblock.ToString()) } else { Throw "Must provide ScriptBlock or ScriptFile"; Break }

Write-Debug "`$ScriptBlock: $($ScriptBlock | Out-String)" Write-Verbose "Creating runspace pool and session states"

#If specified, add variables and modules/snapins to session state $sessionstate = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault() if ($ImportVariables) { if($UserVariables.count -gt 0) { foreach($Variable in $UserVariables) { $sessionstate.Variables.Add( (New-Object -TypeName System.Management.Automation.Runspaces.SessionStateVariableEntry -ArgumentList $Variable.Name, $Variable.Value, $null) ) } } } if ($ImportModules) { if($UserModules.count -gt 0) { foreach($ModulePath in $UserModules) { $sessionstate.ImportPSModule($ModulePath) } } if($UserSnapins.count -gt 0) { foreach($PSSnapin in $UserSnapins) { [void]$sessionstate.ImportPSSnapIn($PSSnapin, [ref]$null) } } }

#Create runspace pool $runspacepool = [runspacefactory]::CreateRunspacePool(1, $Throttle, $sessionstate, $Host) $runspacepool.Open()

Write-Verbose "Creating empty collection to hold runspace jobs" $Script:runspaces = New-Object System.Collections.ArrayList

#If inputObject is bound get a total count and set bound to true $bound = $PSBoundParameters.keys -contains "InputObject" if(-not $bound) { [System.Collections.ArrayList]$allObjects = @() }

#Set up log file if specified if( $LogFile ){ New-Item -ItemType file -path $logFile -force | Out-Null ("" | Select Date, Action, Runtime, Status, Details | ConvertTo-Csv -NoTypeInformation -Delimiter ";")[0] | Out-File $LogFile }

#write initial log entry $log = "" | Select Date, Action, Runtime, Status, Details $log.Date = Get-Date $log.Action = "Batch processing started" $log.Runtime = $null $log.Status = "Started" $log.Details = $null if($logFile) { ($log | convertto-csv -Delimiter ";" -NoTypeInformation)[1] | Out-File $LogFile -Append }

$timedOutTasks = $false

#endregion INIT }

Process {

#add piped objects to all objects or set all objects to bound input object parameter if($bound) { $allObjects = $InputObject } Else { [void]$allObjects.add( $InputObject ) } }

End {

#Use Try/Finally to catch Ctrl+C and clean up. Try { #counts for progress $totalCount = $allObjects.count $script:completedCount = 0 $startedCount = 0

foreach($object in $allObjects){

#region add scripts to runspace pool

#Create the powershell instance, set verbose if needed, supply the scriptblock and parameters $powershell = [powershell]::Create()

if ($VerbosePreference -eq 'Continue') { [void]$PowerShell.AddScript({$VerbosePreference = 'Continue'}) }

[void]$PowerShell.AddScript($ScriptBlock).AddArgument($object)

if ($parameter) { [void]$PowerShell.AddArgument($parameter) }

# $Using support from Boe Prox if ($UsingVariableData) { Foreach($UsingVariable in $UsingVariableData) { Write-Verbose "Adding $($UsingVariable.Name) with value: $($UsingVariable.Value)" [void]$PowerShell.AddArgument($UsingVariable.Value) } }

#Add the runspace into the powershell instance $powershell.RunspacePool = $runspacepool

#Create a temporary collection for each runspace $temp = "" | Select-Object PowerShell, StartTime, object, Runspace $temp.PowerShell = $powershell $temp.StartTime = Get-Date $temp.object = $object

#Save the handle output when calling BeginInvoke() that will be used later to end the runspace $temp.Runspace = $powershell.BeginInvoke() $startedCount++

#Add the temp tracking info to $runspaces collection Write-Verbose ( "Adding {0} to collection at {1}" -f $temp.object, $temp.starttime.tostring() ) $runspaces.Add($temp) | Out-Null

#loop through existing runspaces one time Get-RunspaceData

#If we have more running than max queue (used to control timeout accuracy) #Script scope resolves odd PowerShell 2 issue $firstRun = $true while ($runspaces.count -ge $Script:MaxQueue) {

#give verbose output if($firstRun){ Write-Verbose "$($runspaces.count) items running - exceeded $Script:MaxQueue limit." } $firstRun = $false

#run get-runspace data and sleep for a short while Get-RunspaceData Start-Sleep -Milliseconds $sleepTimer

}

#endregion add scripts to runspace pool }

Write-Verbose ( "Finish processing the remaining runspace jobs: {0}" -f ( @($runspaces | Where {$_.Runspace -ne $Null}).Count) ) Get-RunspaceData -wait

if (-not $quiet) { Write-Progress -Activity "Running Query" -Status "Starting threads" -Completed } } Finally { #Close the runspace pool, unless we specified no close on timeout and something timed out if ( ($timedOutTasks -eq $false) -or ( ($timedOutTasks -eq $true) -and ($noCloseOnTimeout -eq $false) ) ) { Write-Verbose "Closing the runspace pool" $runspacepool.close() }

#collect garbage [gc]::Collect() } } }

$StorageAccountName = $StorageAccount.ToLower()

# see if we already have a session. If we don't don't re-authN if (!$AzureRMAccount.Context.Tenant) { $AzureRMAccount = Add-AzureRmAccount }

$SubscriptionName = Get-AzureRmSubscription | sort SubscriptionName | Select SubscriptionName $TenantId = $AzureRMAccount.Context.Tenant.TenantId

Select-AzureRmSubscription -TenantId $TenantId write-host "Enumerating VM's from AzureRM in Resource Group '" $ResourceGroup "' from '" $StorageAccountName "'"

$StorageVMs = get-azurermvm | where {$_.storageprofile.osdisk.vhd.uri -like "*$StorageAccountName.$storageSuffix*"} $vmrunninglist = @() $vmstoppedlist = @()

Foreach($vmonstore in $StorageVMs) { $vmstatus = Get-AzureRMVM -ResourceGroupName $ResourceGroup -name $vmonstore.name -Status $PowerState = (get-culture).TextInfo.ToTitleCase(($vmstatus.statuses)[1].code.split("/")[1])

write-host "VM: '"$vmonstore.Name"' is" $PowerState if ($Powerstate -eq 'Running') { $vmrunninglist = $vmrunninglist + $vmonstore.name } if ($Powerstate -eq 'Deallocated') { $vmstoppedlist = $vmstoppedlist + $vmonstore.name } }

if ($Power -eq 'start') { write-host "Starting VM's "$vmstoppedlist " in Resource Group "$ResourceGroup $vmstoppedlist | Invoke-Parallel -ImportVariables -NoCloseOnTimeout -ScriptBlock { Start-AzureRMVM -ResourceGroupName $ResourceGroup -Name $_ -Verbose } }

if ($Power -eq 'stop') { write-host "Stopping VM's "$vmrunninglist " in Resource Group "$ResourceGroup $vmrunninglist | Invoke-Parallel -ImportVariables -NoCloseOnTimeout -ScriptBlock { Stop-AzureRMVM -ResourceGroupName $ResourceGroup -Name $_ -Verbose -Force } } [/powershell]