Sunday 25 January 2009

Using PowerShell to Monitor VMware Guests - on a Budget...

...i.e. a budget of £0.

(Update 28/01/09 - FYI, I got some feedback about this post. The reason we are not using the built-in alerts in the VI Client is that the CPU alerts were not granular enough for us in this case.)

This all stemmed from trying to track down which process was causing particular servers' CPU to hit 100% for a period. First of all, my colleague and Get-Scripting co-host Alan Renouf and I traded a script back and forth, which ended up as the CheckHighCPU function. It is now pretty cool: it comes back with a list of processes sorted by how much CPU they are using and, very importantly for our circumstances, the owner of each process.

The servers in question all belong to a particular cluster in ESX. Rather than constantly monitoring the console, waiting for a VM to turn red and then running the script to track down the process, we decided to monitor them with a PowerShell script. It kicks off the CheckHighCPU function when a server's CPU hits 100% for a significant enough period and emails a warning with the process details - and so the below script was born.

OK, it's not Operations Manager and to be honest it's not really production quality, but it does the job we need it to.

The VI Toolkit from VMware is a great set of cmdlets you can plug into your PowerShell console to manage your VMware environment. You can use the Get-Cluster and Get-VM cmdlets to return all the VMs in a cluster as objects. You can then use the very handy Get-Stat cmdlet to retrieve performance data for each VM.

In this case we check the cpu.usage.average statistic over the last few minutes (watch out for the -IntervalMins parameter, as it can produce a period slightly different to what you would expect). If it's 99% or higher, we run the CheckHighCPU function and send the results by email.
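The script further down alerts on a single sample. If you wanted to be stricter about "a significant enough period", one option is to pull several samples and only alert when all of them are high. The following is only a sketch: Test-SustainedHigh is a hypothetical helper name, and the commented-out Get-Stat call assumes a connected session, a $vm variable holding a VM object, and that the -Realtime switch is available in your version of the toolkit.

```powershell
# Hypothetical helper: returns $true only when every sample is at or above the threshold.
function Test-SustainedHigh {
    param (
        [double[]]$Values,
        [double]$Threshold = 99
    )

    # No samples means we cannot claim sustained load.
    if ($null -eq $Values -or $Values.Count -eq 0) { return $false }

    foreach ($v in $Values) {
        if ($v -lt $Threshold) { return $false }
    }
    return $true
}

# Possible usage against a live session (assumption: Connect-VIServer has been run
# and $vm holds a VM object):
# $samples = Get-Stat -Entity $vm -Stat cpu.usage.average -Realtime -MaxSamples 6
# Test-SustainedHigh -Values ($samples | ForEach-Object { $_.Value })
```

Keeping the check in its own function means you can try it against made-up numbers before pointing it at real VMs.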

We then make the script sleep for a short time period so that we are not constantly bombarded with alerts if a warning is triggered.

Obviously if we wanted to make it production quality we would add in some error checking and a test to see if an alert had recently been sent, but for the time being it's doing a great job for the requirements that exist.
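As a rough idea of what the "has an alert recently been sent" test could look like, here is a minimal sketch that remembers the last alert time per server in a hashtable. The function name Test-AlertDue, the $lastAlert variable and the 15 minute cooldown are all assumptions for illustration and are not part of the script below.

```powershell
# Hypothetical throttle: remembers when each server last triggered an alert.
$lastAlert = @{}

function Test-AlertDue {
    param (
        [string]$ServerName,
        [hashtable]$History,
        [int]$CooldownMinutes = 15,
        [datetime]$Now = (Get-Date)
    )

    # Suppress the alert if this server was alerted on within the cooldown window.
    if ($History.ContainsKey($ServerName)) {
        $elapsed = $Now - $History[$ServerName]
        if ($elapsed.TotalMinutes -lt $CooldownMinutes) { return $false }
    }

    # Record this alert and let it through.
    $History[$ServerName] = $Now
    return $true
}

# Possible usage inside the monitoring loop:
# if (Test-AlertDue -ServerName $VMname -History $lastAlert) { CheckHighCPU $VMname }
```

Because a hashtable is passed by reference, the function can update the history itself and the caller only has to check the result.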

You could obviously re-use the below to monitor for different statistics offered by Get-Stat like disk or memory.
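For example, the threshold check could be driven by a small table of statistic names rather than a hard-coded CPU value. This is just a sketch: Get-BreachedStat is a hypothetical helper, the threshold numbers are made up, and it assumes the objects returned by Get-Stat expose MetricId and Value properties.

```powershell
# Illustrative thresholds only; pick values that suit your environment.
$thresholds = @{
    'cpu.usage.average' = 99
    'mem.usage.average' = 90
}

# Hypothetical helper: returns the samples whose value meets or exceeds
# the threshold configured for their statistic.
function Get-BreachedStat {
    param (
        $Samples,
        [hashtable]$Thresholds
    )

    $Samples | Where-Object {
        $Thresholds.ContainsKey($_.MetricId) -and $_.Value -ge $Thresholds[$_.MetricId]
    }
}

# Possible usage against a live session (assumption: $vm holds a VM object):
# $samples = Get-Stat -Entity $vm -Stat cpu.usage.average, mem.usage.average -IntervalMins 2 -MaxSamples 1
# Get-BreachedStat -Samples $samples -Thresholds $thresholds
```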



Function EmailWarning ()
{
param ($ServerName,$Attachment)
#Email warning
Write-Output "Creating E-Mail Structure"

$smtpServer = "servername"

$msg = new-object Net.Mail.MailMessage
$att = new-object Net.Mail.Attachment($attachment)
$smtp = new-object Net.Mail.SmtpClient($smtpServer)

$msg.From = "sender"
$msg.To.Add("recipient")
$msg.Subject = "Server Warning - High CPU on $Servername"
$msg.Body = "$Servername has a CPU value of $HighCPU %"
$msg.Attachments.Add($att)

Write-Output "Send E-Mail"
$smtp.Send($msg)

$att.Dispose();

}

Function CheckHighCPU ()
{
param ($Target)

$procs_total = Get-WmiObject -Class Win32_PerfRawData_PerfProc_Process -Filter 'name="_total"' -ComputerName $Target
$procs = Get-WmiObject -Class Win32_PerfRawData_PerfProc_Process -Filter 'name<>"_total"' -ComputerName $Target

[int64]$totalpercentuser = 0
foreach ($proc in $procs_total)
{
$totalpercentuser = $totalpercentuser + $proc.PercentUserTime}

[decimal]$perc = [System.Convert]::ToDecimal($totalpercentuser)

$myCol = @()
Foreach ($proc in $procs){
$proc_perct = (($proc.PercentUserTime / $perc) * 100)
if ($proc_perct -gt 1){
$Process = Get-WmiObject win32_process -ComputerName $target | where {$_.ProcessID -eq $proc.IDProcess}
$MYInfo = "" | select-Object Name, CPUUsage,Owner, ProcessID
$MYInfo.Name = $proc.name
$MYInfo.ProcessID = $proc.IdProcess
$MYInfo.CPUUsage = [Math]::Round($proc_perct, 0)
$MYInfo.Owner = $process.GetOwner().user
$myCol += $MYInfo
}
}

$myCol | Sort-Object CPUUsage -Descending | Out-File $file
EmailWarning $VMname $file
}

Connect-VIServer servername
$vms = Get-Cluster clustername | get-vm
$time = Get-Date

do {

foreach ($vm in $vms){

$VMname = $vm.name
$filename = $VMname + '.txt'
$file = "C:\Scripts\$filename"
$stats = Get-Stat -entity $vm -IntervalMins 2 -stat cpu.usage.average -MaxSamples 1
write-host $VMname
$stats

if ($stats.value -ge 99){
$HighCPU = $stats.value
Write-Host "Warning!" -ForegroundColor red
CheckHighCPU $VMname
}

else
{
}
}

Start-Sleep -Seconds 30
}

# Re-read the clock on every pass; a time captured once before the loop never changes
until ((Get-Date).hour -ge 17)

8 comments:

Anonymous said...

Did you have this set as a service or scheduled task? I need to do something almost exactly like this...and I wonder how did you run this script or have it run...

Jonathan Medd said...

At the minute, I run it manually as and when I need it. You could run it as a scheduled task using something like the below:

http://tinyurl.com/37q7vd

Anonymous said...

Sounds good...did you just run the script as above directly...I think I am missing something?

Jonathan Medd said...

Just updated a section where you need to change details for your environment. You need to supply the name of your virtual center server and the name of the cluster.

Connect-VIServer servername
$vms = Get-Cluster clustername | get-vm

Anonymous said...

I missed the name of the cluster ... just fixed that and hopefully it should work :)

Jonathan Medd said...

No problem. I hadn't coded that bit very well, but hopefully now the update makes it a bit clearer.

Anonymous said...

Is there any documentation available on how to use the "Performance Data" under the Links in the VI Toolkit?

Jonathan Medd said...

I usually find the best place to start is within the inbuilt help in Powershell, e.g. help Get-Stat. Alternatively head on over to the VI Toolkit community page and post your question.

http://communities.vmware.com/community/developer/windows_toolkit?view=discussions