Monday, June 29, 2009

Simple and easy site monitoring with PowerShell

In the *nix world, I really like monit for watching and managing small to medium apps. A while back we wrote an intranet app in Grails, running on Tomcat on a Windows 2003 server. The client was hosting the app themselves, but didn't have any IT staff whatsoever, so monitoring and managing this app was a little tricky.

I searched around for something like monit for windows, but didn't find any good free or cheap tools to do what it does, which is primarily to monitor and restart services upon failure. There's a million monitor and alert tools out there, but none that I found had a clean way to restart my Tomcat service if something went wrong. The closest thing I found was Hyperic's open source monitoring solution, but after 30 minutes of futzing around, it seemed too klunky to me for this small problem. I'm sure it's fine, and for bigger solutions it would probably fit well.

Then I ran across a couple PowerShell scripts that intrigued me - the first was for scraping a web page. After that find, it was just a matter of googling to find the remaining piece - restarting a windows service. Bingo! I hadn't done any PowerShell scripting yet, but it turns out to be a really sweet solution for this type of problem. All I had to do after that was to tweak the code to be a little more reusable, and I also created a simple grails view that would exercise the integration features I wanted to include in my ping test. From there it was just a matter of calling the script on a regular basis. In my case, I used the windows task scheduler. Here's what the script looks like:

$webClient = new-object System.Net.WebClient

###################################################
# BEGIN USER-EDITABLE VARIABLES

# the URL to ping
$HeartbeatUrl
= "http://someplace.com/somepage/"

# the response string to look for that indicates things are working ok
$SuccessResponseString
= "Some Text"

# the name of the windows service to restart (the service name, not the display name)
$ServiceName
= "Tomcat6"

# the log file used for monitoring output
$LogFile
= "c:\temp\heartbeat.log"

# used to indicate that the service has failed since the last time we checked.
$FailureLogFile
= "c:\temp\failure.log"

# END USER-EDITABLE VARIABLES
###################################################

# create the log file if it doesn't already exist.
if (!(Test-Path $LogFile)) {
New-Item $LogFile -type file
}

$startTime
= get-date
$output
= $webClient.DownloadString($HeartbeatUrl)
$endTime
= get-date

if ($output -like "*" + $SuccessResponseString + "*") {
# uncomment the below line if you want positive confirmation
#"Success`t`t" + $startTime.DateTime + "`t`t" + ($endTime - $startTime).TotalSeconds + " seconds" >> $LogFile

# remove the FailureLog if it exists to indicate we're in good shape.
if (Test-Path $FailureLogFile) {
Remove-Item $FailureLogFile
}

}
else {
"Fail`t`t" + $startTime.DateTime + "`t`t" + ($endTime - $startTime).TotalSeconds + " seconds" >> $LogFile

# restart the service if this is the first time it's failed since the last successful check.
if (!(Test-Path $FailureLogFile)) {
New-Item $FailureLogFile -type file
"Initial failure:" + $startTime.DateTime >> $FailureLogFile
Restart-Service $ServiceName
}
}
I posted my question (and my own answer eventually) on stackoverflow here, so keep an eye on that if you're looking for improvements to it. There's a lot more you can do with this script, like email upon failure, or monitor multiple sites. Hopefully this helps some other folks out there with problems like this. It was a really nice experience for me; simple and powerful. So hats off to the authors of PowerShell - I'll be reaching for that tool more and more in the future.

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

Links to this post:

Create a Link

<< Home