Friday, October 30, 2009

A luser's guide to Iometer

Hi, my name is Dan, and I'm a software development luser who sometimes dreams of being a sysadmin. Actually, what I really want is the authority to pick all the cool hardware, but with none of the "the server's down!" accountability. If you hear of that job, lemmeno. In the meantime...

If you haven't lived under a rock during the last 5 years, you're aware that virtualization is taking over. This is a good thing. The trouble is, supporting virtualized environments requires a completely new skill set. The ROI is there, but it takes a real knowledge investment! And if you install virtualized servers in a company with overworked and stressed-out BOFHs who aren't able or willing to make that learning investment, you can end up worse off than before.

In this particular case, I installed an instance of JIRA and Confluence for a client as part of our big development project. I've installed them before at a couple of other places, and performance has always been really good. But at this client, it sucks. As in 10-second-response-time sucks. The instances are running on the VMware ESX platform, which is the key variable that's different. The trouble for me as a developer is that:
a) I'm not a VMware tuning expert.
b) I'm not a BOFH, so naturally, the infrastructure group hates me. Just kidding, they really like me...I know that knife I found in my "Let's deploy EJBs and crush the souls of our poor operations staff" book was just for fun.

But seriously, what we had here was a real failure to communicate. I knew that this thing just wasn't running right. But I didn't know how to narrow the problem down and quantitatively convince them of the issue. I was able to see that it wasn't CPU bound just by watching the CPU meter. I knew it wasn't the database, because I tried a couple different databases, each with the same performance. During my local testing on my laptop, the full production instance ran great. So I was kinda sure that it was I/O bound, but I needed proof.

Enter Iometer. Iometer was originally written by Intel and released as open source in 2001. It's become one of the most popular storage performance and benchmarking utilities available. Even VMware recommends it as a tool you can use to analyze storage throughput.

I'm still an Iometer n00b, so take what I say with a grain of salt, and please teach me if you know more than I do! For example, one of the questions I have is this one over at Server Fault.

Short story long, the punchline is this: I ran an Iometer test comparing performance between my development laptop (7200 RPM drive) and the virtual server instance running on a SAN. Here are the results:

[Image: Iometer results for the laptop vs. the virtual server]

Pretty amazing, huh? Amazingly bad for the server: it's nearly 3 times as slow as my feeble little laptop! Like I mentioned, I'm not an Iometer expert by any means, and there are different ways to configure the tests. But I'd be surprised if any reasonable test configuration could produce a gap this big between two systems whose real I/O performance was anywhere near each other.

On how to run the tests - here's one tutorial on configuring it, and here's another. The only tricky things I found:
1. On the Disk Targets tab, there's a setting for the Maximum Disk Size, and it's specified in sectors (a sector is typically 512 bytes). It's been a couple centuries since I worked with sectors, but fortunately Google is your friend.
2. The user guide is a little confusing about what a yellow disk target icon with a red slash through it means. If it's yellow with a red slash, it just means that Iometer will need to create its iobw.tst file on the drive when it starts the test - you don't have to do anything. Do select the drive you want to test against though - this will be marked with a little X next to the drive when you select it.
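
The sector arithmetic from item 1 can be sketched in a few lines of Python. This is only a rough helper, and it assumes 512-byte sectors; the function name `size_to_sectors` and the sample sizes are made up for illustration, so check the sector size against your own Iometer version and drive before trusting the numbers.

```python
# Assumption: one sector is 512 bytes. Adjust if your setup differs.
SECTOR_BYTES = 512


def size_to_sectors(size_mib: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Return how many sectors make up a test file of size_mib mebibytes."""
    return size_mib * 1024 * 1024 // sector_bytes


if __name__ == "__main__":
    # Hypothetical test file sizes, in MiB.
    for size_mib in (1024, 4096):
        print(f"{size_mib} MiB -> {size_to_sectors(size_mib)} sectors")
```

Under that 512-byte assumption, a 1 GiB test file works out to 2097152 sectors, which is the value you would type into the Maximum Disk Size field.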

Turns out that diagnosing I/O performance is easier than I thought it would be. Now that just leaves the hard part of trying to get our BOFH friends to fix problems like these when they happen...good luck!