Wednesday, August 5, 2009

When evaluating design decisions, let the numbers talk

[The title of this article is as much a reminder to myself as it is advice to others.]

One of my clients has a decision to make about when they should use an ESB to expose services in their upcoming architecture.

What makes this decision interesting is that there's a heated debate around it (I say use an ESB only when absolutely necessary, while the company's long-term services partner says to use one for every public service interface), and that it's going to be an expensive decision. That combination makes for some interesting dynamics in the decision-making process.

The conversation became useful when we started talking about concrete ways to compare the approaches for building a particular service. We came up with this rough set of evaluation criteria:

Category          Tests                             Metrics
Performance       1 user, 1 request                 Response time
Scalability       X users, Y requests per second    Avg response time
Initial Effort                                      Number of hours to implement
Maintainability   Add an attribute                  Number of hours to implement
                  Change the interface              Number of hours to implement
                  Version the service               Number of hours to implement
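
A harness for the first two rows doesn't need to be fancy; something like the following sketch would do. The endpoint URL and the load numbers are placeholders, not anything from the real project, and a proper load tool (JMeter, for example) would give you real control over requests per second; this just fans out X concurrent users:

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical service endpoint -- substitute each candidate implementation here.
    SERVICE_URL = "http://localhost:8080/service"

    def time_one_request(url):
        """Performance row: 1 user, 1 request -> response time in seconds."""
        start = time.perf_counter()
        urllib.request.urlopen(url).read()
        return time.perf_counter() - start

    def time_under_load(url, users=10, requests_per_user=20):
        """Scalability row: X concurrent users each firing requests -> avg response time."""
        def worker(_):
            return [time_one_request(url) for _ in range(requests_per_user)]
        with ThreadPoolExecutor(max_workers=users) as pool:
            timings = [t for batch in pool.map(worker, range(users)) for t in batch]
        return sum(timings) / len(timings)

    if __name__ == "__main__":
        print("single request:", time_one_request(SERVICE_URL))
        print("avg under load:", time_under_load(SERVICE_URL))

Run the same script against each candidate (ESB-fronted and direct) and you have numbers to put side by side instead of opinions.
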
It's that last category, maintainability, that really intrigued me, and it's also the key question here. It's obviously difficult to measure with all the variables involved, but I think you can get pretty close with some forethought. For example, you can eliminate most of the developer-to-developer variance by having the same person implement every variation of the approaches, or at least by picking developers who are roughly equal in skill and speed. You'd also want to keep learning-curve effects out of the long-term comparison, since the initial ramp-up on an unfamiliar approach would throw off the maintainability numbers. Those ramp-up hours are still worth recording separately, though, because they'll matter whenever you need to bring another person up to speed on the application.
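
The effort numbers themselves don't need anything fancier than a spreadsheet, but as an illustration of keeping ramp-up time separate from the per-task hours, something like this would do (the approach names and hour values below are made up, purely to show the shape of the comparison):

    # Hours recorded per approach and per task; all figures here are invented
    # placeholders, not measurements from the actual evaluation.
    RESULTS = {
        "direct service": {
            "ramp-up": 0,
            "initial build": 16,
            "add an attribute": 2,
            "change the interface": 4,
            "version the service": 6,
        },
        "ESB-fronted service": {
            "ramp-up": 12,
            "initial build": 24,
            "add an attribute": 3,
            "change the interface": 5,
            "version the service": 4,
        },
    }

    for approach, hours in RESULTS.items():
        core = sum(h for task, h in hours.items() if task != "ramp-up")
        # Report ramp-up separately so it informs staffing decisions without
        # skewing the long-term maintainability comparison.
        print(f"{approach}: {core} hours of task effort, {hours['ramp-up']} hours of ramp-up")
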

Ideally there'd be something like the above in a standard format, much like the Pet Store application is used to compare platform stacks. I did a bit of searching to see whether there were other standard operations worth including in an evaluation like this, but didn't turn up anything useful. What other metrics or tools have others found helpful in situations like this?