« Dotproject | Main | On Intelligence »

April 27, 2005

Performance Measurement

Since 1993, I've built a fair number of interactive web applications. One of the most important areas where my company Vinq differentiates our work is by delivering truly stellar performance and interactivity. This article will cover a few of the techniques we use to evaluate and improve the performance of applications we build.

Later I'll follow this up with another article about some of the specific bottlenecks we've found, how we diagnosed them, and how we set out to improve them.

Technorati Tags: , ,

Measure, Measure, Measure

Our principal technique is to build into the development and testing workflow a way to measure the performance of our application. In our case, we're using Time::HiRes, a module for Perl which measures times to millisecond accuracy. We have built a utility function with the signature timer(depth,message) that we can call from within any module (these are Mason components). It stores the current timing information in a variable which is global to the Mason process. Then, at the end of our main template ("autohandler", in Mason parlance), we call another function showtimer() we wrote, which displays the timer output in detail.

This looks like the following:

Timer
0.000 Start
0.000 0.000configuration name is kp2markd3
0.002 0.002/comp/database-connect.comp
0.002 0.000connect to DBI:mysql:database=*****
0.003 0.001connect to DBI:mysql:database=*****
0.003 0.000Connect to database in sandbox /usr/local/apache/htdocs/*****
0.004 0.001Tie Apache::Session
0.004 0.000Bake a cookie
0.006 0.001/comp/get-user-info.comp
...
0.026 0.000Start whatwherewhen.html
0.026 0.000if params exist -- fill in text boxes
0.028 0.002Add user location
0.029 0.001getuserindicators
0.030 0.001getuserlocations
0.030 0.000getrecentindicators
0.032 0.001getrecentlocations
0.032 0.000Finish init of whatwherewhen.html
0.034 0.002/cds/comp/related_on_kplex.comp
0.034 0.000Start related on knowledgeplex
0.035 0.001found 1 containing places
0.138 0.102found 1464 contained places of 2:26
0.214 0.076selected related by locale facet_id
0.215 0.001selected related by locale facet_id
0.216 0.001loop through places
0.216 0.000done with init
0.217 0.001Finish
In this way, we can see the detailed timing for many of the internal functions that are called as part of a web request, so we know where to focus our performance tuning effort. This timer() function also serves as a useful "debug" type of printout -- if we want to check the value of a variable deep in a component, we can print it to this showtimer() area so it doesn't just come out in the middle of the page content.

We set up showtimer() so it is only called if &timer=1 is present in the URL, or if the user is logged in and is a member of the "admin" group. This means we don't have to go to any extra effort to enable performance measurement while we're logged in and working with the site.

Recently, we've added code to showtimer() that highlights the timing for any steps which were slow, so we can focus our attention there.

Logfiles

Logfiles are a tried-and-true performance measurement technique. You have your application write to a logfile somewhere, then you watch the log as you execute the app, and try to interpret what you see. A good way to watch the logfile is "tail -f", which lets you "follow" the logfile in a terminal window and see new lines as they are appended by other programs.

You can sometimes identify slow operations using this method. It's essential to have millisecond-resolution timing, as above. This is also the only method you can use to diagnose problems users report which you didn't directly observe yourself, if you can't reproduce the slow performance when you try again.

One problem with logs is that they can be crowded with extraneous information; info resulting from the activities of other users on the same web server, or just debug messages that you're not interested in. This can be cured in 2 ways:

  1. Filter the information at the source, by turning off things you're not interested in. Do this by writing to the log within some kind of DEBUG macro or function, which takes a "log level", then increase the log level for the parts of the code you care about. The advantage of this is that you can leave most of the debug info turned off most of the time, which may improve performance since you're not writing to the logfile so much.
  2. Filter the information at read-time. I do this by writing Perl/Mason based filters that show selected content from a logfile, based on matching certain patterns. An advantage is that you can change the filtering on the fly, without editing your source code. You can also make the filter show the elapsed time from the start of an operation, even if the timings stored in the logfile are all absolute.

Load Testing and Performance Testing

You can get a certain amount of information about application performance from load-testing or performance-testing, which involves simulating one or many client interactions with your web application by following a script. Performance Testing involves running a scenario through the application once, to measure the time taken for each step. Load Testing is like performance testing, but you simulate many clients, and see how the whole application performs as the load is increased.

I usually just write a performance test script specific to the application I'm testing, because I want to be able to capture specific timing or content information from the resulting pages, and roll those statistics up across many separate sessions. For example, I might make showtimer() output its results in an XML format, then parse and summarize the output in my script. Or I could add specific timing information about each database operation automatically to my global "timer" variable, then output all of this detail in XML to be aggregated by my performance test script.

The most sophisticated version of this I wrote simulated n user sessions with a delay-between-pageviews of d seconds. It automatically staggered the start times by d/n. It kept statistics on the average, max, and 95% results for the total page timing, as well as tracking those same stats for several of the sub-page timings we care about.

I had a big "aha" moment at StockMaster when I found that for a single-server application (all tiers running on one server), if your showtimer() says it took 250ms (1/4 second) to prepare and draw a page, then your capacity is probably constrained to 4 pageviews per second per CPU. There can be other bottlenecks if you're disk or I/O bound, but by and large this rule of thumb holds true.

What this means is that if you can cut the time for a page from 500ms down to 250ms, you've probably doubled the capacity of what you can get from the same hardware. Not to mention that you've improved the perceived user experience by at least a little bit (it may be dominated by the latency over the internet, or the bandwidth at the end client, which suggests optimizing the size of your content instead).

In my experience, MANY COMMERCIAL WEB APPLICATIONS are poorly optimized. This may be because they are overly complex multi-tier J2EE servlet-based systems running on several distributed servers, or it may be just that their programmers are not that good at performance tuning, or haven't taken the time to do so. In our case, when we're helping our client evaluate commercial solutions and making a build-vs-buy recommendation, we are often agreeing with the client that a commercial solution "feels slow" compared to the apps we build. I don't think it is the J2EE architecture as much as it is details of how the application fetches, processes, caches, and moves data, including the database architecture, proper use of indexes and clustering, testing and measuring each SQL statement, and avoiding redundant work or work that will be thrown away. I'll cover this more in a future article.

Posted by mark at April 27, 2005 02:48 PM

Trackback Pings

TrackBack URL for this entry:
http://www.marktorrance.com/blog/mt-tb.cgi/5

Comments

Post a comment




Remember Me?