So one of the issues I deal with a lot is tuning web applications to not suck. That comes down to a few things: monitoring, profiling, caching, caching (CACHING!), and tuning. The process for making a web application more awesome basically boils down to these steps:
- Monitor your application performance (HTTP threads, CPU, memory, response time, etc.)
- Profile your code
- Fix slow requests/implement caching
- Tune your web-server
- Goto 2.
Monitoring the response time of your application is useful and awesome for making positive changes to your environment. That means paying attention to application response time, CPU, memory, network traffic, disk I/O, disk capacity, and so on: all the metrics that say whether your application is healthy or not. There are a few tools available for this that all work pretty well; here's an incomplete list:
- Cacti – http://www.cacti.net
- Munin – http://www.munin-monitoring.org
- Cricket – http://cricket.sourceforge.net
They all work well and solve slightly different problems. Pick the one you like most. I’m a fan of Cacti.
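At bottom, what these tools graph is a time series of samples. As a minimal sketch of the core measurement (the handler and workload here are hypothetical stand-ins, not part of any of these tools' APIs):

```python
import time

def timed_request(handler):
    """Run a request handler and return its result plus wall-clock
    time in milliseconds -- the kind of per-request number a tool
    like Cacti or Munin would graph over time."""
    start = time.perf_counter()
    result = handler()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Stand-in handler that does a little work.
result, ms = timed_request(lambda: sum(range(100000)))
```

A real setup would push `ms` into the monitoring tool's data store (RRDtool, in Cacti's case) rather than just returning it.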
Profiling means seeing how long each call your application makes takes to execute. It's invaluable for getting a feel for which components of your application perform badly and which perform well.
Whenever an application fetches data from a resource, that’s an opportunity to improve performance. Every time something is fetched, there’s the ability to take that result set, and keep it. Caching the results of database calls, of external API lookups, of intermediate steps, all these things leave lots of room for improving performance. Memcached is the de facto standard for a caching engine.
Cache early, cache often!
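The pattern behind all of this is cache-aside: check the cache, fall back to the expensive lookup, store the result. Here's a sketch using a plain dict as a stand-in for a memcached client (the key scheme, TTL, and `fetch_from_db` callback are assumptions for illustration):

```python
import time

cache = {}  # stand-in for a memcached client; same get/set idea
TTL_SECONDS = 300

def get_user(user_id, fetch_from_db):
    """Cache-aside lookup: serve from cache when fresh, otherwise
    hit the backing store and cache the result for next time."""
    key = f"user:{user_id}"
    entry = cache.get(key)
    if entry is not None:
        value, stored_at = entry
        if time.time() - stored_at < TTL_SECONDS:
            return value  # cache hit: no database round trip
    value = fetch_from_db(user_id)
    cache[key] = (value, time.time())
    return value

# Demonstrate that the second call never touches the "database".
calls = []
def fake_db_lookup(user_id):
    calls.append(user_id)
    return {"id": user_id, "name": "example"}

get_user(42, fake_db_lookup)
get_user(42, fake_db_lookup)  # served from cache; calls stays length 1
```

With real memcached you'd let the server handle expiry via the TTL on `set` instead of storing timestamps yourself.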
A well configured web-server is crucial to a happy environment. This means not running into swap, not wasting time with connections that are dead, and other such things that waste time. In short, don’t look up what you don’t need, don’t save what you don’t need, and be efficient. Here are some basic things that apply to Apache:
```
KeepAlive Off            (or On, see below; it depends on workload)
Timeout 5
HostnameLookups Off
MaxClients        (RAMinMB * 0.8) / (AverageThreadSizeInMB)
MinSpareServers   (0.25 * MaxClients)
MaxSpareServers   (0.75 * MaxClients)
```
About these parameters:
- KeepAlive – controls whether, after one request from a client completes, that thread stays connected to the client for subsequent requests. In high-scale applications this can lead to contention for available resources. Some workloads, however, benefit from keeping it on: if you serve lots of different content types on a page to a client, leaving it on can be a good thing. Test it out, YMMV.
- Timeout — how long, in seconds, before we assume the client has gone away and won't be requesting further data. The default is 300; the 5 above is deliberately aggressive.
- HostnameLookups — this is for logging; if it is on, each client triggers a DNS lookup, which slows down the request.
- MaxClients — the total number of threads the server will run to serve the site. Each thread consumes memory. The formula above assumes that 20% overhead for other system tasks is enough to keep us out of swap. On machines with more than 16GB of RAM, use 0.9 instead of 0.8.
- MinSpareServers — the fewest threads that Apache will leave running. Setting this too low will result in load spikes when traffic increases.
- MaxSpareServers — the most spare, unutilized threads that Apache will leave running. Setting this too low results in process thrashing as threads are spawned and then terminated. The tradeoff is resident RAM.
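To make the MaxClients formula concrete, here's the arithmetic worked through for a hypothetical box (8GB of RAM, ~40MB resident per Apache thread; both numbers are assumptions for the example, not recommendations):

```python
# Hypothetical box: 8 GB of RAM, ~40 MB resident per Apache thread.
ram_mb = 8192
avg_thread_mb = 40

# 0.8 leaves 20% headroom; above 16 GB of RAM, use 0.9 instead.
headroom = 0.8 if ram_mb <= 16384 else 0.9

max_clients = int((ram_mb * headroom) / avg_thread_mb)   # 163
min_spare = int(0.25 * max_clients)                      # 40
max_spare = int(0.75 * max_clients)                      # 122

print(max_clients, min_spare, max_spare)
```

Measure your actual per-thread resident size (e.g. with `ps` or `smem`) rather than guessing; the formula is only as good as that number.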
There are a lot of other things that can be done as well, so don’t take this as a complete set…
These are my handout-style tips on performance tuning. Whole volumes have been dedicated to this topic. Some great resources include:
- The Art of Capacity Planning (Allspaw)