Category Archives: MySQL

Web Operations Performance – The Handout

So one of the issues that I deal with a lot is tuning web applications to not suck.  This is done by a few things: monitoring, profiling, caching, caching (CACHING!), and tuning.  The process for making a web application more awesome basically boils down to this list of steps:

  1. Monitor your application performance (HTTP threads, CPU, memory, thread response time, etc.)
  2. Profile your code
  3. Fix slow requests/implement caching
  4. Tune your web-server
  5. Goto 2.

Monitoring

Monitoring the response time of your application is useful and awesome for making positive changes to your environment.  This means paying attention to your application response time, CPU, memory, network traffic, disk I/O, disk capacity, and so on: all the metrics that say whether your application is healthy or not.  There are a few different tools available for this that all work pretty well; here's an incomplete list:

  • Cacti – http://www.cacti.net
  • Munin – http://www.munin-monitoring.org
  • Cricket – http://cricket.sourceforge.net

They all work well and solve slightly different problems.  Pick the one you like most.  I’m a fan of Cacti.
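
If you just want a rough feel for response time before (or while) setting one of those up, even a dumb curl loop will do.  Here's a minimal sketch; the URL and the 60-second interval are placeholder values to adjust for your own site:

#!/bin/bash
# Crude response-time sampler: prints DNS, connect, first-byte, and total
# times in seconds, once a minute.  URL and interval are example values.
URL="http://www.example.com/"
while true; do
    curl -s -o /dev/null \
         -w '%{time_namelookup} %{time_connect} %{time_starttransfer} %{time_total}\n' \
         "$URL"
    sleep 60
done

Feed those numbers into whichever grapher you picked and you've got a baseline to measure every later change against.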

Profiling

Profiling means being able to see how long each call that an application makes takes to execute.  It's invaluable for getting a feel for which parts and components of your application perform badly and which perform well.
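
A real code profiler (Xdebug for PHP, for example) is the right tool for the job, but you can get a long way by profiling at the request level from your access logs first.  Here's a rough sketch of that idea; it assumes you've appended %D (response time in microseconds) to a standard combined Apache LogFormat, and the log path is a placeholder:

#!/bin/bash
# Top 10 slowest URLs by average response time.  Assumes the combined log
# format with %D (microseconds) as the last field; log path is an example.
LOG=/var/log/apache2/access.log
awk '{ time[$7] += $NF; hits[$7]++ }
     END { for (u in time)
             printf "%12.0f us avg  %8d hits  %s\n", time[u] / hits[u], hits[u], u
     }' "$LOG" | sort -rn | head -10

The URLs that float to the top of that list are where your code profiler's attention should go first.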

Caching

Whenever an application fetches data from a resource, that's an opportunity to improve performance.  Every time something is fetched, there's the ability to take that result set and keep it.  Caching the results of database calls, of external API lookups, of intermediate steps: all these things leave lots of room for improving performance.  Memcached is the de facto standard caching engine.

Cache early, cache often!
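
It's also worth checking that the cache is actually earning its keep.  Here's a quick hit-rate check, assuming memcached is listening on localhost:11211 and nc(1) is available:

#!/bin/bash
# Ask memcached for its stats and compute the get hit rate.
# Host and port are assumptions; adjust for your environment.
printf 'stats\r\nquit\r\n' | nc localhost 11211 \
  | awk '/get_hits/   { hits   = $3 }
         /get_misses/ { misses = $3 }
         END { total = hits + misses;
               printf "hits: %d  misses: %d  hit rate: %.1f%%\n",
                      hits, misses, (total ? 100 * hits / total : 0) }'

A low hit rate usually means you're caching the wrong things or expiring them too aggressively.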

(Apache) Tuning

A well-configured web server is crucial to a happy environment.  This means not running into swap, not wasting time on dead connections, and avoiding other needless work.  In short, don't look up what you don't need, don't save what you don't need, and be efficient.  Here are some basic settings that apply to Apache:

   KeepAlive          Off (Or On, see below, it depends on workload)
   Timeout            5
   HostnameLookups    Off
   MaxClients         (RAMinMB * 0.8) / (AverageThreadSizeInMB)
   MinSpareServers    (0.25 * MaxClients)
   MaxSpareServers    (0.75 * MaxClients)

About these parameters:

  • KeepAlive – this controls whether, once a request from a client to the web server has completed, that thread will remain connected to the client for subsequent requests.  In high-scale applications, this can lead to contention for available resources.  Some workloads, however, benefit from keeping this on: if you are serving lots of different content types on a page to a client, leaving it on can be a good thing.  Test it out, YMMV.
  • Timeout — how long, in seconds, before we assume that the client has gone away and won't be requesting further data.  The default is 300; the value above is deliberately aggressive.
  • HostnameLookups — this is for logging; if it is on, each request will trigger a reverse DNS lookup, which slows down the request.
  • MaxClients — the total number of threads that the server will allow to run to serve the site.  Each thread consumes memory.  This model assumes that a 20% overhead for other system tasks is appropriate and will keep us out of swap.  On machines with more than 16GB of RAM, use 0.9 instead of 0.8.  (See the sketch after this list for a quick way to compute these values.)
  • MinSpareServers — the fewest idle threads that Apache will leave running.  Setting this too low will result in load spikes when traffic increases.
  • MaxSpareServers — the most spare, unutilized threads that Apache will leave running.  Setting this too low will result in lots of process thrashing as threads are spawned and then terminated.  The tradeoff is RAM tied up in idle threads.
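
Here's a quick way to plug real numbers from a running box into those formulas.  This is a rough sketch: it assumes the prefork MPM, that the Apache processes are named apache2 (httpd on some distributions), and that Apache is already running so there's something to measure:

#!/bin/bash
# Estimate MaxClients and friends from total RAM and average Apache RSS.
# Assumes prefork and a process name of "apache2"; adjust as needed.
RAM_MB=$(awk '/MemTotal/ { print int($2 / 1024) }' /proc/meminfo)
AVG_MB=$(ps -o rss= -C apache2 | awk '{ sum += $1; n++ } END { print int(sum / n / 1024) }')
MAX=$(( RAM_MB * 8 / 10 / AVG_MB ))          # (RAMinMB * 0.8) / AverageThreadSizeInMB
echo "MaxClients       $MAX"
echo "MinSpareServers  $(( MAX / 4 ))"       # 0.25 * MaxClients
echo "MaxSpareServers  $(( MAX * 3 / 4 ))"   # 0.75 * MaxClients

For example, a box with 8GB of RAM and 50MB Apache processes works out to (8192 * 0.8) / 50, or about 131, for MaxClients.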

There are a lot of other things that can be done as well, so don’t take this as a complete set…

These are my handout style tips on performance tuning.  There are whole volumes of books dedicated to this topic, and plenty of great resources out there.

-Gabriel

Of Tuning WordPress on Cherokee

So at work, I had a blog.  Not my blog, of course.  One I support.  So this blog was thrown together quickly to facilitate business goals.  Like you do.  And that’s great.  We met the deadline.  We got the product functional.  But performance kinda sucked.  A little backstory.  Here’s how it got set up to start.

We love virtualization here.  Everyone should.  It's a fantastic way to take advantage of hardware that you've got lying around idle.  Idle hardware is lame.  So we run KVM.  This means we get to manage machines more effectively, and can provision things faster.  That's awesome.

So this blog.  It's a cluster of boxes, made up of a pair of web servers and a pair of DB backends.  The DB backends are a master and a backup.  This is a small project, so those got set up with just 4GB of RAM.  So that's fine.  The web servers are each a VM with 1 vCPU, 4GB of RAM, and some disk space.

So we set that all up, and it did fine.  But not great.  Here's what it did.

Incidentally, if you've got performance data you've collected with Pylot and need to CSV the averages to make a pretty graph, here's my way of extracting the averages from the results HTML to feed into gnuplot.

for file in load-dir/*/results.html ; do egrep 'agents|avg' "$file" | head -3 | perl -e 'while (<>) { s/<\/?t[dr]>//g; s/(\d+\.?\d*)/\t$1/g; chomp; @p = split /\t/, $_; push @r, $p[1]; } printf "%s\n", join ",", @r;'; done

So what we can see from these graphs is that a pair of single-core boxes, tuned with the default Cherokee+PHP installation, don't do all that great.  They can handle 1-2 simultaneous requests, but past that, the response time gets pretty bad.  That's where the project got left for a while, until the other day I got the request to make it handle 500 requests per second.  “Wow, shit.” I thought.  So I dove in to see what I could do to improve performance on the blog, and it turned out that there was a lot I could do.

  1. Single-core boxes don’t have a lot of performance available to them.
  2. Cherokee uses PHP in FastCGI mode, and does a terrible job of tuning it out of the box.  It defaults to a single PHP thread.
  3. WordPress is very hungry for CPU time.  It chews up a lot of resources doing its DB queries and wrangling them.

To address these points, I did the following.

The VMs themselves were restarted with more CPU cores — four cores per box.  That let me dig in and discover that Cherokee wasn't tuned well.  Under a single core, I saw 80% CPU utilization on PHP, with high system wait time.  That sucked.  But after I bumped up the core count, it still looked bad.  Still only one CPU @ 80%.  WTF.  So then I turned to Cherokee, and I tuned PHP so that it would invoke 16 children, enough to handle the request volume.  This helped a lot, but there was still room to do better.  So I added APC, the Alternative PHP Cache, to the configuration.  This helped out a lot.  I then looked at WordPress-specific caching engines, and settled on using W3 Total Cache to help out.  This brought performance into the range of fulfilling the customer's request.  I felt great about it.
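
For reference, the knob for PHP's FastCGI worker count is an environment variable that php-cgi reads at startup.  Here's a minimal sketch of the kind of launcher that gets you 16 workers; the bind port and the request cap are example values, and Cherokee's admin interface has its own place to set these for its PHP information source:

#!/bin/bash
# Launch php-cgi as a FastCGI pool with 16 worker processes.
# The bind address/port and request cap below are example values.
export PHP_FCGI_CHILDREN=16        # worker processes in the pool
export PHP_FCGI_MAX_REQUESTS=1000  # recycle each worker after this many requests
exec php-cgi -b 127.0.0.1:47990

APC and W3 Total Cache then sit on top of that: APC keeps compiled PHP opcodes in shared memory so scripts aren't recompiled on every request, and W3 Total Cache keeps WordPress from rebuilding the same pages and re-running the same queries over and over.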

I used Pylot to graph performance at various points through this project so I could figure out whether I was doing a better or worse job of tuning the boxes.

Here’s the performance data showing the improvements that caching at various layers added to the party:

So it turns out that these are some great ways to make sure your blog performs:

  • Make sure that there’s enough hardware supporting it
  • Cache in depth and breadth, early and often
  • Tune your PHP/WebServer to handle the project
  • Employ performance testing to measure your real improvements

With some hard work and performance tuning, you can turn an abysmal 7 requests/sec peak performing blog into one that can sustain 350 requests per second, and do it in a tiny fraction of the time.

-Gabriel


Flying is Fine — Extra Screening Sucks Hard

TSA Security Line
Thanks to flickr user redjar for the image.

I hate airports.  Not flying, flying I love.  But the process of dealing with airlines, and luggage, and most especially the TSA/Security Theater aspect of the process drives me crazy.  I flew down to SJC this week for a technical conference, MySQLConf.  It was a great conference: I met neat people, I learned new things, I got excited about tech again.  I loved that part.  The flight down was fine.  Easy, no problem through the airport.  Only the usual anxiety about the process.


MySQL, Alter Table, and How to Observe Progress

Network Gear

This only works on MySQL 5.0.  If you’re using 5.1+, this approach isn’t for you.

Today we had to do a very large alter table on some production data.  The table in question is big. 47 million rows big.  We have to make some changes to drop a column and change some indexing to support our application.  Fine, ALTER TABLE blah blah blah.  But how do we keep an eye on the progress of the alter?

There's nothing in the documentation for MySQL that speaks to a method, nor does any Google searching pull up anything (except for the one InnoDB file-per-table patch).

So what’s a clever systems administrator to do?

I looked at the output of lots of status commands from MySQL, and finally I came across something useful in the InnoDB engine status output.  There's an entry in the TRANSACTION section that reports the number of undo log entries, and after several hours of our ALTER TABLE running, that value was up around 33 million.  It occurred to me that since InnoDB does row-level operations, this value might correspond to the number of rows that have been operated on so far during the ALTER:

---TRANSACTION 0 2769988559, ACTIVE 53210 sec, process no 4115, OS thread id 1157654880 inserting, thread declared inside InnoDB 91
mysql tables in use 2, locked 2
111667 lock struct(s), heap size 13580272, undo log entries 33707735
MySQL thread id 948913367, query id 1662722204 localhost root copy to tmp table

So I wrote some bash.  And some awk.  And some perl, and I made a hacky little one-liner script to dump out the rate of change, the average rate of change, the % complete, and an estimate (based on the average rate of change) of when the query might be complete.

That script is below, and you are welcome to it.  I have yet to wrap it in more clever perl to dump out an actual status bar, but I’m happy with it, and hopeful that this is a meaningful way of keeping an eye on the rate of change, and the progress of ALTER statements.

The script:

#!/bin/bash
# Watch the "undo log entries" counter and estimate ALTER TABLE progress.
# Replace ROWS_IN_TABLE with the row count of the table being altered.
(while true; do
    mysql -e 'show innodb status \G' | grep 'undo log entries' | egrep '[0-9][0-9][0-9][0-9]' | awk '{print $10}'
    sleep 1
done) | perl -e '
    $t = ROWS_IN_TABLE;            # total rows in the table being altered
    $a = <>;                       # previous undo log entries sample
    while (defined($b = <>)) {
        $n++;
        $sum += ($b - $a);         # rows processed since the last sample
        $avg = $sum / $n;          # average rate, rows per second
        printf "Rate: %d, avg: %d, %0.3f%% complete, done in %d sec\n",
            $b - $a, $avg, ($b / $t) * 100, ($t - $b) / ($avg || 1);
        $a = $b;
    }'

I hope you find this tool useful.

It took a LONG time for the query we performed to run, and it was very helpful to have this to gauge the time it'd take.  The moving average is not the best for long-term accuracy; restart the script periodically for better views (and wait for ~100+ trips through the loop).

It was within three minutes of accurate.  It took a little longer than the script indicated (the undo log entries count was greater than the number of rows, but by less than 0.01%).  The total query time was 22 hours, 9 minutes, and 33 seconds.  So it was off by only a hundredth of a percent.

This is only valid for ALTER TABLE, but appears VERY useful. :)

WOOHOO!

Thoughts from MySQL Conf, Oracle’s Acquisition of Sun, and Meeting New People

I attended MySQL Conf, and it's been a neat conference.  The most surprising thing was the news of Oracle's purchase of Sun.  This sounds to me like a play for a solid, vertically integrated market for Oracle.  I can appreciate that — they get Sun's hardware and OS skills, they bring their own DB services to the table, and they get a lot of community with MySQL.

Lots of people have spoken about their fears that Oracle will suffocate MySQL, but I don't think they'll do that.  I think Oracle realizes how centrally important MySQL is to the web community.  I do think they'll try to sell Oracle to high-volume MySQL sites, and that makes sense.  And I think they'll keep MySQL going to bring database share under the Oracle roof.

MySQL is hugely popular, it wouldn’t make any sense to throw away that good will.

Beyond that, I've run into people at the conference whom I haven't seen in a while, and I'm learning things.

One of the most interesting talks I went to was RockYou's talk about how they built their federated platform for supporting their Facebook applications.  Very neat to see their approach, especially since we're building our own sort of Facebook applications, specifically Bejeweled Blitz.  Good times.

I met the photographer Julian Cash, a very nice guy located in San Francisco.  He did a light-painting portrait of me, which was fun.  We talked about various things for a while, and he told me about a project he's working on.  More to come later.

Today is the last day.  Things are winding down, and I’m preparing to go home.