the symlink trick

Courtenay writes about scaling Rails applications at Caboo.se. He says:

Take a look at your logs: are you performing over 10 database calls per request? You need to fix this. Are you performing over 90? You’re a dumba**.

Today I viewed the logs of a Rails application I am writing. To generate one particular page I was performing 31,211 SELECT queries, and the page took 1m7.091s to render. Ouch.

After an hour of tweaking, optimizing queries, and piggy-backing some attributes I was able to get down to 9,839 queries, and the page rendered in 0m16.958s. That may be a respectable improvement, but it is still atrocious according to Courtenay’s benchmark. (I think they have a word for systems that take over 9 thousand queries to generate a single page, but I won’t repeat it here.)

Fortunately, caching the entire page makes sense functionally. However, one problem with Rails’ built-in caching is that until the page is cached, the first person to hit it will be forced to wait 17 seconds for the page to render (assuming no further optimization). Under heavy traffic, hundreds of visitors will pile up behind that render and many will be dropped. It’s the dreaded cache-gap.

Steve Conover at Pivotal Labs has a great technique for dealing with this kind of issue that he calls the symlink trick. A variation on Steve’s idea goes like this:

  • Symlink index.html to index.html.current.
  • When index.html.current is out of date, generate index.html.new.
  • Have cron check the cache every 2 minutes and move index.html.new over index.html.current.

Because mv on *nix is atomic (as long as both files live on the same filesystem), there is never a gap in which the cached page has been deleted and requests are left waiting for it to be regenerated. Below is a diagram of the process.

[Diagram: symlink_trick]
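To make the steps concrete, here is a minimal Ruby sketch of the variation described above. The helper names (setup_cache, write_new_page, promote_new_page) are made up for illustration and do not come from Steve's post or from Rails; only the file names follow the steps. It assumes a *nix filesystem with symlink support and that all three files sit in the same directory, so the rename stays on one filesystem.

```ruby
require "tmpdir"

# Hypothetical helpers for illustration only; the file names follow the steps above.

# Step 1: index.html is a symlink to index.html.current.
def setup_cache(dir, initial_html)
  File.write(File.join(dir, "index.html.current"), initial_html)
  link = File.join(dir, "index.html")
  # A relative target keeps the link valid if the directory is moved.
  File.symlink("index.html.current", link) unless File.symlink?(link)
end

# Step 2: the slow render writes its result to index.html.new.
def write_new_page(dir, html)
  File.write(File.join(dir, "index.html.new"), html)
end

# Step 3: what the cron job would run. File.rename wraps rename(2),
# which replaces the destination atomically on the same filesystem.
# Returns true if a new page was promoted, false if there was nothing to do.
def promote_new_page(dir)
  fresh = File.join(dir, "index.html.new")
  return false unless File.exist?(fresh)

  File.rename(fresh, File.join(dir, "index.html.current"))
  true
end

# Demo in a throwaway directory.
Dir.mktmpdir do |dir|
  setup_cache(dir, "<p>old</p>")
  write_new_page(dir, "<p>new</p>")
  promote_new_page(dir)
  puts File.read(File.join(dir, "index.html")) # prints <p>new</p>
end
```

Readers of index.html see either the old page or the new one, never a missing file. One caveat of this sketch: a real generator would probably write to a temporary name first and rename it to index.html.new only once it is complete, so that cron can never promote a half-written page.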

The great thing is that this caching technique is general and can be applied to any web application, not just Rails.

  • Matt Pulver

    Could this be simplified by the following steps?

    1) index.html is a regular file.
    2) Generate index.html.new.
    3) mv index.html.new index.html.

    This should still avoid any “cache hole” due to the atomicity of mv. Someone also suggested this on Conover’s blog.

  • Nate Murray

    @Matt: I guess it could be. Then it could be called “the mv trick”. Which quickly starts to look like less of a trick and more like “common-sense”.