Posted by Matt Pulver - Apr 8, 2008
Say you have the following data model

and you want to execute a single query that returns all of the data at once from the ActiveRecord tables, with the proper Rails associations between them. Wouldn’t it be nice if you could simply chain the associations together, with something like :include => :b => :c => :d => :e?
Though this is not even valid Ruby code, it actually comes very close to what you can do in Ruby on Rails. To get this right, let’s take a closer look at the Rails associations within the class definitions:
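The class definitions themselves are missing from this copy of the post. As a rough sketch only, the associations implied by the join conditions in the SQL log further down could be declared as below. SketchModel is a made-up stand-in that merely records the association macros so the example runs without Rails; in a real app each class would inherit from ActiveRecord::Base instead. The choice of has_one for :d (from ds.c_id) and has_and_belongs_to_many for :es (from the ds_es join table) is an inference, not the author's code.

```ruby
# Hypothetical stand-in for ActiveRecord::Base: it only records which
# association macro was declared under which name.
class SketchModel
  class << self
    def associations
      @associations ||= {}
    end

    [:belongs_to, :has_one, :has_many, :has_and_belongs_to_many].each do |macro|
      define_method(macro) { |name| associations[name] = macro }
    end
  end
end

# Associations inferred from the SQL log (assumptions, not the original code).
class A < SketchModel
  belongs_to :b                   # as.b_id points at bs.id
end

class B < SketchModel
  has_many :cs                    # cs.b_id points at bs.id
end

class C < SketchModel
  has_one :d                      # ds.c_id points at cs.id
end

class D < SketchModel
  has_and_belongs_to_many :es     # joined through the ds_es table
end

class E < SketchModel
end

[A, B, C, D, E].each do |model|
  puts "#{model.name}: #{model.associations.inspect}"
end
```

Note which names are plural: only :cs and :es, the two “many”-type associations.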
Let’s try the Rails code again, putting an ’s’ after the :c and :e, as Rails requires in order to denote that they are “many”-type associations, which gives :include => :b => :cs => :d => :es.
That’s closer, but still not valid Ruby code. To fix that, think of the => operator as being right-associative, and instead of putting in parentheses (), put in curly braces {} in order to create nested hashes:
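The snippet is missing from this copy of the post, so here is a minimal plain-Ruby sketch of the nested hash that the description implies. The association names come from the text and the SQL log; the helper association_chain is hypothetical and only exists to show how the nesting reads from the outside in.

```ruby
# Each key is an association on the model one level up, and its value says
# what to load from there. Curly braces make the grouping explicit.
include_option = { :b => { :cs => { :d => :es } } }

# Hypothetical helper: flatten the nesting into the order the joins follow.
def association_chain(spec)
  case spec
  when Hash
    spec.flat_map { |name, nested| [name] + association_chain(nested) }
  when Array
    spec.flat_map { |item| association_chain(item) }
  else
    [spec]
  end
end

chain = association_chain(include_option)
puts chain.inspect

# In a Rails app of that era the hash would be handed to a finder, e.g.
# (assuming the model is called A):
#   A.find(:all, :include => include_option)
```

The finder call is left as a comment because it needs a Rails app and database to run; the hash itself is ordinary Ruby.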
That’s it! Looking in the logs, we see that this only produced a single query, with all the desired SQL joins:
A Load Including Associations (0.001088) SELECT `as`.`id` AS t0_r0, `as`.`b_id` AS t0_r1, `bs`.`id` AS t1_r0, `cs`.`id` AS t2_r0, `cs`.`b_id` AS t2_r1, `cs`.`d_id` AS t2_r2, `ds`.`id` AS t3_r0, `ds`.`c_id` AS t3_r1, `es`.`id` AS t4_r0 FROM `as` LEFT OUTER JOIN `bs` ON `bs`.id = `as`.b_id LEFT OUTER JOIN `cs` ON cs.b_id = bs.id LEFT OUTER JOIN `ds` ON ds.c_id = cs.id LEFT OUTER JOIN `ds_es` ON `ds_es`.d_id = `ds`.id LEFT OUTER JOIN `es` ON `es`.id = `ds_es`.e_id
You can use this technique in any ActiveRecord method that accepts the :include option to reduce the number of times the Rails app hits the database, and ultimately speed up your Rails application.
Posted by Nate Murray - Apr 4, 2008
Courtenay writes on scaling Rails applications at Caboo.se. He says:
Take a look at your logs: are you performing over 10 database calls per request? You need to fix this. Are you performing over 90? You’re a dumba**.
Today I viewed the logs of a Rails application I am writing. To generate one particular page I was performing 31,211 SELECT requests, and the page took 1m7.091s to render. Ouch.
After an hour of tweaking, optimizing queries, and piggy-backing some attributes, I was able to get down to 9,839 queries, and the page rendered in 0m16.958s. That may be a respectable improvement, but it is still atrocious by Courtenay’s benchmark. (I think they have a word for systems that take over 9 thousand queries to generate a single page, but I won’t repeat it here.)
Fortunately, caching the entire page makes sense functionally. However, one problem with Rails’ built-in caching is that, until the page is cached, the first person to hit it will be forced to wait 17 seconds for the page to render (assuming no further optimization). Under heavy traffic, hundreds of visitors to the site will pile up behind that request and many will be dropped. It’s the dreaded cache-gap.
Steve Conover at Pivotal Labs has a great technique for dealing with this kind of issue that he calls the symlink trick. A variation on Steve’s idea goes like this:
- Symlink index.html to index.html.current.
- When index.html.current is out of date, generate index.html.new.
- Have cron check the cache every 2 minutes and, if index.html.new exists, move it over index.html.current.
Because *nix mv is atomic (as long as both files are on the same filesystem), there is never a gap in which the cached page has been deleted and requests are left waiting for it to be regenerated. Below is a diagram of the process.

The great thing is that this caching technique is general and can be applied to any web application, not just Rails.
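As a rough illustration of the symlink steps above, here is a minimal plain-Ruby sketch with no Rails involved. The method name publish_cache and the temporary directory are assumptions made for the example; in practice the method body is what the cron job would run against the real cache directory. File.rename relies on the rename system call, which replaces the target atomically when both paths are on the same filesystem.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper for the cron step: if a freshly generated page is
# waiting, rename it over the file that the index.html symlink points at.
# Returns true when a new page was published, false when there was nothing new.
def publish_cache(dir)
  fresh   = File.join(dir, "index.html.new")
  current = File.join(dir, "index.html.current")
  return false unless File.exist?(fresh)

  File.rename(fresh, current) # atomic replace on the same filesystem
  true
end

# Demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  link = File.join(dir, "index.html")

  # Step 1: index.html is a symlink to index.html.current.
  File.write(File.join(dir, "index.html.current"), "<p>old page</p>")
  File.symlink("index.html.current", link)

  # Step 2: the slow regeneration writes to index.html.new instead.
  File.write(File.join(dir, "index.html.new"), "<p>new page</p>")

  # Step 3: the periodic job swaps the new file into place.
  publish_cache(dir)
  puts File.read(link)
end
```

Readers of index.html see either the old page or the new one, never a missing file, because the slow generation happens entirely in index.html.new.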