X-Combinator

making the human scalable

String interpolation is faster than printf in Ruby

Just wanted to make a note: string interpolation is faster than printf in Ruby. Example:

$ irb
>> require 'benchmark'
=> true
>> Benchmark.measure { 100000.times {
  "%s" % ["hello world"] } }.total
=> 0.21
>> Benchmark.measure { 100000.times {
  "#{'hello world'}"} }.total
=> 0.04

RailsConf08: Passenger or mod_rails RIP

There was no lack of hubris on the stage today as the guys from Phusion talked about their new Apache extension Passenger. If Passenger lives up to its claims it seems that it could quickly become the de-facto standard for deploying Rails (and more) applications.

The 19- and 22-year-old duo was obviously ecstatic to share what they had created, but I kept getting the feeling that they were surprised that the crowd didn’t give them a presidential, State-of-the-Union-style standing ovation every few minutes. (If Passenger does what they claim, maybe they deserve it, but respect and appreciation often lose something when they are too eagerly expected.)

So what was it that they claimed? That Passenger will not only make Rails deployment dead-simple (think PHP: upload and go) but also crank out better performance while using less memory. It’s a worthy goal and, as Kent Beck said in our keynote, “Humility is not a prerequisite to ideas with impact.” I’d like to write up a bit about Passenger and the session they presented at RailsConf. You can download the Keynote slides here [zip] (I’m sure the PDF will appear somewhere soon. If you find it, please feel free to leave a URL in the comments).

Memory Usage and Clustering

Memory Usage

First, let’s talk about memory usage. When you’re using Mongrel, each Mongrel process holds a full copy of the Rails and application code in memory, plus the private memory for the individual process. In this model, N Mongrel processes mean N copies of the application code. With Passenger, all processes share one copy of your Rails/application code. Each process still gets its own chunk of private memory, but the shared code greatly reduces the overall memory usage.

The Phusion guys also patched Ruby (and they’ve horribly named it “Ruby Enterprise Edition”). This version has modified garbage collection that causes Ruby to use significantly less memory. They achieve this by making memory management copy-on-write friendly. It hasn’t been released yet, so there is no word on how well it works or how stable it is.

Clustering

Another nice feature is that with clustering they use “fair load balancing”. The idea is that you keep track of how many jobs each process in the cluster has and you give the next job to the process with the least amount of work to do.
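A minimal sketch of the "give the next job to the least busy process" idea, not Passenger's actual implementation. The Worker struct, its fields, and the method names are made up for illustration:

```ruby
# Hypothetical worker record: a name plus the number of requests in flight.
Worker = Struct.new(:name, :active_jobs)

# Pick the worker that currently has the fewest jobs in flight.
def least_busy(workers)
  workers.min_by(&:active_jobs)
end

# Hand the next job to the least busy worker and count it as in flight.
def dispatch(workers)
  worker = least_busy(workers)
  worker.active_jobs += 1
  worker
end

workers = [Worker.new("a", 2), Worker.new("b", 0), Worker.new("c", 1)]
first = dispatch(workers)
puts "next job goes to worker #{first.name}"
```

A real balancer would also decrement the counter when a request finishes; that is left out to keep the sketch short.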

Competitor Comparison

They compared Passenger’s performance (as an Apache extension) to many competing products (including Nginx and Mongrel) and claimed that it used significantly less memory and was much faster. I won’t repeat all the statistics; you can check out the slides.

mod_rails, RIP

Although they had greatly simplified deployment for Rails, they didn’t stop there. Passenger now supports Rack. I see this as probably the coolest thing about Passenger. Now any custom server that you write using Rack can basically be “dropped” into Apache and is effortlessly handled by Passenger. This also means that Rails alternatives like Merb and Camping work out of the box. But there is more…
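For readers who haven't seen Rack, here is a rough sketch of what a Rack application looks like. The name HelloApp is invented, and nothing here is specific to Passenger. In a config.ru file you would typically pass such an object to `run`:

```ruby
# A Rack application is any object that responds to #call(env) and
# returns [status, headers, body], where body responds to #each.
HelloApp = lambda do |env|
  path = env.fetch("PATH_INFO", "/")
  [200, { "content-type" => "text/plain" }, ["Hello from #{path}\n"]]
end

# Exercise the app directly with a hand-built environment hash;
# a real server would build this hash from the HTTP request.
status, headers, body = HelloApp.call({ "PATH_INFO" => "/demo" })
puts status
body.each { |chunk| puts chunk }
```

Because the contract is this small, the same object can sit behind any server that speaks Rack.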

They also added support for Python’s Django. It was almost comical: when they announced this, the crowd nearly booed them. I think it was a bit unfair, but I guess they should have expected it at a Rails conference. Either way, you have to give them props for taking the initiative and pushing the software to its boundaries.

Because Passenger supports frameworks other than Rails they decided to drop the name mod_rails and call it Phusion Passenger. They mentioned that their focus is still going to be developing and perfecting Rails. Passenger will simply not be exclusive to Rails.

Case Studies

Passenger is already being used in production at Dreamhost. It is also being used by soocial and iLike.

The Q&A

By the Q&A the crowd was full of skeptics; what they promised seemed too good to be true. However, one gentleman from the crowd sensed this, stood up, and said that he works with a developer/deployer named Tom who has been working with production Rails deployment for 4 years. Tom apparently knows the ins and outs of Mongrel, Monit, Capistrano, etc. They found out about Passenger two or three weeks ago and he deployed all of their existing applications to Passenger in a single day. His comment was:

Everything I learned [about Rails deployment] over the last few years is moot now, and that is a good thing. -Tom the deployer

He said that “[Passenger] is incredibly awesome when it comes to Rails deployment.”

A skeptic stood and rightly asked: “Why wasn’t this done five years ago? What was the technological hurdle?” The team answered that they believed the problem was largely social. Developers who had written Rails applications wanted to deploy them as quickly as possible. They researched, learned about Capistrano, Nginx, Mongrel, etc., and made it work. The Phusion team said that the people who were smart enough to tackle this problem were complacent and chose to deploy applications in this (painful) way.

There is nothing technically preventing [ease of rails deployment]. We’ve shown it’s possible. Why it hasn’t been done is a social or political problem. There is no technical things stopping you. -Hongli Lai (Phusion)

Summary

Time will tell us if Apache/Passenger will live up to the hype and become the new standard. I, for one, am hoping that it really does take hold. If Rails deployment becomes as common and easy as PHP deployment we can spend time solving more interesting problems and that will be a good thing.

RailsConf08: Engine Yard on Rails Deployment Issues

Yesterday I sat in on a session on Rails deployment, headed up by the guys from Engine Yard. The idea was to discuss deployment problems but it turned into general deployment tips. If anyone knows about deploying Rails it is these guys, and they have some fantastic ideas. I took away some interesting things that I’d like to share with you.

Server choice

One of the issues discussed was which Rails server to run your application on.

  • ebb: extremely fast, but probably not production ready today
  • thin: can be good for requests that are completed quickly. Because thin is event-driven it doesn’t work as well for longer running requests
  • mongrel

Tom pointed out one important fact to consider when choosing between these. He said:

They can all respond to requests faster than your application can generate. There are way more important things to spend time on. -Tom Mornini, Engine Yard

The improvements you gain by switching between these are often insignificant compared to your use of caching, limiting disk I/O in your apps, and controlling your overall application architecture.

Mongrel is, obviously, going to be most people’s first choice, because it’s a great general-purpose server. But when using Mongrel, a common question is “how many Mongrel processes should I be running?” Tom said that “you can burn out a modern CPU with 3 mongrels” and that there is no reason to run more than 3 Mongrel processes per core; any beyond that are generally wasted.

The guys at Engine Yard love nginx. They said they’ve had no problems with it. Tom said that in internal tests against static files they’ve seen nginx serve 40 megabytes per second of static images and not show up in top.

Misc other tips

Static Files: Static files don’t have to be local. They can be shared across the entire cluster with a clustered file system. They use RedHat GFS for static storage and it is convenient because multiple machines can read the same filesystem. “If you can avoid NFS, do… NFS was really, really cool in 1979.”

Static resource domains: Browsers limit the number of concurrent requests per domain. At Engine Yard they have had success in improving load times by creating domain name aliases that often point to the same physical machine. For example, images1.domain.com, images2.domain.com, etc. can all point to the exact same machine and the exact same IP address, but the browser is tricked into loading them concurrently because the domain names are different. They have seen significant improvement in load times by using this technique on pages that need to load a lot of files.
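One way to spread assets over such aliases is to derive the host name from the file path, so a given file always maps to the same alias and stays cacheable. This is only a sketch: the alias count, the numbering starting at 0, and the helper names are assumptions, not something Engine Yard described:

```ruby
require "zlib"

# Assumption: images0..images3 all resolve to the same machine.
ASSET_HOST_COUNT = 4

# Map a path to one of the alias hosts. CRC32 gives the same answer in
# every process, so a given file always lands on the same host name.
def asset_host_for(path, domain = "domain.com")
  index = Zlib.crc32(path) % ASSET_HOST_COUNT
  "images#{index}.#{domain}"
end

# Build the full URL a view would emit for an image.
def asset_url(path)
  "http://#{asset_host_for(path)}#{path}"
end

puts asset_url("/images/logo.png")
puts asset_url("/images/banner.png")
```

Picking the host at random per request would also trick the browser, but it would defeat caching because the same image would appear under several URLs.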

Virtualization: They use (the free, open-source version of) Xen and love it. Nearly everything at Engine Yard is virtualized. Because of the way Xen works, they said they take very little performance hit when using virtualization. One tip they gave was that it is not always good for each service to be on a separate virtual machine. They said that, by default, every slice (VM) at Engine Yard has nginx, 3 mongrels, and memcached. They group the services and find that this often works well.

After the session I chatted with the guys. I told them that I spent a few weeks with the free version of Xen and found it very complicated to work with. They said that it took them nine months to perfect their use of Xen. I’m glad to find out that it wasn’t just me. However, it does inspire me to give it a second chance.

mod_rails and Passenger: When asked about the new mod_rails, they said that they are much more interested in Rubinius and mod_rubinius. More posts on both Rubinius and Passenger to come.

RailsConf08: Testing and Contributing to Open-Source

Yesterday afternoon I attended a lecture and workshop on how to contribute to open-source projects and associated testing tools. I want to briefly share with you some of the tools and philosophies.

rcov

rcov is a code coverage tool for Ruby. The idea is that you run it on your tests. Lines that get executed are marked green as “ok”, and lines that are never executed are marked red as dangerous because they are never tested. This is an easy way for you to clearly see which lines of code are never being touched by any of your test cases.

I think that rcov will significantly change the way I approach testing with my own team and consultants. I’ll have to talk to my team, but it seems reasonable to me that every piece of code that is going to be used in production should have at least 50% code coverage. In growing, changing production systems this seems like a very valuable way to keep the code malleable (because you can refactor with less fear of breaking) which then leads to increased confidence and reduced cost-of-ownership.

flog

flog is a Ruby code complexity analyzer. Now the first question you may be asking is “what in the world is a complexity analyzer?” The idea is that flog reads your code and assigns it a score based on criteria determined by its author, Ryan Davis. Basically it rates on the type of code Ryan Davis likes to see. This isn’t as terrible as it may sound because Ryan has some very tasteful ideas on what readable code looks like.

You may or may not agree with the way Ryan scores things, but as our speaker pointed out, flog and rcov often agree with each other about where a program’s problem areas are. More often than not the high flog score areas are not tested because they are too bulky to be tested. When you write tests you often have to design the code in smaller, more testable pieces. Readability and testability often go hand in hand.

When running flog it can be helpful to just grep the output for #. This will give a high level view of the methods and their scores. Good candidates for refactoring are methods with scores between 80 and 150. Scores higher than 150 typically mark serious problem areas (in terms of readability). Make sure you have them very well tested before you try to refactor them too much.
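The thresholds above can be turned into a tiny triage helper. This is just a sketch: the method names and scores below are invented rather than real flog output, and flog_bucket is not part of flog:

```ruby
# Thresholds from the text: 80 to 150 is a refactoring candidate,
# anything above 150 is a serious problem area.
def flog_bucket(score)
  if score > 150
    :serious_problem
  elsif score >= 80
    :refactoring_candidate
  else
    :ok
  end
end

# Hypothetical per-method scores, not real flog output.
scores = { "Order#checkout" => 162.4, "Cart#add_item" => 95.0, "User#name" => 7.3 }

# Group method names by bucket so the worst offenders stand out.
report = scores.group_by { |_method, score| flog_bucket(score) }
               .transform_values { |pairs| pairs.map(&:first) }

report.each { |bucket, methods| puts "#{bucket}: #{methods.join(', ')}" }
```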

heckle

From the heckle website: “Heckle is a mutation tester. It modifies your code and runs your tests to make sure they fail. The idea is that if code can be changed and your tests don’t notice, either that code isn’t being covered or it doesn’t do anything.” The idea is that heckle tests your tests and makes sure you are actually writing tests that are meaningful.
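To make the idea concrete, here is a hand-rolled illustration of a single mutation. It does not use heckle itself; the lambdas and the "adult" rule are invented for the example:

```ruby
# The code under test, and a hand-made "mutant" where >= became >.
original = ->(age) { age >= 18 }
mutant   = ->(age) { age > 18 }

# A weak test only checks values far from the boundary.
weak_test = ->(adult) { adult.call(30) && !adult.call(5) }

# A stronger test also checks the boundary value itself.
strong_test = ->(adult) { weak_test.call(adult) && adult.call(18) }

# A test "kills" the mutant when it fails against the mutated code.
weak_kills_mutant   = !weak_test.call(mutant)
strong_kills_mutant = !strong_test.call(mutant)

puts "weak test notices the mutation:   #{weak_kills_mutant}"
puts "strong test notices the mutation: #{strong_kills_mutant}"
```

Both tests pass against the original code, but only the stronger one fails when the code is changed. That gap is what a mutation tester is meant to expose.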

tarantula

tarantula is a “fuzzy spider”. It crawls your web application, submits tons of garbage data, and tries to raise exceptions in your Rails application. The premise of this tool is that no matter what data is posted, your application should not be returning 500 errors. You should handle bad data gracefully and not let unnecessary exceptions bubble to the surface. (I’m not entirely clear on how this gels with RESTful development, where one often uses the return status codes to return meaningful information.)

summary

It was pointed out that none of these tools replace thought; they are here to enhance it. flog, for instance, is a tool that helps enhance your sense of code-smell. You may run flog, look at a “complex” method and decide that it is perfectly acceptable for your system. Obviously, you as the programmer know better than an automated tool. However, it is often the case that flog will point out passages of code that are challenging and difficult for anyone but the original author to read. The idea is to encourage code that is simple to test, read, and understand. This makes the code more robust and open to change.

RailsConf08: Meta-programming Ruby for Fun and Profit

I’m currently at RailsConf and it is fantastic. I’ve met a good number of interesting people and attended some interesting sessions.

Ruby Internals

This morning Neal Ford and Patrick Farley gave a great session on Meta-programming Ruby for Fun and Profit. The slides should eventually show up here.

Particularly interesting was Patrick’s portion about the internals of Ruby’s method dispatch and how it relates to inheritance, mixins, and the object’s eigenclass. The basic idea is that each class’s eigenclass (or metaclass), which is where its class methods live, has an inheritance structure parallel to that of the class itself. I may write more about this with diagrams after Patrick posts the slides, but understanding the system leads to this koan-like Ruby truth:

The super-class of the meta-class is the meta-class of the super-class

Don’t worry if it isn’t clear right away. Reid and I attended the session and then discussed it over lunch and it still took a while to sink in. In the meantime, check out _why’s article Seeing Metaclasses Clearly.
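The koan can be checked directly in irb. This sketch assumes a Ruby recent enough to have Object#singleton_class; the Animal and Dog classes are made up:

```ruby
class Animal
  # A class method lives on Animal's metaclass, not on Animal's instances.
  def self.kingdom
    "animalia"
  end
end

class Dog < Animal
end

dog_meta    = Dog.singleton_class     # the meta-class of Dog
animal_meta = Animal.singleton_class  # the meta-class of the super-class

# "The super-class of the meta-class is the meta-class of the super-class"
puts dog_meta.superclass.equal?(animal_meta)

# This is why class methods are inherited: the lookup for Dog.kingdom
# walks from Dog's metaclass up to Animal's metaclass.
puts Dog.kingdom
```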

Use class_eval instead of reopening a class

Patrick gave another very handy tip when dealing with meta-programming. He mentioned that using class_eval is much safer than re-opening a class and defining a method. This is because you don’t always know when files are loaded, and when you open a class that has not been loaded yet you define a brand-new class without realizing it. When you use class_eval you are referring to an existing constant, so a class that is missing raises an error instead.
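A minimal sketch of the difference, using made-up class names (Invoice, Receipt, Order) rather than the example from the talk:

```ruby
# Reopening: if the real Invoice was never loaded, this quietly creates
# a brand-new, nearly empty class instead of extending the real one.
class Invoice
  def total_with_tax
    :defined_on_accidental_class
  end
end

# class_eval: the constant must already exist, so a missing class fails loudly.
missing_error =
  begin
    Receipt.class_eval do
      def total_with_tax
        :never_reached
      end
    end
    nil
  rescue NameError => e
    e
  end
puts "class_eval on a missing class raised: #{missing_error.class}"

# Stand-in for a class that some library has already loaded.
class Order
  def subtotal
    100
  end
end

# When the class really exists, class_eval adds the method as expected.
Order.class_eval do
  def total_with_tax
    subtotal + 10
  end
end
puts Order.new.total_with_tax
```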

A Recorder

Neal shared with us a very interesting class that records the messages sent to it and can play them back later on.
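A rough sketch of such a class, built on method_missing. This is my own reconstruction of the idea, not Neal's listing, and the method name play_back_to is an assumption:

```ruby
class Recorder
  def initialize
    @messages = []
  end

  # Any message the Recorder does not understand is stored, not executed.
  def method_missing(name, *args, &block)
    @messages << [name, args, block]
    self
  end

  # Replay the stored messages, in order, against a real object.
  def play_back_to(target)
    @messages.each do |name, args, block|
      target.send(name, *args, &block)
    end
    target
  end
end

recorder = Recorder.new
recorder.sub!("Java", "Ruby")
recorder.upcase!

greeting = "Hello Java".dup
recorder.play_back_to(greeting)
puts greeting
```

A production version would also define respond_to_missing?; it is omitted here to keep the sketch short.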

I haven’t fully processed the power of this idea yet. However, I feel like it could have some application in genetic programming.

Tabula Rasa, the Recorder, and DSLs

Neal pointed out that one of the issues with Recorder is the following:

The issue is that the #freeze method went to the Recorder and not to the string. This is a problem you are likely to run into with a class like this because a standard Ruby Object already has about 40 methods of its own, and their names may conflict with what you want to delegate or capture with #method_missing (as the Recorder does). Thankfully Jim Weirich has created a BlankSlate class that undefines all of the existing methods. If you have Recorder inherit from BlankSlate it will then work as expected. Neal mentioned that this class is being integrated into Ruby 1.9 as SimpleObject (it shipped in Ruby 1.9 under the name BasicObject).
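The problem and the fix can be reproduced with a sketch like the following. It uses Ruby's built-in BasicObject as a stand-in for BlankSlate, and the recorder classes are my own reconstruction rather than Neal's code:

```ruby
# Inherits from Object, so #freeze is a real method and never
# reaches method_missing.
class NaiveRecorder
  def initialize
    @messages = []
  end

  def method_missing(name, *args, &block)
    @messages << [name, args, block]
    self
  end

  def play_back_to(target)
    @messages.each { |name, args, block| target.send(name, *args, &block) }
    target
  end
end

# Inherits from BasicObject, which has almost no methods of its own,
# so #freeze falls through to method_missing and gets recorded.
class BlankRecorder < BasicObject
  def initialize
    @messages = []
  end

  def method_missing(name, *args, &block)
    @messages << [name, args, block]
    self
  end

  def play_back_to(target)
    @messages.each { |name, args, block| target.__send__(name, *args, &block) }
    target
  end
end

naive = NaiveRecorder.new
naive.freeze                            # freezes the recorder itself
naive_target = naive.play_back_to("hello".dup)

blank = BlankRecorder.new
blank.upcase!
blank.freeze                            # recorded for later playback
blank_target = blank.play_back_to("hello".dup)

puts naive_target.frozen?
puts blank_target.frozen?
```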

When the slides become available, check out the section on the Quantifier module. The code is a bit lengthy to reproduce here but worth a look.

Simple Rails Log Query Analyzer

Intro

This is a simple rails log analyzer. The initial goal is to
identify SQL queries that could be optimized or find queries that are being
called more often than they should be.

In its current state it is little more than simple regexes. My goal, however, is
to build more useful tools on top of it. I’ve added it to a git repository to
allow anyone to make edits and improvements.

It’s not really ready for prime-time (for instance, it doesn’t work well
with log colorization), but I have found it useful so I thought I would share
it. If nothing else it could save someone else 10 minutes to write
a similar script.

What it does

Right now all this does is read a log file and extract SQL “SELECT” statements
and organize them by the amount of time they take. It will also show a summary
of the number of queries called to load each model.
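To give a feel for the approach, here is a sketch of the same idea. It is not the code from the repository, and the log line format it matches ("Model Load (seconds)   SELECT ...") is an assumption about how the development log looks:

```ruby
# Assumed log line shape: "  User Load (0.000452)   SELECT * FROM users"
QUERY_LINE = /^\s*(?<model>\w+) Load \((?<time>\d+(?:\.\d+)?)(?:ms)?\)\s+(?<sql>SELECT .+)$/

# Pull out every SELECT along with its model name and reported time.
def extract_selects(lines)
  lines.filter_map do |line|
    match = QUERY_LINE.match(line)
    next unless match

    { model: match[:model], time: match[:time].to_f, sql: match[:sql].strip }
  end
end

# Order queries from slowest to fastest.
def slowest_first(queries)
  queries.sort_by { |query| -query[:time] }
end

# Count how many queries were issued to load each model.
def counts_by_model(queries)
  queries.group_by { |query| query[:model] }.transform_values(&:size)
end

# Hypothetical sample lines standing in for log/development.log.
sample_log = [
  "Processing UsersController#index",
  "  User Load (0.000452)   SELECT * FROM users",
  "  Post Load (0.003100)   SELECT * FROM posts WHERE user_id = 1",
  "  User Load (0.000120)   SELECT * FROM users WHERE id = 1"
]

queries = extract_selects(sample_log)
slowest_first(queries).each { |q| puts format("%.6f  %s", q[:time], q[:sql]) }
counts_by_model(queries).each { |model, count| puts "#{model}: #{count}" }
```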

How to get it


git clone \
git://gitorious.org/simple_rails_log_analyzer/mainline.git \
simple_rails_log_analyzer

How to use it

The typical workflow looks like this:

rake log:clear
# hit the URL of the page we want to profile, run the rake task, etc
ruby bin/query_log_analyzer.rb log/development.log

Or visit http://gitorious.org/projects/simple_rails_log_analyzer

TODO

  • Deal with the colorization of the logs
  • Update the code to not be so SELECT centric. Support other types of queries
  • Be smarter about each request’s individual queries

Contributions

I’ve placed this in a git repository, so feel free to submit any patches and
I’ll integrate them into the master branch.
