X-Combinator


making the human scalable

ActiveRecord from_json and from_xml

Unless I’m missing something, Rails’ default from_json and from_xml methods don’t handle data with associated (nested) objects. Here are routines that work for some simple examples I’ve tested. I placed this code in vendor/rails/activerecord/lib/active_record/base.rb within the ActiveRecord::Base class, but there might be a better place to put it.

Also available on pastie.

Example usage, using the to_xml example:

Note: this only works if the classes of the nested objects are canonically named. If any of your associations use the :class_name option, you will probably need to update this code to discover the proper class names.
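The core idea can be sketched in plain Ruby, without ActiveRecord: walk the parsed JSON and, whenever a value is a nested hash (or an array of hashes), build an instance of the class named after its key. Firm and Client below are hypothetical stand-ins for models, and the naming convention is deliberately naive (it just strips a trailing "s"), which is exactly the limitation described in the note above:

```ruby
require 'json'

# Hypothetical plain-Ruby stand-ins for ActiveRecord models.
class Firm
  attr_accessor :name, :clients
end

class Client
  attr_accessor :name
end

# Map an association key to a class by naming convention ("clients" -> Client).
# Naive: only a trailing "s" is stripped, so irregular plurals and
# associations declared with :class_name are not handled.
def class_for(key)
  singular = key.end_with?('s') ? key.chomp('s') : key
  Object.const_get(singular.split('_').map(&:capitalize).join)
end

# Recursively build an instance of klass from a parsed JSON hash.
def build_from_hash(klass, attributes)
  object = klass.new
  attributes.each do |key, value|
    built =
      case value
      when Hash  then build_from_hash(class_for(key), value)
      when Array then value.map { |v| v.is_a?(Hash) ? build_from_hash(class_for(key), v) : v }
      else value
      end
    object.public_send("#{key}=", built)
  end
  object
end

json = '{"name":"Acme","clients":[{"name":"Alpha"},{"name":"Beta"}]}'
firm = build_from_hash(Firm, JSON.parse(json))
p firm.clients.map(&:name)  # prints ["Alpha", "Beta"]
```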

I just wrote this a couple of hours ago, so please consider it an alpha version.

RailsConf08: Passenger or mod_rails RIP

There was no lack of hubris on stage today as the guys from Phusion talked about their new Apache extension, Passenger. If Passenger lives up to its claims, it could quickly become the de facto standard for deploying Rails (and other) applications.

The 22-year-old duo was obviously ecstatic to share what they had created, but I kept getting the feeling that they were surprised the crowd didn’t give them a State-of-the-Union-style standing ovation every few minutes. (If Passenger does what they claim, maybe they deserve it, but respect and appreciation often lose something when they are too eagerly expected.)

So what did they claim? That Passenger will not only make Rails deployment dead simple (think PHP: upload and go) but also deliver better performance while using less memory. It’s a worthy goal, and as Kent Beck said in his keynote, “Humility is not a prerequisite to ideas with impact.” I’d like to write up a bit about Passenger and the session they presented at RailsConf. You can download the Keynote slides here [zip]. (I’m sure the PDF will appear somewhere soon; if you find it, please feel free to leave a URL in the comments.)

Memory Usage and Clustering

Memory Usage

First, let’s talk about memory usage. With Mongrel, each process holds its own full copy of Rails and your application code in memory, plus the private memory for that process, so N Mongrel processes means N copies of the application code. With Passenger, the processes share one copy of the Rails/application code. Each process still gets its own chunk of private memory, but the shared code greatly reduces overall memory usage.

The Phusion guys also patched Ruby (and they’ve horribly named it “Ruby Enterprise Edition”). This version modifies the garbage collector so that Ruby uses significantly less memory: the GC is made copy-on-write friendly, so forked processes can keep sharing memory pages instead of each ending up with its own copy. This hasn’t been released yet, so there’s no word on how well it works or how stable it is.

Clustering

Another nice feature is that, when clustering, they use “fair load balancing”: you keep track of how many jobs each process in the cluster has, and you give the next job to the process with the least amount of work to do.
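A minimal Ruby sketch of that policy, assuming a hypothetical Worker record that only counts in-flight jobs; it illustrates the idea, not Passenger’s actual implementation:

```ruby
# Hypothetical record of a worker process and how many jobs it is handling.
Worker = Struct.new(:name, :active_jobs)

class FairBalancer
  def initialize(workers)
    @workers = workers
  end

  # Hand the next job to the worker with the fewest jobs in flight.
  def dispatch
    worker = @workers.min_by(&:active_jobs)
    worker.active_jobs += 1
    worker
  end

  # Call when a worker finishes a job.
  def finish(worker)
    worker.active_jobs -= 1
  end
end

workers = [Worker.new('a', 2), Worker.new('b', 0), Worker.new('c', 1)]
balancer = FairBalancer.new(workers)
puts balancer.dispatch.name  # prints b
```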

Competitor Comparison

They compared Passenger’s performance (as an Apache extension) to many competing products (including Nginx and Mongrel) and claimed that it used significantly less memory and was much faster. I won’t repeat all the statistics; you can check out the slides.

mod_rails, RIP

Although they had greatly simplified Rails deployment, they didn’t stop there: Passenger now supports Rack. This is probably the coolest thing about Passenger. Any custom server you write using Rack can basically be “dropped” into Apache and is handled effortlessly by Passenger. This also means that Rails alternatives like Merb and Camping work out of the box. But there is more…

They also added support for Python’s Django. It was almost comical: when they announced this, the crowd nearly booed them. I think that was a bit unfair, but I guess they should have expected it at a Rails conference. Either way, you have to give them props for taking the initiative and pushing the software to its boundaries.

Because Passenger supports frameworks other than Rails, they decided to drop the name mod_rails and call it Phusion Passenger. They mentioned that their focus will still be on developing and perfecting Passenger for Rails; it simply won’t be exclusive to Rails.

Case Studies

Passenger is already being used in production at Dreamhost. It is also being used by soocial and ilike.

The Q&A

By the time of the Q&A, the crowd was full of skeptics; what they promised seemed too good to be true. One gentleman in the crowd sensed this, stood up, and said that he works with a developer/deployer named Tom who has been doing production Rails deployment for 4 years. Tom apparently knows the ins and outs of Mongrel, Monit, Capistrano, etc. They found out about Passenger two or three weeks ago, and Tom deployed all of their existing applications to Passenger in a single day. His comment was:

Everything I learned [about Rails deployment] over the last few years is moot now, and that is a good thing. -Tom the deployer

He said that “[Passenger] is incredibly awesome when it comes to rails deployment.”

A skeptic stood and rightly asked: “Why wasn’t this done five years ago? What was the technological hurdle?” The team answered that they believed the problem was largely social. Developers who had written Rails applications wanted to deploy them as quickly as possible. They researched, learned about Capistrano, Nginx, Mongrel, etc., and made it work. The Phusion team said that the people who were smart enough to tackle this problem had become complacent and chose to keep deploying applications this (painful) way.

There is nothing technically preventing [ease of rails deployment]. We’ve shown it’s possible. Why it hasn’t been done is a social or political problem. There is no technical things stopping you. -Hongli Lai (Phusion)

Summary

Time will tell whether Apache/Passenger lives up to the hype and becomes the new standard. I, for one, am hoping that it really does take hold. If Rails deployment becomes as common and easy as PHP deployment, we can spend our time solving more interesting problems, and that will be a good thing.

RailsConf08: Engine Yard on Rails Deployment Issues

Yesterday I sat in on a session on Rails deployment, led by the guys from Engine Yard. The idea was to discuss deployment problems, but it turned into general deployment tips. If anyone knows about deploying Rails, it is these guys, and they have some fantastic ideas. I took away some interesting things that I’d like to share with you.

Server choice

One of the issues discussed was which Rails server to run your application on.

  • ebb: extremely fast, but probably not production-ready today
  • thin: good for requests that complete quickly. Because thin is event-driven, it doesn’t work as well for longer-running requests
  • mongrel

Tom pointed out one important fact to consider when choosing between these. He said:

They can all respond to requests faster than your application can generate [responses]. There are way more important things to spend time on. -Tom Mornini, Engine Yard

The improvements you gain by switching between these are often insignificant compared to your use of caching, limiting disk I/O in your apps, and controlling your overall application architecture.

Mongrel is, obviously, going to be most people’s first choice, because it’s a solid general-purpose server. But when using Mongrel, a common question is “how many mongrel processes should I be running?” Tom said that “you can burn out a modern CPU with 3 mongrels,” so there is no reason to run more than 3 mongrel processes per core; any beyond that are generally wasted.

The guys at Engine Yard love nginx and said they’ve had no problems with it. Tom said that in internal tests against static files, they’ve seen nginx serve 40 megabytes per second of static images without even showing up in top.

Misc other tips

Static files: Static files don’t have to be local. They can be shared across the entire cluster with a clustered file system. Engine Yard uses Red Hat GFS for static storage, which is convenient because multiple machines can read the same filesystem. “If you can avoid NFS, do… NFS was really, really cool in 1979.”

Static resource domains: Browsers limit the number of concurrent requests per domain. At Engine Yard they have had success improving load times by creating domain name aliases that point to the same physical machine. For example, images1.domain.com, images2.domain.com, etc. can all point to the exact same machine and IP address, but because the domain names are different, the browser is tricked into loading from them concurrently. They have seen significant improvements in load times using this technique on pages that need to load a lot of files.
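A minimal sketch of the hostname trick in Ruby, using the hypothetical alias names above. Hashing the path, rather than picking a host at random, keeps each file on the same hostname from page to page so browser caching still works:

```ruby
require 'zlib'

# Hypothetical aliases; in DNS they would all resolve to the same machine.
ASSET_HOSTS = %w[images1.domain.com images2.domain.com images3.domain.com].freeze

# Spread files across the aliases. CRC32 of the path is stable across
# processes and restarts, so a given file always maps to the same hostname.
def asset_url(path, hosts = ASSET_HOSTS)
  host = hosts[Zlib.crc32(path) % hosts.size]
  "http://#{host}#{path}"
end

%w[/images/logo.png /images/header.jpg /images/icon.gif].each do |path|
  puts asset_url(path)
end
```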

Virtualization: They use (the free, open-source version of) Xen and love it. Nearly everything at Engine Yard is virtualized, and because of the way Xen works, they said they see very little performance hit from virtualization. One tip they gave was that it is not always good to put each service on a separate virtual machine. By default, every slice (VM) at Engine Yard runs nginx, 3 mongrels, and memcached. They group services this way and find that it often works well.

After the session I chatted with the guys. I told them that I had spent a few weeks with the free version of Xen and found it very complicated to work with. They said that it took them nine months to perfect their use of Xen. I’m glad to find out that it wasn’t just me, and it does inspire me to give it a second chance.

mod_rails and Passenger: When asked about the new mod_rails, they said that they are much more interested in Rubinius and mod_rubinius. More posts on both Rubinius and Passenger to come.

RailsConf08: Testing and Contributing to Open-Source

Yesterday afternoon I attended a lecture and workshop on how to contribute to open-source projects, along with the associated testing tools. I want to briefly share some of the tools and philosophies with you.

rcov

rcov is a code coverage tool for Ruby. The idea is that you run it on your tests: lines that get executed are marked green (“ok”), and lines that are never executed are marked red, as dangerous, because they are never tested. This makes it easy to see exactly which lines of code are never touched by any of your test cases.

I think that rcov will significantly change the way I approach testing with my own team and consultants. I’ll have to talk to my team, but it seems reasonable to me that every piece of code that is going to be used in production should have at least 50% code coverage. In growing, changing production systems this seems like a very valuable way to keep the code malleable (because you can refactor with less fear of breaking) which then leads to increased confidence and reduced cost-of-ownership.
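rcov targets Ruby 1.8; Ruby 1.9 and later ship a built-in Coverage module that reports per-line execution counts, which is enough to compute the same kind of percentage. A minimal sketch (the classify method and the single call standing in for a test suite are hypothetical):

```ruby
require 'coverage'
require 'tempfile'

# Percentage of executable lines that ran at least once.
# Coverage reports nil for lines that are not executable (`else`, `end`, ...).
def line_coverage(counts)
  executable = counts.compact
  return 0.0 if executable.empty?
  executable.count(&:positive?) * 100.0 / executable.size
end

# Hypothetical code under test, written to a temp file so Coverage can track it.
source = <<~RUBY
  def classify(n)
    if n.negative?
      n.abs.to_s
    else
      n.to_s
    end
  end
RUBY

Coverage.start                       # must run before the file is loaded
file = Tempfile.new(['classify', '.rb'])
file.write(source)
file.close
load file.path

classify(5)                          # our only "test": never takes the negative branch
counts = Coverage.result.fetch(file.path)
puts format('%.1f%% line coverage', line_coverage(counts))
```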

flog

flog is a Ruby code complexity analyzer. The first question you may be asking is “what in the world is a complexity analyzer?” flog reads your code and assigns it a score based on criteria determined by its author, Ryan Davis; basically, it rates code by how close it is to the kind Ryan likes to see. This isn’t as terrible as it may sound, because Ryan has some very tasteful ideas about what readable code looks like.

You may or may not agree with the way Ryan scores things, but as our speaker pointed out, flog and rcov often agree about where a program’s problem areas are. More often than not, code with a high flog score is untested because it is too bulky to test. Writing tests pushes you to design code in smaller, more testable pieces: readability and testability often go hand in hand.

When running flog, it can be helpful to grep the output for #, which gives a high-level view of the methods and their scores. Methods scoring between 80 and 150 are good candidates for refactoring; scores above 150 typically mark serious problem areas (in terms of readability). Make sure those are very well tested before you try to refactor them much.
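The rule of thumb is easy to turn into a triage step. A sketch assuming you already have method names and flog scores in a hash (the entries below are made up; parsing flog’s output is left out):

```ruby
# Rule of thumb from the talk: 80-150 is a refactoring candidate,
# above 150 is a serious problem area.
def flog_bucket(score)
  if score > 150 then :serious
  elsif score >= 80 then :refactor
  else :ok
  end
end

# Group methods by bucket, highest score first within each bucket.
def flog_triage(scores)
  scores.group_by { |_method, score| flog_bucket(score) }
        .transform_values { |pairs| pairs.sort_by { |_m, s| -s }.map(&:first) }
end

# Hypothetical scores, for illustration only.
scores = {
  'OrdersController#create' => 212.4,
  'Invoice#total'           => 96.1,
  'Report#to_csv'           => 131.0,
  'User#full_name'          => 4.2
}

flog_triage(scores).each { |bucket, methods| puts "#{bucket}: #{methods.join(', ')}" }
```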

heckle

From the heckle website: “Heckle is a mutation tester. It modifies your code and runs your tests to make sure they fail. The idea is that if code can be changed and your tests don’t notice, either that code isn’t being covered or it doesn’t do anything.” In other words, heckle tests your tests and makes sure they are actually meaningful.
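Heckle mutates your source automatically; the sketch below fakes that step with hand-written mutants (hypothetical lambdas) just to show the logic. A mutant that still passes the suite points to a test that isn’t checking enough:

```ruby
# Code under test: the larger of two numbers.
original = ->(a, b) { a > b ? a : b }

# Hand-written mutants standing in for heckle's automatic source rewrites.
MUTANTS = {
  'flip > to <'     => ->(a, b) { a < b ? a : b },
  'always return a' => ->(a, _b) { a }
}.freeze

# Each suite returns true when all of its checks pass.
weak_suite   = ->(max) { max.call(3, 1) == 3 }
strong_suite = ->(max) { max.call(3, 1) == 3 && max.call(1, 3) == 3 }

# The suites must pass against the unmodified code before mutating anything.
raise 'tests fail on the original code' unless weak_suite.call(original) && strong_suite.call(original)

# A mutant "survives" if the suite still passes against it.
def surviving_mutants(mutants, suite)
  mutants.select { |_name, impl| suite.call(impl) }.keys
end

p surviving_mutants(MUTANTS, weak_suite)    # prints ["always return a"]
p surviving_mutants(MUTANTS, strong_suite)  # prints []
```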

tarantula

tarantula is a “fuzzy spider”. It crawls your web application, submits tons of garbage data, and tries to raise exceptions in your Rails application. The premise is that no matter what data is posted, your application should not return 500 errors: you should handle bad data gracefully and not let unnecessary exceptions bubble to the surface. (I’m not entirely clear on how this gels with RESTful development, where one often uses status codes to return meaningful information.)
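A toy version of the premise (not tarantula itself): feed random garbage to a hypothetical action and flag any input that produces a 500 or an unhandled exception. The robust action still rejects bad input, just with a 4xx status, which suggests one way the idea can coexist with meaningful RESTful status codes:

```ruby
# Hypothetical actions that expect params['age'] to be a whole number.
# Each returns [status, body], like a tiny Rack-style response.
fragile_action = lambda do |params|
  [200, "age is #{Integer(params['age'])}"]   # raises on garbage input
end

robust_action = lambda do |params|
  raw = params['age'].to_s
  if raw.match?(/\A\d+\z/)
    [200, "age is #{raw.to_i}"]
  else
    [422, 'age must be a whole number']       # bad data, but not a server error
  end
end

# Throw random byte strings at an action; collect inputs that lead to a 500.
def fuzz(action, runs: 200, seed: 1)
  rng = Random.new(seed)
  runs.times.each_with_object([]) do |_, failures|
    garbage = Array.new(rng.rand(0..12)) { rng.rand(0..255).chr }.join
    begin
      status, _body = action.call('age' => garbage)
      failures << garbage if status >= 500
    rescue StandardError
      failures << garbage   # an unhandled exception would surface as a 500
    end
  end
end

puts "fragile: #{fuzz(fragile_action).size} failing inputs"
puts "robust:  #{fuzz(robust_action).size} failing inputs"
```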

summary

It was pointed out that none of these tools replace thought; they are here to enhance it. flog, for instance, is a tool that helps enhance your sense of code-smell. You may run flog, look at a “complex” method and decide that it is perfectly acceptable for your system. Obviously, you as the programmer know better than an automated tool. However, it is often the case that flog will point out passages of code that are challenging and difficult for anyone but the original author to read. The idea is to encourage code that is simple to test, read, and understand. This makes the code more robust and open to change.

Git from the Bottom Up

John Wiegley recently wrote a great article about git called Git from the Bottom Up (pdf). I found it very helpful in clarifying how git works, and that understanding makes git feel more accessible.

Understanding commits is the key to grokking Git. You’ll know you have reached the Zen plateau of branching wisdom when your mind contains only commit topologies, leaving behind the confusion of branches, tags, local and remote repositories, etc.

Even after reading his pdf, it took two days for this idea to sink in. I finally had my “ah-ha!” moment after poking around in the .git/refs folder for a while.
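A small concrete piece of that model: git names every object by the SHA-1 of a short header plus its content, so commits can refer to trees and parent commits purely by hash. Here is how a blob’s name is computed (the same value `git hash-object` reports), using only Ruby’s standard library:

```ruby
require 'digest'

# Git names a blob by hashing "blob <size in bytes>\0" followed by the raw content.
def git_blob_sha1(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

puts git_blob_sha1("hello\n")  # prints ce013625030ba8dba906f756967f9e9ca394464a
```

Trees and commits are hashed the same way with their own type in the header, and a loose ref under .git/refs is just a file holding one of these 40-character commit names.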


Git Pieces: Taken from John Wiegley’s article, “Git from the Bottom Up”

I think I resisted learning the git internals for a while because I didn’t want to understand git; I just wanted to use it. The problem was that I wanted to use it like svn, and git requires a mental paradigm shift. After reading John’s article, I’ve realized once again that there are no shortcuts to progress, and often the quickest way to learn is to first take the time to understand.

I’d highly recommend that anyone still working on “grokking git” take the 30 minutes required to read John’s article.

Repairing your MySQL database

Today the /usr partition on one of our computers filled up and corrupted data in the database. After freeing up space on the partition, we (Moises and I) ran some commands to repair the database. Here is what we did:

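  1. Make sure the MySQL server isn’t using the tables while you work on them (for example, by stopping mysqld), then cd into the directory where your db files are stored. In our case it was the /usr…/data/mysql/ folder.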
  2. Then run the command:
    myisamchk *.MYI | grep -3 --color corrupted
  3. This should give you some output on the current state of your files and indicate which files are corrupted and need repair. It also tells you the next command to run, on the line that reads:
    Fix it using switch “-r” or “-o”
  4. So that is just what we did:
    myisamchk -r file.MYI
  5. Voila, your db should be repaired.

Fix for “sslv3 alert handshake failure (OpenSSL::SSL::SSLError)”

If you are using the ruby httpclient library (v2.1.2) and getting an SSL error similar to

/path/to/httpclient-2.1.2/lib/httpclient.rb:1039:in `connect': sslv3 alert handshake failure (OpenSSL::SSL::SSLError)

then there are (at least) two possible solutions to this.

Chaining :include’s in Rails to reduce the number of SQL queries

Say you have the following data model:

A-B-C-D-E

and you want to execute a single query that loads all of the data from these tables at once, with the proper Rails associations between them. Wouldn’t it be nice if you could do something like

? Though this is not even valid ruby code, it actually comes very close to what you can do in Ruby on Rails. To get this right, let’s take a closer look at the rails associations within the class definitions:

Let’s try the Rails code again, putting an ‘s’ after the :c and :e, as Rails requires in order to denote “many”-type associations:

That’s closer, but still not valid ruby code. To fix that, think of the => operator as being right-associative, and instead of putting in parentheses (), put in curly braces {} in order to create nested hashes:

That’s it! Looking in the logs, we see that this only produced a single query, with all the desired SQL joins:

A Load Including Associations (0.001088) SELECT `as`.`id` AS t0_r0, `as`.`b_id` AS t0_r1, `bs`.`id` AS t1_r0, `cs`.`id` AS t2_r0, `cs`.`b_id` AS t2_r1, `cs`.`d_id` AS t2_r2, `ds`.`id` AS t3_r0, `ds`.`c_id` AS t3_r1, `es`.`id` AS t4_r0 FROM `as` LEFT OUTER JOIN `bs` ON `bs`.id = `as`.b_id LEFT OUTER JOIN `cs` ON cs.b_id = bs.id LEFT OUTER JOIN `ds` ON ds.c_id = cs.id LEFT OUTER JOIN `ds_es` ON `ds_es`.d_id = `ds`.id LEFT OUTER JOIN `es` ON `es`.id = `ds_es`.e_id

With this technique, you can reduce the number of times your Rails app hits the database in any ActiveRecord method that accepts the :include option, and ultimately speed up your application.
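Judging from the joins in the log, the models presumably look like this (an inference from the SQL, not the original class definitions): A belongs_to :b, B has_many :cs, C has_one :d, and D has_and_belongs_to_many :es, which makes the option :include => {:b => {:cs => {:d => :es}}}. Since that nested hash is just a right fold over the association path, it can be built with inject; a plain-Ruby sketch (nested_include is a hypothetical helper):

```ruby
# Build a nested :include hash from an association path by folding from the right:
# [:b, :cs, :d, :es] becomes {:b => {:cs => {:d => :es}}}
def nested_include(path)
  path.reverse.inject { |inner, association| { association => inner } }
end

p nested_include([:b, :cs, :d, :es])

# In a Rails 2.x app this would then be passed as, for example:
#   A.find(:all, :include => nested_include([:b, :cs, :d, :es]))
```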

MySQL Views Introduction

MySQL 5.0 supports a great new feature called views. Here’s a quick summary of what views are.

Views are also sometimes known as virtual tables, because they’re defined in terms of (other) tables through the use of queries…
However, a view is not merely a convenient container for a subset of records from a table. For one thing, a view is a “live” or dynamic snapshot of table data; when the data in the underlying table changes, so does that in the view.
-Devshed

As you can imagine, views can be very handy when doing database analysis.


Simple Rails Log Query Analyzer

Intro

This is a simple rails log analyzer. The initial goal is to
identify SQL queries that could be optimized or find queries that are being
called more often than they should be.

In its current state it is little more than simple regexes. My goal, however, is
to build more useful tools on top of it. I’ve added it to a git repository to
allow anyone to make edits and improvements.

It’s not really ready for prime-time (for instance, it doesn’t work well
with log colorization), but I have found it useful so I thought I would share
it. If nothing else it could save someone else 10 minutes to write
a similar script.

What it does

Right now all this does is read a log file and extract SQL “SELECT” statements
and organize them by the amount of time they take. It will also show a summary
of the number of queries called to load each model.
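A minimal sketch of that core loop, assuming Rails 2.x log lines like the one in the :include post above (`A Load Including Associations (0.001088) SELECT ...`). It is an illustration, not the code in the repository; stripping ANSI escape codes first is one possible approach to the colorization problem:

```ruby
# Matches Rails 2.x-style query lines such as
#   "User Load (0.000312)   SELECT * FROM `users`"
#   "A Load Including Associations (0.001088) SELECT ..."
QUERY_LINE = /^\s*(?<model>[\w:]+) Load(?: Including Associations)? \((?<secs>\d+\.\d+)\)\s+(?<sql>SELECT\b.*)$/
ANSI_COLOR = /\e\[[\d;]*m/   # escape codes used in colorized logs

# Returns [queries sorted slowest first, query count per model].
def analyze(lines)
  matches = lines.map { |line| QUERY_LINE.match(line.gsub(ANSI_COLOR, '')) }.compact
  queries = matches.map { |m| { model: m[:model], secs: m[:secs].to_f, sql: m[:sql].strip } }
  slowest = queries.sort_by { |q| -q[:secs] }
  per_model = queries.group_by { |q| q[:model] }.transform_values(&:size)
  [slowest, per_model]
end

# Hypothetical log excerpt, including one colorized line.
log = [
  'Processing UsersController#index (for 127.0.0.1 at 2008-06-01 12:00:00) [GET]',
  '  User Load (0.000312)   SELECT * FROM `users`',
  "  \e[4;36;1mPost Load (0.002100)\e[0m   \e[0;1mSELECT * FROM `posts` WHERE (`posts`.user_id = 1)\e[0m",
  '  User Load (0.000150)   SELECT * FROM `users` WHERE (`users`.`id` = 1)'
]

slowest, per_model = analyze(log)
slowest.each { |q| printf("%.6f  %s  %s\n", q[:secs], q[:model], q[:sql]) }
per_model.each { |model, n| puts "#{model}: #{n} queries" }
```

To run it on a real log, pass File.readlines('log/development.log') to analyze.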

How to get it


git clone \
git://gitorious.org/simple_rails_log_analyzer/mainline.git \
simple_rails_log_analyzer

How to use it

The typical workflow looks like this:

rake log:clear
# hit the URL of the page we want to profile, run the rake task, etc
ruby bin/query_log_analyzer.rb log/development.log

Or visit http://gitorious.org/projects/simple_rails_log_analyzer

TODO

  • Deal with the colorization of the logs
  • Update the code to not be so SELECT centric. Support other types of queries
  • Be smarter about each request’s individual queries

Contributions

I’ve placed this in a git repository, so feel free to submit any patches and
I’ll integrate them into the mainline.

Similar Projects

Introduction to Array and Hash Methods

One significant programming paradigm I first learned with Perl, and have carried over to other programming languages such as Ruby, is to think of arrays and hashes as fundamental data types, accompanied by their associated functions/methods. Though not as great a leap as the transition from procedural to object-oriented programming, being able to work and think fluidly, and to write succinct code using arrays and hashes, is an important programming skill. Once you develop this habit, working in a language that lacks these methods feels like speaking a foreign language that lacks words for the ideas you want to express. In programming, using these methods effectively translates into fewer lines of code and/or greater abstraction.
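A small illustration of the difference (not from the full article): the same word count written with an explicit loop and then with Enumerable and Hash methods:

```ruby
words = %w[apple banana apple cherry banana apple]

# Loop style: manage the hash and the counter by hand.
counts = {}
words.each do |w|
  counts[w] = 0 unless counts.key?(w)
  counts[w] += 1
end

# Collection style: a default-valued hash plus Enumerable methods.
tally = words.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
top_two = tally.max_by(2) { |_word, n| n }.map(&:first)

p counts == tally   # prints true
p top_two           # prints ["apple", "banana"]
```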

ruby inject and category breadcrumbs

In addition to mapping out the Mandelbrot set, ruby’s inject method can also be used to easily find and/or create nested categories given a breadcrumb path.

Assuming Category is a Ruby on Rails ActiveRecord class that acts_as_tree, and breadcrumbs is an array of breadcrumb strings, e.g. breadcrumbs = ['Widgets', 'Green Widgets', '14V Widgets']:

The final “id && Category.find(id)” is to have the function return nil in case breadcrumbs is empty.

This returns the final Category object if it exists, and creates any or all components of the path as needed.
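A self-contained sketch of this inject pattern, with a tiny in-memory Category class standing in for the acts_as_tree model; the class and its find_or_create method are hypothetical substitutes for the ActiveRecord finders:

```ruby
# Hypothetical in-memory stand-in for an acts_as_tree ActiveRecord model.
class Category
  attr_reader :id, :name, :parent_id

  @all = []

  class << self
    attr_reader :all

    def find(id)
      @all.find { |c| c.id == id }
    end

    # Return the child of parent_id with this name, creating it if needed.
    def find_or_create(name, parent_id)
      @all.find { |c| c.name == name && c.parent_id == parent_id } ||
        new(@all.size + 1, name, parent_id).tap { |c| @all << c }
    end
  end

  def initialize(id, name, parent_id)
    @id = id
    @name = name
    @parent_id = parent_id
  end
end

# Walk the breadcrumbs, threading each category's id into the next lookup.
# inject(nil) returns nil for an empty path, hence the final `id && ...`.
def category_for(breadcrumbs)
  id = breadcrumbs.inject(nil) do |parent_id, name|
    Category.find_or_create(name, parent_id).id
  end
  id && Category.find(id)
end

leaf = category_for(['Widgets', 'Green Widgets', '14V Widgets'])
puts "#{leaf.name} (id #{leaf.id}, parent #{leaf.parent_id})"  # prints 14V Widgets (id 3, parent 2)
```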

zero-width negative look-ahead assertion

I currently need to do a bulk transformation of some file names.
I have, say, 3 file names: 2.640-849.0.jpg, g2650ohr.jpg, and k2.26mr.jpg (so these are treated as three separate strings)
I want a (hopefully) single regex to substitute all of the . (periods) with - (dashes) except for the .jpg extension.

A first draft would be

s/\./-/g

but that gives you 2-640-849-0-jpg. Note the extension is transformed as well.

I could then do a second regex like s/-jpg/.jpg/ but that feels less than optimal.

The solution is a zero-width negative look-ahead assertion.
See man perlre for more information, but the basic idea is that it
matches occurrences of the preceding expression that are not followed by the look-ahead’s pattern.
For example, /foo(?!bar)/ matches any occurrence of foo that isn’t followed by bar.

Combine this with the common Perl utility rename and we get the following (-n makes rename only print what it would do; drop it to actually rename the files):

rename -n 's/\.(?!jpg)/-/g' *

Note that Ruby supports this zero-width negative look-ahead as well; what Ruby 1.8’s regex engine lacks is look-behind, which arrived with the Oniguruma engine in Ruby 1.9.

UPDATE
Thanks to Matt Pulver for submitting these improvements to the regex above:

This handles cases like test.jpg.jpg:

s/\.+(?!jpg$)/-/g

If you want to substitute all non-alphanumeric characters and support other file extensions, try the following (without the i flag, uppercase letters are replaced too):

s/[^0-9a-z](?!\w+$)/-/g
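For comparison, the same substitutions in Ruby with String#gsub; Ruby’s regex engine accepts `(?!...)` as well. One adjustment: `$` in a Ruby regex matches at the end of any line, so `\z` anchors to the end of the string instead:

```ruby
names = %w[2.640-849.0.jpg g2650ohr.jpg k2.26mr.jpg test.jpg.jpg]

# Original pattern: replace every "." that isn't immediately followed by "jpg".
basic = names.map { |n| n.gsub(/\.(?!jpg)/, '-') }

# Matt's version: only the final ".jpg" is protected, so test.jpg.jpg works too.
anchored = names.map { |n| n.gsub(/\.+(?!jpg\z)/, '-') }

# Replace any non-alphanumeric character except the one before the extension.
general = names.map { |n| n.gsub(/[^0-9a-z](?!\w+\z)/, '-') }

p basic     # prints ["2-640-849-0.jpg", "g2650ohr.jpg", "k2-26mr.jpg", "test.jpg.jpg"]
p anchored  # prints ["2-640-849-0.jpg", "g2650ohr.jpg", "k2-26mr.jpg", "test-jpg.jpg"]
p general   # prints ["2-640-849-0.jpg", "g2650ohr.jpg", "k2-26mr.jpg", "test-jpg.jpg"]
```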

ruby inject and the Mandelbrot set

Ever since I came across ruby’s Enumerable inject function, I have been curious about its applications, and how to best think of it in its most general mathematical form.
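As a minimal sketch of the connection (not the code from the full post): inject threads an accumulator through a sequence, and iterating z -> z² + c is exactly that kind of computation. The 50-iteration cutoff is an arbitrary choice; 2 is the usual escape radius:

```ruby
# The accumulator is z; each step applies z -> z**2 + c. Points whose orbit
# stays within |z| <= 2 for max_iter steps are treated as members of the set.
def in_mandelbrot?(c, max_iter = 50)
  z = (1..max_iter).inject(Complex(0.0, 0.0)) do |acc, _step|
    break acc if acc.abs > 2.0   # escaped: stop folding early
    acc * acc + c
  end
  z.abs <= 2.0
end

# A coarse ASCII rendering of real -2..0.5 by imaginary -1..1.
(-1.0).step(1.0, 0.25) do |im|
  puts((-2.0).step(0.5, 0.0625).map { |re| in_mandelbrot?(Complex(re, im)) ? '#' : '.' }.join)
end
```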

creating circular drop-shadows with processing

Here is an example of a simple way to do circular drop-shadows in Processing.


screenshot of drop-shadow maker

You can view the interactive demo here.


Generate Rails Pages from a Helper

In Rails, you sometimes need to break the MVC pattern. In my case, I wanted to take a fully generated page and extract a certain portion of it in a helper. That way, I was able to base a foreign channel’s template on my application’s template.
The key to this black magic is ActionController::Integration::Session.new. Note that in order to use this class you will need to put the following line in config/environment.rb.

Then from your helper function you can GET a page given a url. In the example below I want to extract the portion of the page between begin #{token} and end #{token}
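The extraction step can be sketched in plain Ruby. This assumes the markers appear in the page as HTML comments such as `<!-- begin header -->` and `<!-- end header -->` (a hypothetical convention) and leaves out fetching the page itself:

```ruby
# Return the markup between <!-- begin TOKEN --> and <!-- end TOKEN -->,
# or nil if either marker is missing. The comment-style markers are an
# assumed convention; adjust the pattern to whatever your layout uses.
def extract_section(page, token)
  t = Regexp.escape(token)
  page[/<!--\s*begin #{t}\s*-->(.*?)<!--\s*end #{t}\s*-->/m, 1]
end

# In the helper, `page` would be the body of the page fetched through the
# integration session; a literal string stands in for it here.
page = <<~HTML
  <html><body>
  <!-- begin header --><div id="header">Site name</div><!-- end header -->
  <p>Main content</p>
  </body></html>
HTML

puts extract_section(page, 'header')  # prints <div id="header">Site name</div>
```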

if statements vs. classification

A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said.
-Reg, quoting Joel, likening the interview process to Bayesian filtering

assert_xpath

Hey guys, here’s a handy tip for testing XML: assert_xpath. I’m not sure where I found this.
Posted below or at: http://pastie.caboo.se/102945
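A minimal stand-alone sketch of such a helper, built on REXML, which ships with Ruby. The signature is hypothetical (not the pastie version), and it raises instead of using a test framework’s assertions:

```ruby
require 'rexml/document'

# Hypothetical assert_xpath: fail unless the XPath matches a node, and
# optionally check that node's text. Returns the matched node.
def assert_xpath(xml, xpath, expected_text = nil)
  node = REXML::XPath.first(REXML::Document.new(xml), xpath)
  raise "expected a node matching #{xpath}" if node.nil?
  if expected_text && node.text != expected_text
    raise "expected #{xpath} to be #{expected_text.inspect}, got #{node.text.inspect}"
  end
  node
end

xml = '<firm><name>Acme</name><clients><client><name>Alpha</name></client></clients></firm>'
assert_xpath(xml, '/firm/name', 'Acme')
assert_xpath(xml, '//client/name', 'Alpha')
puts 'XML assertions passed'
```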

FizzBuzz

Keep in mind, FizzBuzz is basically an idiot detector. Anybody who bothers to code FizzBuzz is saying, ooh, an idiot detector. I’d better aim this at my own head and see what happens.
- Giles Bowkett
