X-Combinator

making the human scalable

Git from the Bottom Up

John Wiegley recently wrote a great article about git called Git from the Bottom Up (pdf). I found it very helpful in clarifying how git works, and that understanding makes git feel more accessible.

Understanding commits is the key to grokking Git. You’ll know you have reached the Zen plateau of branching wisdom when your mind contains only commit topologies, leaving behind the confusion of branches, tags, local and remote repositories, etc.

Even after reading his pdf it took me two days for this idea to sink in. I finally had my “ah-ha!-moment” after I poked around in the .git/refs folder for a while.
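The poking around in .git/refs can be sketched in Ruby. This is only an illustration: list_refs is a hypothetical helper, it reads loose ref files only (it ignores .git/packed-refs), and the demo runs against a throwaway directory with made-up object ids rather than a real repository.

```ruby
require "tmpdir"
require "fileutils"

# Map each loose ref name (e.g. "refs/heads/master") to the object id it stores.
# Hypothetical helper; packed refs are not handled.
def list_refs(git_dir)
  Dir.glob(File.join(git_dir, "refs", "**", "*"))
     .select { |path| File.file?(path) }
     .sort
     .map { |path| [path.sub("#{git_dir}/", ""), File.read(path).strip] }
     .to_h
end

# Demo against a temporary directory laid out like .git/refs (fake ids).
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, "refs", "heads"))
  FileUtils.mkdir_p(File.join(dir, "refs", "tags"))
  File.write(File.join(dir, "refs", "heads", "master"), "a" * 40 + "\n")
  File.write(File.join(dir, "refs", "tags", "v1.0"), "a" * 40 + "\n")
  list_refs(dir).each { |name, id| puts "#{id}  #{name}" }
end
```

In a real repository you would call list_refs(".git"); a branch and a tag that print the same id are just two names for the same commit.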

Git Pieces: Taken from John Wiegley’s article, “Git from the Bottom Up”

I think I resisted learning the git internals for a while because I didn’t want to understand git; I just wanted to use it. The problem was that I wanted to use it like svn, and git requires a mental paradigm shift. After reading John’s article I’ve come to realize once again that there are no shortcuts to progress, and often the quickest way to learn is to first take the time to understand.

I’d highly recommend that anyone who is still working on “grokking git” take the 30 minutes required to read John’s article.

Repairing your MySQL database

Today the /usr partition on one of our computers filled up and caused data corruption in the database. After space was freed up on the partition, we (Moises and I) ran some commands to repair the database. Here is what we did:

  1. cd into the directory where your db files are stored. In our case it was the /usr…/data/mysql/ folder.
  2. Run the command:
    myisamchk *.MYI | grep -3 --color corrupted
  3. This should give you some output on the current state of your files and indicate which files are corrupted and need repair. It also gives you the next command to run, on the line that reads:
    Fix it using switch “-r” or “-o”
  4. So that is just what we did:
    myisamchk -r file.MYI
  5. Voila, your db should be repaired.

One of my Favorite IE errors

What are you supposed to do if you don’t want to load the page?

IE Content Loading error

Fix for “sslv3 alert handshake failure (OpenSSL::SSL::SSLError)”

If you are using the ruby httpclient library (v2.1.2) and getting an SSL error similar to

/path/to/httpclient-2.1.2/lib/httpclient.rb:1039:in `connect': sslv3 alert handshake failure (OpenSSL::SSL::SSLError)

then there are (at least) two possible solutions to this.

Chaining :include’s in Rails to reduce the number of SQL queries

Say you have the following data model

A-B-C-D-E

and you want to execute a single query that returns all the data at once within the ActiveRecord tables, with the proper rails associations between them. Wouldn’t it be nice if you could do something like

? Though this is not even valid ruby code, it actually comes very close to what you can do in Ruby on Rails. To get this right, let’s take a closer look at the rails associations within the class definitions:

Let’s try the rails code again, putting an ’s’ after the :c and :e as required by rails in order to denote they are “many”-type associations:

That’s closer, but still not valid ruby code. To fix that, think of the => operator as being right-associative, and instead of putting in parentheses (), put in curly braces {} in order to create nested hashes:

That’s it! Looking in the logs, we see that this only produced a single query, with all the desired SQL joins:

A Load Including Associations (0.001088) SELECT `as`.`id` AS t0_r0, `as`.`b_id` AS t0_r1, `bs`.`id` AS t1_r0, `cs`.`id` AS t2_r0, `cs`.`b_id` AS t2_r1, `cs`.`d_id` AS t2_r2, `ds`.`id` AS t3_r0, `ds`.`c_id` AS t3_r1, `es`.`id` AS t4_r0 FROM `as` LEFT OUTER JOIN `bs` ON `bs`.id = `as`.b_id LEFT OUTER JOIN `cs` ON cs.b_id = bs.id LEFT OUTER JOIN `ds` ON ds.c_id = cs.id LEFT OUTER JOIN `ds_es` ON `ds_es`.d_id = `ds`.id LEFT OUTER JOIN `es` ON `es`.id = `ds_es`.e_id

You can use this technique with any ActiveRecord function that accepts the :include option to reduce the number of times the rails app hits the database, and ultimately speed up your rails application.
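As a rough sketch of the nested-hash form described above: the association names below (:b, :cs, :d, :es) are inferred from the joins in the logged query, so treat them as assumptions about the models rather than the exact code from this post. The Rails call appears only in a comment; the runnable part just shows that the nested hash is plain, valid Ruby and walks it. chain is a hypothetical helper.

```ruby
# Nested :include hash: each key names an association on the model one level up.
# Association names are inferred from the SQL log and are assumptions.
include_tree = { :b => { :cs => { :d => :es } } }

# In a Rails app of that era this would be passed along the lines of:
#   A.find(:all, :include => include_tree)

# Walk the nesting to list the association chain from outermost to innermost.
def chain(tree)
  return [tree] unless tree.is_a?(Hash)
  key, rest = tree.first
  [key] + chain(rest)
end

puts chain(include_tree).inspect
```

The curly braces are what make each level a hash in its own right, which is the "right-associative" reading of => that the text describes.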

the symlink trick

Courtenay writes on scaling rails applications at Caboo.se. He says:

Take a look at your logs: are you performing over 10 database calls per request? You need to fix this. Are you performing over 90? You’re a dumba**.

Today I viewed the logs of a rails application I am writing. To calculate one particular page I was performing 31,211 SELECT requests and the page took 1m7.091s to generate. Ouch.

After an hour of tweaking, optimizing queries, and piggy-backing some attributes I was able to get down to 9,839 queries and the page rendered in 0m16.958s. While this may be a respectable improvement, it is still atrocious according to Courtenay’s benchmark. (I think they have a word for systems that take over 9 thousand queries to generate a single page, but I won’t repeat it here.)

Fortunately, caching the entire page makes sense functionally. However, one problem with Rails’ built-in caching is that before the page is cached, the first person to hit it will be forced to wait 17 seconds for the page to render (assuming no further optimization). Under heavy traffic, hundreds of visitors to the site will pile up and many will be dropped. It’s the dreaded cache-gap.

Steve Conover at Pivotal Labs has a great technique for dealing with this kind of issue that he calls the symlink trick. A variation on Steve’s idea goes like this:

  • Symlink index.html to index.html.current.
  • When index.html.current is out of date, generate index.html.new
  • Have cron check the cache every 2 minutes and move index.html.new over index.html.current

Because *nix mv is atomic (as long as source and destination are on the same filesystem) there is no gap where the cached page is deleted and then requests are waiting for the page to be regenerated. Below is a diagram of the process.

symlink_trick

The great thing is that this caching technique is general and can be applied to any web application, not just Rails.
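The swap step can be sketched in Ruby. This is a minimal illustration, not Steve’s implementation: publish_cache is a hypothetical helper, the file names follow the bullets above, and the demo writes into a temporary directory instead of a real document root. File.rename maps to the rename system call, which replaces the destination atomically when both paths are on the same filesystem.

```ruby
require "tmpdir"

# Write the fresh page next to the live one, then swap it in atomically.
# Hypothetical helper; file names follow the list above.
def publish_cache(dir, html)
  new_path     = File.join(dir, "index.html.new")
  current_path = File.join(dir, "index.html.current")
  link_path    = File.join(dir, "index.html")

  File.write(new_path, html)
  File.rename(new_path, current_path) # readers see either the old or the new file
  # The symlink is created once and never touched again.
  File.symlink("index.html.current", link_path) unless File.symlink?(link_path)
end

Dir.mktmpdir do |dir|
  publish_cache(dir, "<h1>first render</h1>")
  publish_cache(dir, "<h1>second render</h1>")
  puts File.read(File.join(dir, "index.html"))
end
```

In the cron variation described above, the slow page generation would produce index.html.new and the cron job would only do the rename.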

MySQL Views Introduction

MySQL 5.0 supports a great new feature called views. Here’s a quick summary of what views are.

Views are also sometimes known as virtual tables, because they’re defined in terms of (other) tables through the use of queries…
However, a view is not merely a convenient container for a subset of records from a table. For one thing, a view is a “live” or dynamic snapshot of table data; when the data in the underlying table changes, so does that in the view.
-Devshed

As you can imagine, views can be very handy when doing database analysis.

Backspace in Screen

I finally got sick of hitting Ctrl-H to backspace while in a screen session today, so I found a way around it:

Try editing

~/.bashrc

And adding:

alias screen='TERM=screen screen'

Not sure if this has adverse effects and there might be a better way to do it, but hey, it works!

Simple Rails Log Query Analyzer

Intro

This is a simple rails log analyzer. The initial goal is to
identify SQL queries that could be optimized or find queries that are being
called more often than they should be.

In its current state it is little more than simple regexes. My goal, however, is
to build more useful tools on top of it. I’ve added it to a git repository to
allow anyone to make edits and improvements.

It’s not really ready for prime-time (for instance, it doesn’t work well
with log colorization), but I have found it useful so I thought I would share
it. If nothing else it could save someone else the 10 minutes it takes to write
a similar script.

What it does

Right now all this does is read a log file and extract SQL “SELECT” statements
and organize them by the amount of time they take. It will also show a summary
of the number of queries called to load each model.
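To give a feel for the approach, here is a small sketch of the same idea. It is not the code from the repository: the regex, the analyze helper, and the sample lines are all assumptions, modelled on the "Model Load (seconds) SELECT ..." shape of Rails logs from that era.

```ruby
# Matches lines shaped like "  User Load (0.000412)   SELECT * FROM `users`".
# The exact log format is an assumption.
QUERY_LINE = /^\s*(\w+) Load(?: Including Associations)? \((\d+\.\d+)\)\s+(SELECT .*)$/

# Returns [queries sorted slowest first, number of queries per model].
def analyze(lines)
  queries = []
  lines.each do |line|
    m = QUERY_LINE.match(line)
    next unless m
    queries << { :model => m[1], :seconds => m[2].to_f, :sql => m[3].strip }
  end
  counts = Hash.new(0)
  queries.each { |q| counts[q[:model]] += 1 }
  [queries.sort_by { |q| -q[:seconds] }, counts]
end

# Made-up sample lines for illustration only.
sample = [
  "  User Load (0.000412)   SELECT * FROM `users` WHERE id = 1",
  "  Post Load (0.002100)   SELECT * FROM `posts`",
  "  User Load (0.000300)   SELECT * FROM `users` WHERE id = 2",
  "Rendering posts/index"
]

by_time, counts = analyze(sample)
by_time.each { |q| puts format("%.6f  %s", q[:seconds], q[:sql]) }
counts.each  { |model, n| puts "#{model}: #{n} queries" }
```

A colorized log would wrap these lines in escape sequences, which is why a regex this simple stops matching.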

How to get it


git clone \
git://gitorious.org/simple_rails_log_analyzer/mainline.git \
simple_rails_log_analyzer

How to use it

The typical work flow looks like this:

rake log:clear
# hit the URL of the page we want to profile, run the rake task, etc
ruby bin/query_log_analyzer.rb log/development.log

Or visit http://gitorious.org/projects/simple_rails_log_analyzer

TODO

  • Deal with the colorization of the logs
  • Update the code to not be so SELECT centric. Support other types of queries
  • Be smarter about each request’s individual queries

Contributions

I’ve placed this in a git repository, so feel free to submit any patches and
I’ll integrate them into this master.

another reason to love vim

In vim 7.0 try the following:

:tabnew http://www.google.com
:e # to reload

Very handy to quickly view source, or edit any remote HTML/CSS, etc.

Introduction to Array and Hash Methods

One significant programming paradigm I first learned with perl, and have carried over to other programming languages such as ruby, is to think of arrays and hashes as fundamental data types, accompanied by their associated functions/methods. Though not as great a leap as the transition from procedural to object-oriented programming, being able to work and think fluidly, and to write succinct code using arrays and hashes, is an important programming skill. Once one develops this habit, working in a language in which these methods are lacking feels like speaking a foreign language that lacks words for the ideas you want to express. In programming, using these methods properly translates into fewer lines of code and/or greater abstraction.
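A small illustration of the point about fewer lines, using made-up data: the same count computed once with array and hash methods and once with explicit bookkeeping.

```ruby
words = %w[apple banana avocado cherry blueberry]

# Group words by first letter, then count each group.
counts = words.group_by { |w| w[0] }.map { |letter, ws| [letter, ws.size] }.to_h
puts counts.inspect

# The same result with a loop and manual bookkeeping.
loop_counts = {}
words.each do |w|
  key = w[0]
  loop_counts[key] = 0 unless loop_counts.key?(key)
  loop_counts[key] += 1
end
puts loop_counts.inspect
```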

ruby inject and category breadcrumbs

In addition to mapping out the Mandelbrot set, ruby’s inject method can also be used to easily find and/or create nested categories given a breadcrumb path.

Assuming Category is a Ruby on Rails ActiveRecord class that acts_as_tree, and breadcrumbs is an array of breadcrumb strings, e.g. breadcrumbs = ['Widgets', 'Green Widgets', '14V Widgets'] :

The final “id && Category.find(id)” is to have the function return nil in case breadcrumbs is empty.

This returns the final Category object if it exists, and creates any or all components of the path as needed.
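A sketch of how such an inject call might look. So that it runs without Rails, Category here is a hypothetical in-memory stand-in, and find_or_create and category_for are made-up names; in a real app the lookup would be an ActiveRecord finder scoped by name and parent_id.

```ruby
# Hypothetical in-memory stand-in for an acts_as_tree ActiveRecord model.
class Category
  attr_reader :id, :name, :parent_id
  @all = []

  class << self
    attr_reader :all

    def find(id)
      @all.find { |c| c.id == id }
    end

    # Return the child of parent_id with this name, creating it if missing.
    def find_or_create(name, parent_id)
      @all.find { |c| c.name == name && c.parent_id == parent_id } ||
        new(name, parent_id)
    end
  end

  def initialize(name, parent_id)
    @id = Category.all.size + 1
    @name = name
    @parent_id = parent_id
    Category.all << self
  end
end

# Thread the parent id through the path: each step finds or creates one level.
def category_for(breadcrumbs)
  id = breadcrumbs.inject(nil) do |parent_id, name|
    Category.find_or_create(name, parent_id).id
  end
  id && Category.find(id) # nil when breadcrumbs is empty
end

leaf = category_for(['Widgets', 'Green Widgets', '14V Widgets'])
puts leaf.name
puts category_for([]).inspect
```

Calling category_for a second time with the same path creates nothing new and returns the same leaf.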

zero-width negative look-ahead assertion

Currently I need to do some bulk transformation of some file names.
I have, say, 3 file names: 2.640-849.0.jpg, g2650ohr.jpg, and k2.26mr.jpg (so these are treated as three separate strings)
I want a (hopefully) single regex to substitute all of the . (periods) with - (dashes) except for the .jpg extension.

A first draft would be

s/\./-/g

but that gives you 2-640-849-0-jpg. Note the extension is transformed as well.

I could then do a second regex like s/-jpg/.jpg/ but that feels less than optimal.

The solution is a zero-width negative look-ahead assertion.
See man perlre for more information on this, but the basic idea is that
you match occurrences of the preceding expression that are not followed by the zwnla.
For example /foo(?!bar)/ matches any occurrence of foo that isn’t followed by bar.

Combine this with the common perl utility rename and we get the following:

rename -n 's/\.(?!jpg)/-/g' *

Note that Ruby supports the zero-width negative look-ahead assertion as well; it is look-behind assertions that the Ruby 1.8 regex engine leaves out.

UPDATE
Thanks go out to Matt Pulver for submitting these improvements to the above regex:

This handles cases like test.jpg.jpg:

s/\.+(?!jpg$)/-/g

If you want to substitute all non-alpha-numeric characters and support other file extensions try:

s/[^0-9a-z](?!\w+$)/-/g
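For comparison, here is a quick way to try the first two patterns in Ruby, whose regex engine accepts the same (?!...) look-ahead syntax. The file names are the ones from this post plus the test.jpg.jpg case; dash_dots and dash_dots_anchored are hypothetical helper names.

```ruby
# Replace every "." that is not immediately followed by "jpg".
def dash_dots(name)
  name.gsub(/\.(?!jpg)/, "-")
end

# Anchored variant: only a ".jpg" at the very end of the name is protected.
def dash_dots_anchored(name)
  name.gsub(/\.+(?!jpg$)/, "-")
end

names = ["2.640-849.0.jpg", "g2650ohr.jpg", "k2.26mr.jpg", "test.jpg.jpg"]
names.each do |name|
  puts "#{name} -> #{dash_dots(name)} | #{dash_dots_anchored(name)}"
end
```

The two helpers differ only on names like test.jpg.jpg, where the unanchored pattern leaves both dots alone.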

use env to test scripts for cron and monit

I’ve used /usr/bin/env for years but I just realized today that it can be very useful for testing processes that are run with cron or monit.

env has a nice option for clearing out all of the environment variables:

-i, --ignore-environment - start with an empty environment

Now you can try to run your script in an environment very similar to cron or monit. Note that this option even clears the PATH so you need to specify full paths or specify a PATH variable.

/usr/bin/env -i HOME=/path/to/my/home /path/to/do_script.sh
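The same check can be driven from Ruby, which is handy inside a test suite. This is a sketch under assumptions: the HOME and PATH values are placeholders, LEAKY_SETTING is a made-up variable, and /usr/bin/env is assumed to exist at that path. The :unsetenv_others spawn option plays the role of env -i.

```ruby
# A variable set in the parent that the child should NOT inherit.
ENV["LEAKY_SETTING"] = "from-my-shell"

# Placeholder minimal environment, similar in spirit to what cron provides.
clean_env = { "HOME" => "/tmp", "PATH" => "/usr/bin:/bin" }

# :unsetenv_others drops every variable not listed in clean_env,
# mirroring `env -i HOME=... PATH=... command`.
child_env = IO.popen([clean_env, "/usr/bin/env", { :unsetenv_others => true }], &:read)

puts child_env
puts "LEAKY_SETTING leaked into the child!" if child_env.include?("LEAKY_SETTING")
```

Replacing "/usr/bin/env" with the full path to your own script runs it with only the variables you listed.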

increment the number in a file

Let’s say you have a file called REVISION which contains a single number. If you want to increment the number in that file you could run the following command:

Wrap that up as a shell script and you get a nice increment command:


[nathan@nate ~]$ cat REVISION
1
[nathan@nate ~]$ increment REVISION
[nathan@nate ~]$ cat REVISION
2
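For comparison, the same behaviour sketched in Ruby. This is not the shell command from the post: increment_file is a hypothetical helper, and the demo uses a temporary file rather than a real REVISION file.

```ruby
require "tempfile"

# Read the single integer stored in the file, add one, and write it back.
def increment_file(path)
  n = Integer(File.read(path).strip)
  File.write(path, "#{n + 1}\n")
  n + 1
end

Tempfile.create("REVISION") do |f|
  f.write("1\n")
  f.flush
  increment_file(f.path)
  puts File.read(f.path)
end
```

Integer() raises if the file holds anything other than a number, so a corrupted REVISION file fails loudly instead of being reset.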

ruby inject and the Mandelbrot set

Ever since I came across ruby’s Enumerable inject function, I have been curious about its applications, and how to best think of it in its most general mathematical form.

creating circular drop-shadows with processing

Here is an example of a simple way to do circular drop-shadows in Processing.


screenshot of drop-shadow maker

You can view the interactive demo here.

djb daemontools with Ubuntu’s upstart

In case anyone wants to get djb’s daemontools up and running quickly on an OS that uses Ubuntu’s System V init-replacement upstart, here’s what works for me:

Create a 3-line file /etc/event.d/svscanboot :

That should start svscanboot with each system bootup. To have init start svscanboot without rebooting, do

as root after creating the above file.

Generate Rails Pages from a Helper

In Rails, you sometimes need to break the MVC pattern. In my case, I wanted to take a fully-generated page and extract a certain portion of it in a helper. In this way I was able to base a foreign channel’s template on my application’s template.
The key to this black magic is ActionController::Integration::Session.new. Note that in order to use this class you will need to put the following line in config/environment.rb.

Then from your helper function you can GET a page given a url. In the example below I want to extract the portion of the page between begin #{token} and end #{token}
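The extraction step on its own can be sketched in plain Ruby. This leaves out the integration-session GET entirely and assumes the markers are HTML comments of the form <!-- begin TOKEN --> and <!-- end TOKEN -->; both that marker format and the extract_between name are assumptions.

```ruby
# Return the text between the begin/end markers for token, or nil if absent.
# The HTML-comment marker format is an assumption.
def extract_between(html, token)
  t = Regexp.escape(token)
  match = /<!-- begin #{t} -->(.*?)<!-- end #{t} -->/m.match(html)
  match && match[1].strip
end

page = <<HTML
<html><body>
<!-- begin sidebar -->
<ul><li>Home</li></ul>
<!-- end sidebar -->
</body></html>
HTML

puts extract_between(page, "sidebar")
```

The /m flag lets the non-greedy (.*?) span several lines, and Regexp.escape keeps tokens containing regex metacharacters from changing the pattern.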

if statements vs. classification

A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. “Google uses Bayesian filtering the way Microsoft uses the if statement,” he said. Reg quoting Joel likening the interview process to bayesian filtering.

Paul Graham - Holding a Program in One’s Head

Paul Graham has a nice, short essay on programmers holding a whole project in their head. It’s an interesting read and I agree with many of his insights into the programmer’s mind. He also lists helpful tips which, indirectly, give suggestions for creating a productive programming environment.
http://www.paulgraham.com/head.html

edit css live in ie

Hey guys, here is a new program I have not tried (hence the “claim”). It is supposed to allow you to edit css live in IE on windows:
CSSVista
Edit your CSS code live in Internet Explorer and Firefox

assert_xpath

Hey guys, here’s a handy tip for testing XML: assert_xpath. I’m not sure where I found this.
Posted below or at: http://pastie.caboo.se/102945

better-than-mutt email replacement?

I stumbled upon a new email client written in ruby that aims to be the “email client of choice for nerds everywhere.” From the project web-page:
http://sup.rubyforge.org/
Sup is a console-based email client for people with a lot of email. It supports tagging, very fast full-text search, automatic contact-list management, and more. If you’re the type of person who treats your email client as an extension of your long-term memory, Sup is for you.
Sup makes it easy to:
  • Handle massive amounts of email.
  • Mix email from different sources: mbox files (even across different machines), IMAP folders, POP accounts, and Gmail accounts.
  • Instantaneously search over your entire email collection. Search over body text, or use a query language to combine search predicates in any way.
  • Handle multiple accounts. Replying to email sent to a particular account will use the correct SMTP server, signature, and from address.
  • Add custom code to handle certain types of messages or to handle certain types of text within messages.
  • Organize email with user-defined labels, automatically track recent contacts, and much more!
The goal of Sup is to become the email client of choice for nerds everywhere.

“who wrote this code?”

I just learned a really great svn feature. It’s called “svn blame”. You run it on a file and it shows you who is responsible for writing each line. For instance:

Notice that it shows the revision number on the left along with who is responsible for each piece of code.

ruby-prof and KCachegrind

Some of our code has been slow for a while. Now that we are importing more sites, it has come to the point where profiling and optimizing are a necessity.
ruby-prof is a great tool for profiling your ruby code. For instance I can do the following:
SITE=mysite ruby-prof -p call_tree -f ~/s3/doc/profile/mysite_importing_call_tree.kcg convert.rb
That will output a call tree file that can be read by another great program: KCachegrind. KCachegrind helps you sort and visualize what is taking up the most time. In the screenshot attached you can see that Kernel::clone takes up 5% of the whole process.
Read more about ruby-prof Read more about KCachegrind

xen virtualization of existing server is called p2v

Just learned that moving a physical server to a virtual one is a common task and is called p2v (Physical to Virtual).
Xen comes with a linux P2V program: (see http://www.solutioncentre.co.uk/products.php?productid=100)
Also if the auto-tool doesn’t work here is an article on how to do it manually: http://wiki.xensource.com/xenwiki/XenManualPtoVProcess

Ruby Script skeleton code

Here is a great article with sample code for a skeleton ruby app. I think it’s a great idea for us to start using templates such as this one when we have to write one-off scripts.
http://www.infinitered.com/blog/?p=21
