Author Archives: Nate Murray

Cascading, TF-IDF, and BufferedSum (Part 1)

Introduction: A common technique in MapReduce is to read a group of records, calculate a value from that group, and emit each record with the new value attached. While this is easy to do in raw MR jobs, the solution in Cascading is not obvious. This tutorial introduces a new Cascading operation called BufferedSum. [...]
Posted in cascading, hadoop, programming
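To make the group-then-annotate pattern from the excerpt above concrete, here is a minimal sketch in plain Ruby. It is not Cascading code and not the BufferedSum operation itself; the record fields (doc, term, count) and the helper name attach_group_sum are hypothetical, chosen only for illustration.

```ruby
# Hypothetical records: (doc, term, count) rows, not taken from the original post.
records = [
  { doc: "a", term: "hadoop", count: 3 },
  { doc: "a", term: "cascading", count: 1 },
  { doc: "b", term: "hadoop", count: 2 },
]

# Group the records, compute one value per group, then re-emit every
# record of the group with that value attached.
# attach_group_sum is an illustrative name, not a Cascading API.
def attach_group_sum(records, group_key, value_key, sum_key)
  records.group_by { |r| r[group_key] }.flat_map do |_key, group|
    total = group.sum { |r| r[value_key] } # value calculated from the whole group
    group.map { |r| r.merge(sum_key => total) }
  end
end

annotated = attach_group_sum(records, :doc, :count, :doc_total)
annotated.each { |r| puts r.inspect }
```

Note that the whole group has to be seen before any record can be emitted, which is why the sketch holds each group in memory. That is presumably the buffering the operation's name refers to.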

How to use Cascading with Hadoop Streaming

Last time we talked about how to use a raw MapReduce job in Cascading. Now we are going to up the ante by using Hadoop Streaming as a Flow in Cascading. In this example, we hook a Python streaming job into a Cascade. It's pretty easy once you know how to do it: Create a JobConf [...]
Posted in programming

Interval – a ruby library for musical interval arithmetic

Interval: interval is a tiny library that provides simple musical note pitch and interval arithmetic. It is intended to do one thing: given a pitch, add (or subtract) an interval and give the resulting pitch. Observe:

    p = Interval::Pitch.from_string("c")
    i = Interval::Interval.from_string("M3")
    p2 = p + i
    p2.to_short_name # => "e"

    i.to_s # => "Major Third"

    i2 = Interval::Interval.from_string("p5")
    i2.to_s # => "Perfect Fifth"

    (p2 [...]
Posted in music, ruby
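For readers curious how pitch-plus-interval arithmetic like the excerpt above can work, here is a rough sketch that does not use the interval library. It assumes pitches can be treated as twelve semitone classes spelled with sharps only, and the names PITCH_CLASSES, SEMITONES, and transpose are hypothetical. A real library would also have to track note spelling and octaves, which this ignores.

```ruby
# Hypothetical helper, not the interval library's API.
# Pitches are reduced to 12 semitone classes, spelled with sharps only.
PITCH_CLASSES = %w[c c# d d# e f f# g g# a a# b].freeze

# Size of a few intervals in semitones (short names as in the excerpt).
SEMITONES = { "m3" => 3, "M3" => 4, "p4" => 5, "p5" => 7 }.freeze

# Move a pitch up (direction = 1) or down (direction = -1) by an interval.
def transpose(pitch, interval, direction = 1)
  index = PITCH_CLASSES.index(pitch) or raise ArgumentError, "unknown pitch: #{pitch}"
  steps = SEMITONES.fetch(interval)
  # Wrap around the octave; Ruby's % is non-negative for a positive divisor.
  PITCH_CLASSES[(index + direction * steps) % PITCH_CLASSES.size]
end

puts transpose("c", "M3")     # c up a major third
puts transpose("e", "p5")     # e up a perfect fifth
puts transpose("e", "M3", -1) # e down a major third
```

Because the sketch works on semitone classes, it cannot distinguish enharmonic spellings such as d# and e-flat.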