Category Archives: programming

Cascading, TF-IDF, and BufferedSum (Part 1)

A common technique in MapReduce is to read in a group of records, calculate a value from that group, and emit each record with the new value attached. While this is easy to do in raw MR jobs, the solution in Cascading is not obvious. This tutorial introduces a new operation for Cascading called BufferedSum. [...]
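To make the group-then-annotate pattern concrete, here is a minimal sketch in plain Python. It is not Cascading code and not the BufferedSum operation from the post; the function name, field names, and sample records are all hypothetical, chosen only to illustrate the idea of buffering a group, computing a sum, and re-emitting every record with that sum attached.

```python
from collections import defaultdict


def attach_group_sum(records, key_field, value_field, out_field="group_sum"):
    """Buffer records by key, sum one field per group, and emit every
    record with its group's sum attached (hypothetical helper)."""
    # Buffer the records of each group, as a reducer would see them.
    groups = defaultdict(list)
    for record in records:
        groups[record[key_field]].append(record)

    annotated = []
    for members in groups.values():
        # Calculate one value from the whole group...
        total = sum(m[value_field] for m in members)
        # ...then emit each original record with that value attached.
        for m in members:
            annotated.append({**m, out_field: total})
    return annotated


# Hypothetical sample data: term counts per document.
records = [
    {"doc": "a", "term": "cat", "count": 2},
    {"doc": "a", "term": "dog", "count": 1},
    {"doc": "b", "term": "cat", "count": 4},
]
result = attach_group_sum(records, "doc", "count")
```

The sketch buffers each whole group in memory before emitting anything, which is the simplest way to show the pattern. It assumes every group fits in memory.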
Also posted in cascading, hadoop

How to use Cascading with Hadoop Streaming

Last time we talked about how to use a raw MapReduce job in Cascading. Now we are going to up the ante by using Hadoop Streaming as a Flow in Cascading. In this example, we hook a Python streaming job into a Cascade. It's pretty easy once you know how to do it: Create a JobConf [...]
Posted in programming

Slides for “Introduction to Cascading” Presentation

This week I gave an introductory presentation on Cascading. These are the slides from that presentation: Intro To Cascading, from Nate Murray.
Posted in programming