Monthly Archives: December 2009

Cascading, TF-IDF, and BufferedSum (Part 1)

Introduction: A common technique in MapReduce is to read a group of records, calculate a value from that group, and emit each record with the new value attached. While this is easy to do in raw MapReduce jobs, the solution in Cascading is not obvious. This tutorial introduces a new operation for Cascading called BufferedSum. [...]
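To make the pattern concrete, here is a minimal sketch in plain Python of "group, compute a value, emit each record with that value attached". It is not Cascading code and not the BufferedSum implementation from the post; the function name `attach_group_sum` and the field names `doc`, `term`, `count`, and `group_sum` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def attach_group_sum(records, key_field, value_field, out_field="group_sum"):
    """Yield each record with the sum of value_field over its group attached.

    Hypothetical helper for illustration; records are plain dicts.
    """
    # Buffer the records of each group, since the sum is only known
    # after every record in the group has been seen.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_field]].append(rec)

    # Compute the per-group sum, then re-emit every buffered record
    # with the new value added as an extra field.
    for members in groups.values():
        total = sum(m[value_field] for m in members)
        for m in members:
            yield {**m, out_field: total}


rows = [
    {"doc": "a", "term": "x", "count": 2},
    {"doc": "b", "term": "y", "count": 5},
    {"doc": "a", "term": "z", "count": 3},
]
for row in attach_group_sum(rows, "doc", "count"):
    print(row)
```

In a TF-IDF setting, a per-document total like this is what a term count would be divided by to get a term frequency. The sketch buffers each whole group in memory, which is the part that needs care at scale.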
Posted in cascading, hadoop, programming