How to use a raw MapReduce job in Cascading
Cascading is a great abstraction over MapReduce.
However, sometimes you may have code for an existing MapReduce job or want to drop directly to Hadoop for efficiency. Even if you’re using raw MapReduce jobs, Cascading can still be useful in planning the overall data pipeline.
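To make the planning idea concrete, here is a minimal sketch in plain Java. It does not use Cascading's API: `FlowSpec`, `order`, and the path names are hypothetical and exist only for this illustration. It assumes each flow can be described by the paths it reads and the paths it writes, and it orders the flows so that each one runs only after the flows that produce its inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FlowOrdering {

    /** Hypothetical stand-in for a flow: a name plus the paths it reads and writes. */
    static final class FlowSpec {
        final String name;
        final Set<String> sources;
        final Set<String> sinks;

        FlowSpec(String name, List<String> sources, List<String> sinks) {
            this.name = name;
            this.sources = new HashSet<>(sources);
            this.sinks = new HashSet<>(sinks);
        }
    }

    /**
     * Returns flow names in an order where every flow runs after the flows
     * that produce its inputs. Paths that no flow writes are treated as
     * input data that already exists.
     */
    static List<String> order(List<FlowSpec> flows) {
        // Every path that some flow in the pipeline writes.
        Set<String> produced = new HashSet<>();
        for (FlowSpec f : flows) {
            produced.addAll(f.sinks);
        }

        Set<String> available = new HashSet<>(); // paths written by flows already ordered
        List<FlowSpec> pending = new ArrayList<>(flows);
        List<String> ordered = new ArrayList<>();

        while (!pending.isEmpty()) {
            FlowSpec ready = null;
            for (FlowSpec f : pending) {
                boolean inputsReady = true;
                for (String src : f.sources) {
                    // An intermediate path must be written before it can be read.
                    if (produced.contains(src) && !available.contains(src)) {
                        inputsReady = false;
                        break;
                    }
                }
                if (inputsReady) {
                    ready = f;
                    break;
                }
            }
            if (ready == null) {
                throw new IllegalStateException("cyclic dependency between flows");
            }
            pending.remove(ready);
            available.addAll(ready.sinks);
            ordered.add(ready.name);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Flows are listed out of order on purpose; the names are made up.
        List<FlowSpec> flows = Arrays.asList(
            new FlowSpec("report", Arrays.asList("tmp/counts"), Arrays.asList("out/report")),
            new FlowSpec("rawMapReduce", Arrays.asList("tmp/clean"), Arrays.asList("tmp/counts")),
            new FlowSpec("clean", Arrays.asList("in/logs"), Arrays.asList("tmp/clean")));
        System.out.println(order(flows)); // prints [clean, rawMapReduce, report]
    }
}
```

In this sketch the raw MapReduce step is just another node with inputs and outputs, so it slots into the ordering the same way as any other flow.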
The code below is an example of how to use a raw MapReduce job in a Cascade. The key idea is that we create intermediate sinks and sources between the steps and rely on Cascading to schedule the flows in the correct order.