Category Archives: hadoop
How to use a raw MapReduce job in Cascading
Cascading is a great abstraction over MapReduce.
However, sometimes you may have code for an existing MapReduce job or want to drop directly to Hadoop for efficiency. Even if you’re using raw MapReduce jobs, Cascading can still be useful in planning the overall data pipeline.
The code below is an example of how to use a [...]
Also posted in cascading, java
“Easily” set up a monitored Hadoop / Hive Cluster in EC2 with PoolParty
Summary
Setting up a scalable Hadoop cluster isn’t easy, but PoolParty makes it easier and manageable.
By the time we’re done with this tutorial you’ll have a Hadoop cluster consisting of one master node and two slaves. The slaves are formatted with HDFS and process MapReduce jobs that are delegated to them from the master.
The whole [...]
Also posted in poolparty, programming, ruby, scalability

Cascading, TF-IDF, and BufferedSum (Part 1)