[Cialug] Command line tools can be faster than a Hadoop cluster

Matthew Nuzum newz at bearfruit.org
Mon Feb 16 15:30:03 CST 2015


Saw this and thought some of you would find it interesting.

Command-line tools can be 235x faster than your Hadoop cluster
<http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html>

The main focus is an explanation of how a shell pipeline can do parallel
processing. One possible error in the article (though not relevant to his
point): I think `uniq` stores all the results in memory while it processes
them. This is only a side note, because he points out another issue with
`uniq` that precludes him from using it.
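
To make the memory question a bit more concrete, here is a rough Python
sketch of my own (not taken from the article) that counts distinct lines
from a stream. The function name `tally` is made up for illustration. The
point is only that a streaming count needs memory proportional to the
number of distinct values, not to the number of input lines.

```python
from collections import Counter
from typing import Iterable


def tally(lines: Iterable[str]) -> Counter:
    """Count each distinct line, reading the input one line at a time.

    Hypothetical helper for illustration: only the running counts are
    kept, so memory grows with the number of distinct values seen.
    """
    counts: Counter = Counter()
    for line in lines:
        # Strip the trailing newline so "a\n" and "a" count as the same value.
        counts[line.rstrip("\n")] += 1
    return counts
```

Passing `sys.stdin` to `tally` should let it sit at the end of a pipeline,
since a file object yields its lines lazily.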

-- 
Matthew Nuzum
newz2000 on freenode, skype, linkedin and twitter

♫ You're never fully dressed without a smile! ♫

