[Cialug] Command line tools can be faster than a Hadoop cluster

Eric Junker eric at eric.nu
Mon Feb 16 22:44:12 CST 2015


Reminds me of this post about how a six-command shell pipeline 
accomplished the same task as 10+ pages of Donald Knuth's Pascal code.

http://www.leancrew.com/all-this/2011/12/more-shell-less-egg/

If you think about it, using pipes to connect several small programs 
together is basically functional programming: each program maps an 
input stream to an output stream, and the pipe composes them.
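
To make the comparison concrete, here is a rough sketch of a six-stage 
word-frequency pipeline in the style the linked post describes. Each 
stage is a small filter that only reads stdin and writes stdout, so the 
chain reads like function composition. The wrapper name top_words and 
the default of 10 are my own assumptions for illustration, not something 
taken from the post.

```shell
# Print the N most frequent words read from stdin (N defaults to 10).
# top_words is a hypothetical wrapper name used only for this sketch.
top_words() {
    tr -cs 'A-Za-z' '\n' |   # turn every run of non-letters into a newline
    tr 'A-Z' 'a-z' |         # fold everything to lowercase
    sort |                   # bring identical words next to each other
    uniq -c |                # collapse adjacent duplicates, prefix a count
    sort -rn |               # order by count, highest first
    sed "${1:-10}q"          # quit after N lines
}
```

Something like `top_words 5 < book.txt` (book.txt being any text file 
you have around) should print the five most common words with their 
counts. Note that uniq only compares adjacent lines, which is why the 
sort has to come before it.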

Eric

On 2/16/15 3:30 PM, Matthew Nuzum wrote:
> Saw this and thought some of you would find it interesting.
>
> Command-line tools can be 235x faster than your Hadoop cluster
> <http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html>
>
> The main focus is the explanation of how your pipeline can do parallel
> processing. One thing that may be an error (though not relevant to his
> point) is that I think using `uniq` stores all the results in memory while
> it processes them. This is a side note because he points out another issue
> with uniq that precludes him from using it.
>


