Tag Archives: Hadoop

For example, using SequenceFile, specifying a delimiter, and so on.
hadoop jar /path/hadoop-xxxx-streaming.jar \
-D mapred.reduce.tasks=100 \
-input path/xxx \
-output path/yyy \
-file ./dna.[......]

Continue reading

[Repost] Hadoop - How to do a secondary sort on values?

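A very good article on how to get the values under the same key delivered in sorted order during the reduce phase in Hadoop. It explains this more clearly than "Hadoop: The Definitive Guide"!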

Reposted from:

http://www.bigdataspeak.com/2013/02/hadoop-how-to-do-secondary-sort-on_25.html

The problem at hand here is that you need to work upon a sorted values set in your reducer.[......]
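The excerpt is cut off before the solution, so the following is only a minimal sketch of the composite-key approach commonly used for secondary sort, simulated in plain Python rather than with the Hadoop API. The function name `secondary_sort` and the `num_partitions` parameter are hypothetical, and the linked article may structure its solution differently.

```python
from itertools import groupby
from operator import itemgetter


def secondary_sort(records, num_partitions=2):
    """Simulate a secondary sort over (natural_key, value) records.

    Hypothetical helper for illustration only; not part of Hadoop.
    """
    # "Map" step: emit a composite key (natural_key, value) with the value.
    composite = [((key, value), value) for key, value in records]

    # "Partition" step: route on the natural key only, so every record
    # for one natural key lands in the same partition (reducer).
    partitions = [[] for _ in range(num_partitions)]
    for ckey, value in composite:
        partitions[hash(ckey[0]) % num_partitions].append((ckey, value))

    result = {}
    for part in partitions:
        # "Sort" step: order by the full composite key, which orders
        # values within each natural key.
        part.sort(key=itemgetter(0))
        # "Group" step: group on the natural key only, so one reduce
        # call sees all values of that key, already sorted.
        for natural, group in groupby(part, key=lambda kv: kv[0][0]):
            result[natural] = [value for _, value in group]
    return result


records = [("a", 3), ("b", 2), ("a", 1), ("b", 5), ("a", 2)]
sorted_values = secondary_sort(records)
```

In a real job the same three roles would be played by a custom partitioner, a sort comparator, and a grouping comparator; here they are collapsed into one function to keep the idea visible.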

Continue reading

[Repost] MapReduce Patterns, Algorithms, and Use Cases

Reposted from: http://yangguan.org/mapreduce-patterns-algorithms-and-use-cases/

Translated from: http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/

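This article summarizes several MapReduce patterns and algorithms commonly found online or in papers, and systematically explains how these techniques differ. All descriptive text and code use the standard Hadoop MapReduce model, including Mappers, Red[......]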

Continue reading

四号程序员

Keep It Simple and Stupid

Mahout - Clustering (Clustering Chapter)

Fixing the Sort compatibility problem that appears after upgrading to JDK 7

Some pitfalls of using Hadoop Streaming

[Repost] Hadoop - How to do a secondary sort on values?

[Repost] MapReduce Patterns, Algorithms, and Use Cases