Tag Archives: Map

Customizing the FileOutputFormat of a Hadoop MapReduce job

Requirement: have the Reduce phase write its results in a special format.
For example: put the Reducer's results into a Guava BloomFilter.
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.h[......]

Injecting a multi-level Map from configuration into a bean in Spring Boot

Suppose we want a two-level map: type -> level -> score

First, the configuration:
xxxx.old.type2Level2ScoreMap:
  type_1.level2ScoreMap.level_1: 1
  type_1.level2ScoreMap.level_2: 2
  type_2.level2ScoreMap.level_1: 1
  type_3.level2ScoreMap.level_1: 1
Next, define the two data structures. Note that the field names must match the configuration keys exactly, and the levels[......]
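A minimal sketch of those two structures, assuming the prefix xxxx.old from the configuration above. The class names ScoreConfig and LevelScores and the bind helper are hypothetical. In a Spring Boot application the outer class would be annotated with @ConfigurationProperties(prefix = "xxxx.old") and Spring would do the binding; here a plain-Java bind method stands in for Spring's binder (it is not how Spring works internally) so the example runs without Spring and shows how each flattened key lands in the nested maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class MultiLevelMapDemo {

    // Outer structure. In Spring Boot this would carry
    // @ConfigurationProperties(prefix = "xxxx.old"); the field name must match "type2Level2ScoreMap".
    public static class ScoreConfig {
        private Map<String, LevelScores> type2Level2ScoreMap = new LinkedHashMap<>();
        public Map<String, LevelScores> getType2Level2ScoreMap() { return type2Level2ScoreMap; }
        public void setType2Level2ScoreMap(Map<String, LevelScores> m) { this.type2Level2ScoreMap = m; }
    }

    // Inner structure; the field name must match "level2ScoreMap".
    public static class LevelScores {
        private Map<String, Integer> level2ScoreMap = new LinkedHashMap<>();
        public Map<String, Integer> getLevel2ScoreMap() { return level2ScoreMap; }
        public void setLevel2ScoreMap(Map<String, Integer> m) { this.level2ScoreMap = m; }
    }

    // Hypothetical stand-in for Spring Boot's binder: folds flattened keys of the form
    // <prefix>.type2Level2ScoreMap.<type>.level2ScoreMap.<level> into the nested objects.
    public static ScoreConfig bind(Properties props, String prefix) {
        ScoreConfig config = new ScoreConfig();
        String head = prefix + ".type2Level2ScoreMap.";
        String mid = ".level2ScoreMap.";
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith(head)) continue;
            String rest = key.substring(head.length());   // e.g. "type_1.level2ScoreMap.level_1"
            int i = rest.indexOf(mid);
            if (i < 0) continue;                            // key does not follow the expected shape
            String type = rest.substring(0, i);
            String level = rest.substring(i + mid.length());
            int score = Integer.parseInt(props.getProperty(key).trim());
            config.getType2Level2ScoreMap()
                  .computeIfAbsent(type, t -> new LevelScores())
                  .getLevel2ScoreMap()
                  .put(level, score);
        }
        return config;
    }

    public static void main(String[] args) {
        // The YAML above, flattened the way Spring Boot flattens nested YAML keys.
        Properties props = new Properties();
        props.setProperty("xxxx.old.type2Level2ScoreMap.type_1.level2ScoreMap.level_1", "1");
        props.setProperty("xxxx.old.type2Level2ScoreMap.type_1.level2ScoreMap.level_2", "2");
        props.setProperty("xxxx.old.type2Level2ScoreMap.type_2.level2ScoreMap.level_1", "1");
        props.setProperty("xxxx.old.type2Level2ScoreMap.type_3.level2ScoreMap.level_1", "1");

        ScoreConfig config = bind(props, "xxxx.old");
        System.out.println(config.getType2Level2ScoreMap()
                .get("type_1").getLevel2ScoreMap().get("level_2")); // prints 2
    }
}
```

If a field name drifts from its key (say level2Scores instead of level2ScoreMap), the corresponding entries simply never get bound, which is why the names have to line up.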

How to control the number of map tasks in Hadoop

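Reposted from: How to Control the Number of Maps in Hadoop

Hadoop provides a parameter, mapred.map.tasks, for setting the number of map tasks, and we can use it to control how many maps a job runs. However, setting the map count this way does not always take effect, because mapred.map.tasks is only a hint to Hadoop; the final number of maps also depends on other factors.
To make the explanation easier, let's first define a few terms:
block_size: the HDFS block size, 64 MB by default in Hadoop 1.x, configurable via the dfs.block.size parameter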
total_size[......]
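The excerpt stops before the calculation itself. For reference, the old-API org.apache.hadoop.mapred.FileInputFormat in Hadoop 1.x sizes splits as max(minSize, min(goalSize, blockSize)), where goalSize = total_size / mapred.map.tasks and minSize comes from mapred.min.split.size. The plain-Java sketch below reimplements that rule, including the 10% slop Hadoop allows for a file's last split, for splittable files only. estimateMaps is a hypothetical helper, and the model ignores non-splittable compressed files and data locality.

```java
import java.util.Collections;
import java.util.List;

public class MapCountEstimate {

    // The last split of a file may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Same rule as FileInputFormat.computeSplitSize in the old API:
    // the mapred.map.tasks hint only enters through goalSize.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Estimates the number of map tasks for a set of splittable input files.
    static int estimateMaps(List<Long> fileSizes, long blockSize, int mapTasksHint, long minSplitSize) {
        long totalSize = 0;
        for (long size : fileSizes) totalSize += size;
        long goalSize = totalSize / (mapTasksHint == 0 ? 1 : mapTasksHint);
        long minSize = Math.max(minSplitSize, 1);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);

        int maps = 0;
        for (long length : fileSizes) {
            if (length == 0) { maps++; continue; }      // an empty file still yields one split
            long remaining = length;
            while ((double) remaining / splitSize > SPLIT_SLOP) {
                maps++;                                  // one full-size split
                remaining -= splitSize;
            }
            if (remaining != 0) maps++;                  // the (possibly slightly larger) tail
        }
        return maps;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> oneGb = List.of(1024 * mb);
        System.out.println(estimateMaps(oneGb, 64 * mb, 2, 1));          // 16: hint of 2 is ignored
        System.out.println(estimateMaps(oneGb, 64 * mb, 100, 1));        // 100: hint raises the count
        System.out.println(estimateMaps(oneGb, 64 * mb, 2, 256 * mb));   // 4: larger min split size
        System.out.println(estimateMaps(Collections.nCopies(10, mb), 64 * mb, 1, 1)); // 10: one per file
    }
}
```

Under this model mapred.map.tasks can only shrink splits below block_size (more maps). To get fewer maps than the block count you raise mapred.min.split.size instead, and every input file produces at least one map.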

Testing a small Hadoop cluster (5 nodes)

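1. The Map/Reduce task
Input:
File format:
id value
where id is a random integer from 1 to 100 and value is a random floating-point number from 1 to 100.
Output:
The maximum value for each id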

Files like this are easy to generate with Python; see the appendix at the end of this post.
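The post's own generator is a Python script in the (elided) appendix. As a Java illustration instead, here is a hypothetical sketch that produces lines in the same "id value" format and computes the maximum value per id in a single process, which is handy for checking the cluster's output on a small sample. The class and method names are illustrative, not the author's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class MaxValuePerId {

    // Generates `count` lines "id value": id is a random int in [1, 100],
    // value a random double in [1, 100).
    static List<String> generate(int count, long seed) {
        Random rnd = new Random(seed);
        List<String> lines = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            int id = 1 + rnd.nextInt(100);
            double value = 1 + rnd.nextDouble() * 99;
            lines.add(id + " " + value);
        }
        return lines;
    }

    // What the job computes, done locally: the maximum value for each id.
    static Map<Integer, Double> maxPerId(List<String> lines) {
        Map<Integer, Double> max = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) continue;            // skip malformed lines
            int id = Integer.parseInt(parts[0]);
            double value = Double.parseDouble(parts[1]);
            max.merge(id, value, Math::max);            // keep the larger of old and new
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("3 12.5", "7 40.0", "3 88.25", "7 5.5");
        System.out.println(maxPerId(sample));           // {3=88.25, 7=40.0}
        System.out.println(maxPerId(generate(10_000, 42L)).size());
    }
}
```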

2. The Map/Reduce program
Here we use the new (0.20.2) API directly, i.e. the interfaces under org.apache.hadoop.mapreduce.*.
Note in particular:
job.setNumReduceTasks(5)
specifies this Job's Redu[......]
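With five reduce tasks, each reducer handles a subset of the ids and writes its own part-r-NNNNN output file. The sketch below assumes the job keeps Hadoop's default HashPartitioner and uses an IntWritable key for id (neither is visible in the excerpt). It reproduces the HashPartitioner formula, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, in plain Java; Integer.hashCode() returns the value itself, as IntWritable.hashCode() does.

```java
public class ReducerPartitionDemo {

    // Same formula as Hadoop's default HashPartitioner.getPartition.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many of the ids 1..maxId each reducer would receive.
    static int[] countIdsPerReducer(int maxId, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int id = 1; id <= maxId; id++) {
            counts[getPartition(id, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int numReduceTasks = 5;                          // as in job.setNumReduceTasks(5)
        int[] counts = countIdsPerReducer(100, numReduceTasks);
        for (int r = 0; r < numReduceTasks; r++) {
            // each line reports 20 ids, since id % 5 spreads 1..100 evenly
            System.out.println(String.format("part-r-%05d: %d ids", r, counts[r]));
        }
    }
}
```

Because every id lands on exactly one reducer, each id's maximum appears in exactly one of the five output files.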

四号程序员 (Programmer No. 4)

Keep It Simple and Stupid
