For well-known reasons, I am reposting this article here for everyone. It is very well written.
It seems like Bloom filters are all the rage these days. Three years ago I had barely heard of them and now it seems like I see articles and code using them all the time. That's mostly a good thing, since bloom f[......]
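A hash algorithm that has become very popular in recent years, now upgraded to version 3.0.
It is used in Hadoop, Kyoto Cabinet, and other projects,
mainly because it is very fast and has a low collision rate.
Python implementation: http://pypi.python.org/pypi/mmh3/2.0
I am actually getting ready to write a Bloom filter for my own use.[......]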
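To give a rough idea of how a fast hash feeds a Bloom filter, here is a minimal Java sketch. The class name `SimpleBloomFilter`, the seeds, and the double-hashing scheme (deriving every bit index from two hash values) are assumptions made for this illustration and do not come from the post or from the mmh3 package. The `murmur3_32` method follows the published 32-bit MurmurHash3 (x86) algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit MurmurHash3 (x86 variant) over a byte array.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int len = data.length;
        int blocks = len / 4;
        // Body: mix one little-endian 4-byte block at a time.
        for (int i = 0; i < blocks; i++) {
            int o = i * 4;
            int k = (data[o] & 0xff)
                  | ((data[o + 1] & 0xff) << 8)
                  | ((data[o + 2] & 0xff) << 16)
                  | ((data[o + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        // Tail: mix the remaining 0 to 3 bytes.
        int tail = blocks * 4;
        int k = 0;
        switch (len & 3) {
            case 3: k ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1: k ^= (data[tail] & 0xff);
                    k *= c1;
                    k = Integer.rotateLeft(k, 15);
                    k *= c2;
                    h ^= k;
        }
        // Finalization: force the bits to avalanche.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Assumed scheme: index_i = (h1 + i * h2) mod numBits.
    private int[] indexes(String item) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        int h1 = murmur3_32(data, 0);
        int h2 = murmur3_32(data, h1);
        int[] out = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            out[i] = Math.floorMod(h1 + i * h2, numBits);
        }
        return out;
    }

    void add(String item) {
        for (int idx : indexes(item)) bits.set(idx);
    }

    // false means "definitely absent"; true means "possibly present".
    boolean mightContain(String item) {
        for (int idx : indexes(item)) {
            if (!bits.get(idx)) return false;
        }
        return true;
    }
}
```

An item that was added always reports `true`. An item that was never added may still report `true` (a false positive), and how often that happens depends on the number of bits and hash functions chosen.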
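GenericOptionsParser lets your Map/Reduce program accept Hadoop's common command-line options.
You generally do not need to use GenericOptionsParser directly: extend Configured, implement Tool, and launch the program through ToolRunner, which calls GenericOptionsParser for you.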
public class ConfigurationPrinter extends Configured implements Tool {
// Add the configuration files you need
static {
Configuration.addDefaultResource("hdfs-default.xml[......]
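1. Counters are used for debugging or for gathering statistics. Tracking down a bug in a distributed system is very hard, because there are so many machines.
2. Hadoop ships with more than ten built-in counters that are available by default, such as the amount of data read from and written to HDFS.
3. Users can define their own counters, as follows:
(1) Define an Enum.
(2) Call reporter.incrCounter().
(3) Read the values from the web admin interface or via getCounter.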
class MaxTemperatureWithCounters extends Configured impleme[......]
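The three steps can be tried without a cluster using a plain-Java stand-in. In this sketch, `CounterSketch`, `LocalCounters`, and the `Temperature` enum constants are hypothetical names chosen for illustration; `LocalCounters` only imitates the shape of the calls and is not Hadoop's API. In a real job using the old `mapred` API, the increment would go through `reporter.incrCounter(key, amount)` and the totals would be read back from the job's counters.

```java
import java.util.EnumMap;
import java.util.Map;

class CounterSketch {
    // Step (1): one enum constant per kind of event to count.
    enum Temperature { MISSING, MALFORMED }

    // Hypothetical local stand-in for Hadoop's counter bookkeeping.
    static class LocalCounters {
        private final Map<Temperature, Long> counts = new EnumMap<>(Temperature.class);

        // Step (2): increment a counter by the given amount.
        void incrCounter(Temperature key, long amount) {
            counts.merge(key, amount, Long::sum);
        }

        // Step (3): read a counter back; untouched counters read as 0.
        long getCounter(Temperature key) {
            return counts.getOrDefault(key, 0L);
        }
    }

    // Finds the maximum reading, counting bad records instead of failing on them.
    static int maxTemperature(String[] records, LocalCounters counters) {
        int max = Integer.MIN_VALUE;
        for (String record : records) {
            if (record.isEmpty()) {
                counters.incrCounter(Temperature.MISSING, 1);
                continue;
            }
            try {
                max = Math.max(max, Integer.parseInt(record.trim()));
            } catch (NumberFormatException e) {
                counters.incrCounter(Temperature.MALFORMED, 1);
            }
        }
        return max;
    }
}
```

Counting bad records rather than throwing keeps the job running and leaves a summary of how much input was skipped, which is usually what you want to check first when results look wrong.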
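This chapter mainly introduces the various data types in Hadoop...
1. Look at the problem from the perspective of generics.
Each arrow points from input to output: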
map(k1, v1) -> list(k2, v2)
combine(k2, list(v2)) -> list(k2, v2)
reduce(k2, list(v2)) -> list(k3, v3)
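2. As item 1 shows, the map output types k2 and v2 must be the same as the reduce input types k2 and v2.
If a combine function is used, the map output types must also match the combine function's types.[......]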
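The type constraints above can be made concrete with Java generics. The following is a plain-Java sketch, not Hadoop's API: the interfaces `Emitter`, `MapFn`, `CombineFn`, `ReduceFn` and the `run` driver are hypothetical names invented for this example. Because the same type parameters `K2` and `V2` appear in the map output, the combine input and output, and the reduce input, the compiler rejects any mismatch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TypeFlowSketch {
    interface Emitter<K, V> { void emit(K key, V value); }
    // map(k1, v1) -> list(k2, v2)
    interface MapFn<K1, V1, K2, V2> { void map(K1 key, V1 value, Emitter<K2, V2> out); }
    // combine(k2, list(v2)) -> list(k2, v2)
    interface CombineFn<K2, V2> { void combine(K2 key, List<V2> values, Emitter<K2, V2> out); }
    // reduce(k2, list(v2)) -> list(k3, v3)
    interface ReduceFn<K2, V2, K3, V3> { void reduce(K2 key, List<V2> values, Emitter<K3, V3> out); }

    // Simplified in-memory driver: here the combiner runs exactly once over all
    // map output, whereas a real framework decides when and how often to run it.
    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> Map<K3, V3> run(
            Map<K1, V1> input,
            MapFn<K1, V1, K2, V2> mapper,
            CombineFn<K2, V2> combiner,
            ReduceFn<K2, V2, K3, V3> reducer) {
        // Group map output by key (TreeMap also sorts the keys).
        Map<K2, List<V2>> mapped = new TreeMap<>();
        for (Map.Entry<K1, V1> e : input.entrySet()) {
            mapper.map(e.getKey(), e.getValue(),
                    (k, v) -> mapped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // The combiner consumes and produces the same (K2, V2) types.
        Map<K2, List<V2>> combined = new TreeMap<>();
        for (Map.Entry<K2, List<V2>> e : mapped.entrySet()) {
            combiner.combine(e.getKey(), e.getValue(),
                    (k, v) -> combined.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // The reducer is free to change types to (K3, V3).
        Map<K3, V3> result = new LinkedHashMap<>();
        for (Map.Entry<K2, List<V2>> e : combined.entrySet()) {
            reducer.reduce(e.getKey(), e.getValue(), result::put);
        }
        return result;
    }

    // Word count: (Integer, String) -> (String, Integer) -> (String, Long).
    static Map<String, Long> wordCount(Map<Integer, String> lines) {
        MapFn<Integer, String, String, Integer> mapper = (offset, line, out) -> {
            for (String word : line.split(" ")) out.emit(word, 1);
        };
        CombineFn<String, Integer> combiner = (word, ones, out) -> {
            int sum = 0;
            for (int n : ones) sum += n;
            out.emit(word, sum);
        };
        ReduceFn<String, Integer, String, Long> reducer = (word, counts, out) -> {
            long total = 0;
            for (int n : counts) total += n;
            out.emit(word, total);
        };
        return run(lines, mapper, combiner, reducer);
    }
}
```

Changing the combiner to, say, `CombineFn<String, Long>` makes the call to `run` fail to compile, which is the generic-type view of the rule that the map output must match both the combine and the reduce input.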