HamsterDB testing code (Python)

Testing code for HamsterDB, using the Python binding.

A big file may slow down HamsterDB (about 70 QPS at around 5,000,000 key-value pairs).
So test_write_en() / test_read_en() use pre-hashing (mmh3) (about 300K QPS at around 5,000,000 key-value pairs). That is not true: I made a mistake in my code and QPS i[......]


Splitting a list into N equal chunks in Python


# arr is the list to split; n is the number of elements in each chunk.
def chunks(arr, n):
    return [arr[i:i+n] for i in range(0, len(arr), n)]

# Or fix the total number of chunks at m and size them automatically (as evenly as possible).
# Split arr into m chunks.
import math

def chunks(arr, m):
    n = int(math.ceil(len(arr) / float(m)))
    return [arr[i:i +[......]
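
The two approaches can be compared in a small self-contained sketch. The function names chunks_of_size and split_evenly are hypothetical, chosen here only to keep the two variants apart. One thing worth noting: if a ceil-based chunk size is fed into fixed-size slicing, the result can have fewer than m chunks (5 elements with m = 4 gives a chunk size of 2 and only 3 chunks). The divmod-based variant below is one possible alternative that always returns exactly m chunks whose sizes differ by at most one.

```python
def chunks_of_size(arr, n):
    """Split arr into consecutive chunks of n elements; the last one may be shorter."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [arr[i:i + n] for i in range(0, len(arr), n)]


def split_evenly(arr, m):
    """Split arr into exactly m chunks whose sizes differ by at most one.

    Hypothetical alternative to the ceil-based sizing: the first
    len(arr) % m chunks get one extra element each.
    """
    if m <= 0:
        raise ValueError("m must be positive")
    base, extra = divmod(len(arr), m)
    result = []
    start = 0
    for i in range(m):
        size = base + (1 if i < extra else 0)
        result.append(arr[start:start + size])
        start += size
    return result
```

For example, split_evenly(list(range(10)), 4) returns chunks of sizes 3, 3, 2 and 2, while chunks_of_size(list(range(10)), 3) returns sizes 3, 3, 3 and 1.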


Hadoop: The Definitive Guide, 2nd Edition reading notes – Chapter 9


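1. Machines in a Hadoop cluster should ideally be multi-core with multi-channel disks, but should not use RAID. Choose mid-range machines, for example 8 cores, 16 GB of RAM and 4×1 TB disks.

2. A cluster can be expanded as it grows. In a small cluster (on the order of 10 machines), the namenode and jobtracker can share one machine, provided a backup copy of the namenode is kept on remote NFS. For anything larger, it is best to put them on two separate machines.

3. Something as unreliable as Windows should not be used in production; production environments should run Linux or Unix.

4. Hadoop's network topology is divided into: within a rack (in[......]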


How to uninstall a Python module installed via setup.py


1. Find the install location
sudo easy_install -m BitVector
....
Using /usr/local/lib/python2.6/dist-packages
....
2. Delete the egg file and the .py and .pyc files
cd /usr/local/lib/python2.6/dist-packages
rm -rf BitVector-3.0.egg-info
rm BitVector.py
rm BitVector.pyc[......]
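
As a cross-check on step 1, the interpreter itself can report where a module lives before anything is deleted. This is a minimal sketch, not part of the original steps: the helper name module_location is hypothetical, and it relies on importlib.util.find_spec from the standard library, which needs Python 3.4 or later rather than the Python 2.6 shown above.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None.

    Hypothetical helper: only top-level names are handled here, because
    find_spec raises ModuleNotFoundError for a dotted name whose parent
    package is missing.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin
```

Calling module_location("BitVector") on a machine where the module is installed should return the path of the file to remove; a None result means the interpreter can no longer find the module.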


Some articles on Bloom Filter implementations


1. Gives a Java implementation, using Random as the consistent hashing algorithm...
http://blog.locut.us/2008/01/12/a-decent-stand-alone-java-bloom-filter-implementation/

2. A fairly thorough analysis:
http://blog.csdn.net/jiaomeng/article/details/1495500

3. This one is also well written:
http://www.cnblogs.com/heaad/archive/2011/01/02/[......]


四号程序员

Keep It Simple and Stupid
