Tokyo Cabinet中Bucket(桶)的设置与性能关系



Tokyo Cabinet attains improvement in retrieval by loading RAM with the whole of a bucket array. If a bucket array is on RAM, it is possible to access a region of a target record by about one path of file operations. A bucket array saved in a file is not read into RAM with the read' call but directly mapped to RAM with the mmap’ call. Therefore, preparation time on connecting to a database is very short, and two or more processes can share the same memory map.

If the number of elements of a bucket array is about half of records stored within a database, although it depends on characteristic of the input, the probability of collision of hash values is about 56.7% (36.8% if the same, 21.3% if twice, 11.5% if four times, 6.0% if eight times). In such case, it is possible to retrieve a record by two or less paths of file operations. If it is made into a performance index, in order to handle a database containing one million of records, a bucket array with half a million of elements is needed. The size of each element is 4 bytes. That is, if 2M bytes of RAM is available, a database containing one million records can be handled.

桶大小为row数量的2倍时,概率是21.3% ,而1/2时碰撞概率达到56.7%。按照这个概率来说,2次以内搞定的概率是:

(1-0.567)+(1-0.567)*(0.567) = 67%



Leave a Reply

Your email address will not be published.