Compacting a Xapian Database

Compacting a database reduces its size on disk and can improve search performance.

xapian-compact - Compact a database, or merge and compact several

Usage: xapian-compact [OPTIONS] SOURCE_DATABASE... DESTINATION_DATABASE

Options:
  -b, --blocksize   Set the blocksize in bytes (e.g. 4096) or K (e.g. 4K)
                    (must be between 2K and 64K and a power of 2, default 8K)
  -n, --no-full     Disable full compaction
  -F, --fuller      Enable fuller compaction (not recommended if you plan to
                    update the compacted database)
  -m, --multipass   If merging more than 3 databases, merge the postlists in
                    multiple passes (which is generally faster but requires
                    more disk space for temporary files)
      --no-renumber Preserve the numbering of document ids (useful if you have
                    external references to them, or have set them to match
                    unique ids from an external source).  Currently this
                    option is only supported when merging databases if they
                    have disjoint ranges of used document ids
  --help            display this help and exit
  --version         output version information and exit
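
The block size constraint in the help text above (a power of 2 between 2K and 64K) is easy to check before starting a long compaction run. Below is a minimal sketch in POSIX shell. `valid_blocksize` is a hypothetical helper written for this post, not part of Xapian, and it assumes the range is inclusive at both ends.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xapian): check a block size against the
# constraints from the help text: between 2K and 64K, and a power of 2.
# Accepts plain bytes (e.g. 4096) or a K suffix (e.g. 4K).
valid_blocksize() {
    num=${1%K}                      # strip an optional trailing K
    case $num in
        ''|0*|*[!0-9]*) return 1 ;; # reject empty, leading-zero, non-numeric
    esac
    if [ "$num" != "$1" ]; then
        bytes=$(( num * 1024 ))     # a K suffix was present
    else
        bytes=$num
    fi
    # Assumed inclusive range: 2048..65536 bytes.
    [ "$bytes" -ge 2048 ] && [ "$bytes" -le 65536 ] || return 1
    # A power of 2 has exactly one bit set, so n & (n - 1) is 0.
    [ $(( bytes & (bytes - 1) )) -eq 0 ]
}
```

For example, `valid_blocksize 16K && echo ok` prints `ok`, while `valid_blocksize 3K` fails because 3072 is not a power of 2.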

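Typically we use -F (if you do not plan to update the database afterwards) and -b 16K (generally speaking, the larger the block size, the more efficient the database is).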

xapian-compact -b 16K -F ./index_data ./index_data_F_16KB
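
To see how much space a compaction run actually saved, the source and destination directories can be compared with `du`. This is a rough sketch only: `dir_kb` and `saved_percent` are hypothetical helper names, and the paths in the usage comment are the ones from the command above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Xapian) for comparing on-disk size.

# Print the size of a directory in kilobytes (POSIX du -sk).
dir_kb() {
    du -sk "$1" | cut -f1
}

# Print the integer percentage saved when going from $1 KB to $2 KB.
saved_percent() {
    [ "$1" -gt 0 ] || return 1      # avoid division by zero
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Usage sketch, after xapian-compact has finished:
#   before=$(dir_kb ./index_data)
#   after=$(dir_kb ./index_data_F_16KB)
#   echo "saved $(saved_percent "$before" "$after")%"
```

The percentage is truncated by integer arithmetic, which is precise enough for a quick before-and-after check.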

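Note: if your database was built through intermittent updates, that is, through many separate commits, the compaction above is very useful. Taking my own case as an example: 1 million documents, indexed in 50 batches. Before compaction, queries on terms with a high DF (document frequency) often took 3 to 4 seconds. After compaction they dropped to around 0.8 seconds, and simple queries were faster still.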

 
