Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-784

Increase Bloom filter defaults

    • Type: Icon: Task Task
    • Resolution: Done
    • WT2.0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None

      Our default Bloom filter settings are 8 bits per record with 4 hashes. This has a false positive rate of ~2.4%.

      For example, with an LSM tree consisting of 10 chunks, for every 1000 LSM searches, we will do ~240 unnecessary btree searches (24 per chunk). For an out-of-cache workload, this is an overhead of ~24% (i.e., assuming all internal pages are in cache and none of the leaf pages, we read 1240 leaf pages when we only need 1000).

      If we changed the defaults to 16 bits per record with 8 hashes, that false positive rate goes to 0.057%, so the same 1000 LSM searches would only do ~6 unnecessary btree searches. The I/O overhead would drop to 0.6%.

      The tradeoff is that we would double the space required for Bloom filters, and double the amount of computation associated with Bloom filter lookups. (We could do fractionally better with 12 hashes than 8, but that's 50% more computation to go from 0.6% to 0.5%).

      For reference, the riak tests currently use 28/19, which has a false positive rate of 1.44e-06.

      The change is trivial to make, but what are your thoughts on the implications of doubling the default space required for Bloom filters?

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated:
              Resolved: