Type: Task
Resolution: Done
Affects Version/s: None
Component/s: None
Our default Bloom filter settings are 8 bits per record with 4 hashes. This has a false positive rate of ~2.4%.
For example, with an LSM tree consisting of 10 chunks, every 1000 LSM searches will do ~240 unnecessary btree searches (24 per chunk). For an out-of-cache workload, this is an overhead of ~24%: assuming all internal pages are in cache and none of the leaf pages are, we read ~1240 leaf pages when we only need 1000.
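For anyone who wants to check these numbers, here is a minimal sketch using the standard Bloom filter approximation p = (1 - e^(-k/b))^k, where b is bits per record and k is the number of hashes. The chunk and search counts are just the example values from above, nothing WiredTiger-specific:
{code:c}
#include <math.h>
#include <stdio.h>

/* Standard Bloom filter false positive approximation:
 * p = (1 - e^(-k/b))^k for b bits per record and k hashes. */
static double bloom_fp(double bits_per_record, double hashes)
{
    return pow(1.0 - exp(-hashes / bits_per_record), hashes);
}

int main(void)
{
    double p = bloom_fp(8, 4);          /* current defaults */
    int chunks = 10, searches = 1000;   /* example values from above */

    printf("false positive rate: %.4f\n", p);        /* ~0.0240 */
    printf("unnecessary btree searches: ~%.0f\n",
        p * chunks * searches);                      /* ~240 */
    return 0;
}
{code}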
If we changed the defaults to 16 bits per record with 8 hashes, the false positive rate would drop to 0.057%, so the same 1000 LSM searches would only do ~6 unnecessary btree searches, and the I/O overhead would drop to ~0.6%.
The tradeoff is that we would double the space required for Bloom filters and double the computation for each Bloom filter lookup. (We could do fractionally better with 12 hashes than 8, but that is 50% more computation to take the overhead from 0.6% to 0.5%.)
For reference, the riak tests currently use 28 bits per record with 19 hashes, which has a false positive rate of ~1.44e-06.
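Running the other configurations mentioned here through the same approximation reproduces the figures quoted above (again, this is just the textbook formula, not anything measured from the tree):
{code:c}
#include <math.h>
#include <stdio.h>

/* Same approximation as above, evaluated for each configuration
 * discussed in this ticket. */
static double bloom_fp(double bits_per_record, double hashes)
{
    return pow(1.0 - exp(-hashes / bits_per_record), hashes);
}

int main(void)
{
    const struct { double bits, hashes; const char *desc; } cfgs[] = {
        { 8,  4,  "current default" },    /* ~2.40e-02 */
        { 16, 8,  "proposed default" },   /* ~5.75e-04 */
        { 16, 12, "8 -> 12 hashes" },     /* ~4.66e-04 */
        { 28, 19, "riak tests" },         /* ~1.44e-06 */
    };
    size_t i;

    for (i = 0; i < sizeof(cfgs) / sizeof(cfgs[0]); i++)
        printf("%2.0f/%-2.0f (%s): p = %.2e\n",
            cfgs[i].bits, cfgs[i].hashes, cfgs[i].desc,
            bloom_fp(cfgs[i].bits, cfgs[i].hashes));
    return 0;
}
{code}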
The change is trivial to make, but what are your thoughts on the implications of doubling the default space required for Bloom filters?
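As a side note, whatever we pick for the defaults, applications can already override them per-tree at create time. Here is a sketch of what that looks like through the C API; the bloom_bit_count / bloom_hash_count configuration keys and the lsm=() grouping are my reading of the docs, so verify the exact names against the current configuration reference before relying on this:
{code:c}
#include <stdlib.h>
#include <wiredtiger.h>

int main(void)
{
    WT_CONNECTION *conn;
    WT_SESSION *session;

    /* "WT_HOME" is a placeholder home directory for this example. */
    if (wiredtiger_open("WT_HOME", NULL, "create", &conn) != 0)
        return EXIT_FAILURE;
    conn->open_session(conn, NULL, NULL, &session);

    /* Override the Bloom defaults for one LSM tree: 16 bits per
     * record, 8 hashes. Key names assumed from the docs. */
    session->create(session, "table:example",
        "type=lsm,key_format=S,value_format=S,"
        "lsm=(bloom_bit_count=16,bloom_hash_count=8)");

    conn->close(conn, NULL);
    return EXIT_SUCCESS;
}
{code}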