[SERVER-1723] Add Bitmap indexes Created: 03/Sep/10  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Marc Seeger Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 51
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

As an alternative to a B+Tree, using Bitmap Indexes (http://en.wikipedia.org/wiki/Bitmap_index) could allow a very speedy retrieval of simple queries like the equivalent of "SELECT * FROM FOO WHERE GENDER = "MALE" AND "STUDENT" = TRUE.

Bitmap indexes used to be thought of a "low cardinality" thing, but in "An Adequate Design for Large Data Warehouse Systems: Bitmap index versus B-tree index" by Zaker et al, the authors go even as far as saying that for certain situations, the cardinality of the values does not even matter. Their conclusive statement is as follows:

Thus, we conclude that Bitmap index is the conclusive choice for a DW designing no matter for columns with high or low cardinality.

A highly optimized C++ library called Fastbit (https://sdm.lbl.gov/fastbit/) already exists.

Especially when using large amounts of data that have a low cardinality (e.g. a 'gender' field), the compressed bitmap indexes could keep the RAM usage low and outperform B+Trees.



 Comments   
Comment by Daniel Lemire [ 07/Mar/16 ]

rb2k:

Fastbit is indeed very good, but it uses WAH which is patented: http://www.google.ca/patents/US6831575

Alternatives include EWAH which is used, for example, by Git (https://github.com/git/git/tree/master/ewah), Apache Hive and so forth. See http://githubengineering.com/counting-objects/

There is also Roaring (http://roaringbitmap.org/) used by Apache Spark, Apache Kylin, Druid, Elastic and so forth. See https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps

To the best of my knowledge, EWAH and Roaring are not encumbered by patents. They are both used is several major mission-critical projects.

Comment by Eliot Horowitz (Inactive) [ 01/Apr/12 ]

There are a lot of important features, so its just on the list, just hasn't made it to the top yet.

Comment by Robert La Ferla [ 29/Feb/12 ]

This is an important feature. Why hasn't it received much attention?

Comment by David Lee [ 30/Dec/10 ]

UPDATE: I propose merging b-tree indexes using in-memory bitmap indexes, like PostgreSQL

Supporting in-memory bitmap indexes is a great way to increase performance using simpler (non-compound) indexes, and fits mongodb's philosophy of giving users greater performance in simplicity and flexibility. In-memory bitmap indexes allow single property indexes to go further without having to create various permutations of compound indexes.

Postgresql does this and explains it better: http://www.postgresql.org/docs/8.3/static/indexes-bitmap-scans.html

Comment by Andrew Armstrong [ 09/Dec/10 ]

Bitmap indexes are also mergable; you can overlay multiple bitmap indexes to resolve queries by having individual, single column/field bitmap indexes where one covers GENDER, another with YEAR, and another that covers STUDENT.

This would mean you have three indexes that covers all permutations of those 3 fields (or just one or two of them), unlike with compound btree indexes where its a left-most matching prefix.

Comment by Shawn Masters [ 07/Dec/10 ]

Even just having limited support for a tag would be highly useful and efficient.

Generated at Thu Feb 08 02:57:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.