[SERVER-1723] Add Bitmap indexes Created: 03/Sep/10 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Marc Seeger | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 51 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Optimization
|
| Participants: |
| Description |
|
As an alternative to a B+Tree, using Bitmap Indexes (http://en.wikipedia.org/wiki/Bitmap_index) could allow a very speedy retrieval of simple queries like the equivalent of "SELECT * FROM FOO WHERE GENDER = "MALE" AND "STUDENT" = TRUE. Bitmap indexes used to be thought of a "low cardinality" thing, but in "An Adequate Design for Large Data Warehouse Systems: Bitmap index versus B-tree index" by Zaker et al, the authors go even as far as saying that for certain situations, the cardinality of the values does not even matter. Their conclusive statement is as follows: Thus, we conclude that Bitmap index is the conclusive choice for a DW designing no matter for columns with high or low cardinality. A highly optimized C++ library called Fastbit (https://sdm.lbl.gov/fastbit/) already exists. Especially when using large amounts of data that have a low cardinality (e.g. a 'gender' field), the compressed bitmap indexes could keep the RAM usage low and outperform B+Trees. |
| Comments |
| Comment by Daniel Lemire [ 07/Mar/16 ] |
|
rb2k: Fastbit is indeed very good, but it uses WAH which is patented: http://www.google.ca/patents/US6831575 Alternatives include EWAH which is used, for example, by Git (https://github.com/git/git/tree/master/ewah), Apache Hive and so forth. See http://githubengineering.com/counting-objects/ There is also Roaring (http://roaringbitmap.org/) used by Apache Spark, Apache Kylin, Druid, Elastic and so forth. See https://www.elastic.co/blog/frame-of-reference-and-roaring-bitmaps To the best of my knowledge, EWAH and Roaring are not encumbered by patents. They are both used is several major mission-critical projects. |
| Comment by Eliot Horowitz (Inactive) [ 01/Apr/12 ] |
|
There are a lot of important features, so its just on the list, just hasn't made it to the top yet. |
| Comment by Robert La Ferla [ 29/Feb/12 ] |
|
This is an important feature. Why hasn't it received much attention? |
| Comment by David Lee [ 30/Dec/10 ] |
|
UPDATE: I propose merging b-tree indexes using in-memory bitmap indexes, like PostgreSQL Supporting in-memory bitmap indexes is a great way to increase performance using simpler (non-compound) indexes, and fits mongodb's philosophy of giving users greater performance in simplicity and flexibility. In-memory bitmap indexes allow single property indexes to go further without having to create various permutations of compound indexes. Postgresql does this and explains it better: http://www.postgresql.org/docs/8.3/static/indexes-bitmap-scans.html |
| Comment by Andrew Armstrong [ 09/Dec/10 ] |
|
Bitmap indexes are also mergable; you can overlay multiple bitmap indexes to resolve queries by having individual, single column/field bitmap indexes where one covers GENDER, another with YEAR, and another that covers STUDENT. This would mean you have three indexes that covers all permutations of those 3 fields (or just one or two of them), unlike with compound btree indexes where its a left-most matching prefix. |
| Comment by Shawn Masters [ 07/Dec/10 ] |
|
Even just having limited support for a tag would be highly useful and efficient. |