[SERVER-17225] [RocksDB] Store all data in a single column family Created: 09/Feb/15 Updated: 25/Jan/17 Resolved: 10/Feb/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.0-rc9, 3.1.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Igor Canadi | Assignee: | Benety Goh |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Backport Completed: | |||||||||
| Participants: | |||||||||
| Description |
|
Currently, we store collections and indexes in separate column families. This is a no-go for Parse workloads, since they have a lot of collections and we can't scale column families that much. The idea is to add a 4-byte prefix to every key. Each collection and index will get its own prefix. |
| Comments |
| Comment by Ramon Fernandez Marina [ 19/Feb/15 ] |
|
Notes on v3.0 backports:
|
| Comment by Githook User [ 19/Feb/15 ] |
|
Author: Igor Canadi (igorcanadi) <icanadi@fb.com> Message: Signed-off-by: Ramon Fernandez <ramon.fernandez@mongodb.com> |
| Comment by Githook User [ 10/Feb/15 ] |
|
Author: Igor Canadi (igorcanadi) <icanadi@fb.com> Message: Signed-off-by: Benety Goh <benety@mongodb.com> |
| Comment by Igor Canadi [ 09/Feb/15 ] |
|
I agree that this is something we will probably want in the final product. However, it's not at the top of my TODO list just yet. I'm currently running snapshot & replay with Parse's production workload and fixing what breaks (performance-wise and stability-wise). |
| Comment by Daniel Pasette (Inactive) [ 09/Feb/15 ] |
|
I would strongly consider using one CF per database. This will work for Parse's workload and leave some flexibility for tuning. It also allows you to avoid special-casing the oplog. |
| Comment by Igor Canadi [ 09/Feb/15 ] |
|
It's actually one column family per MongoDB instance. We could make it per database in the future. The biggest tradeoff is that dropping an index or collection might be slow. We will not be able to reclaim the space immediately, whereas with CF-per-collection we would just unlink the files. We will also pay some CPU cost to prefix and unprefix all the keys, but I'm not too concerned about that. We also lose some flexibility: different column families can have different parameters, so with CF-per-collection we could optimize different collections based on access patterns. The enabling factor for this patch was the great work on KeyString. Previously we needed different comparators for different indexes (based on the ordering parameter), which was hard to achieve with everything stored in a single column family. Now that all the keys are byte-comparable, we don't have to worry about that. I have yet to optimize the oplog. I might need to extract it to a separate column family, since it's a big factor in overall system performance. But this work will probably wait for MongoDB 3.2. |
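To illustrate the drop tradeoff described in the comment above, here is a minimal sketch in Python, not the code from the patch. It assumes a plain dict standing in for the single column family and a big-endian 4-byte prefix; the names `prefix_range` and `drop_prefix` are hypothetical. Dropping a collection means finding and deleting every key in the range [prefix, prefix + 1), so the work grows with the amount of data instead of being a file unlink.

```python
import struct


def prefix_range(prefix: int):
    """Return (start, end) byte bounds covering every key with this prefix.

    Assumption: prefixes are encoded as 4-byte big-endian integers.
    `end` is exclusive; it is None when the prefix is the largest possible.
    """
    start = struct.pack(">I", prefix)
    end = struct.pack(">I", prefix + 1) if prefix < 0xFFFFFFFF else None
    return start, end


def drop_prefix(store: dict, prefix: int) -> int:
    """Delete every key belonging to one collection or index.

    `store` is a dict of full key -> value, a stand-in for the shared
    column family. Returns the number of keys removed.
    """
    start, end = prefix_range(prefix)
    doomed = [k for k in store if k >= start and (end is None or k < end)]
    for k in doomed:
        del store[k]
    return len(doomed)
```

In a real LSM store the deleted keys would typically only be reclaimed later by compaction, which is the "cannot reclaim the space immediately" cost mentioned above.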
| Comment by Daniel Pasette (Inactive) [ 09/Feb/15 ] |
|
So it will now be one column family per database. Are there any tradeoffs? I'll ask benety.goh to take a look at the PRs you have queued up tomorrow. |
| Comment by Igor Canadi [ 09/Feb/15 ] |
|
Yes, IIRC Parse tested WiredTiger and this was one of the major issues that came up (file per collection/index). RocksDB with column families is even worse because we have multiple files per column family. There is also an inherent issue with RocksDB's architecture when scaling column families (write-buffer memory management, write-ahead-log management, bigger write amplification due to potentially small Level 0 files...). Here's the patch for the change: https://github.com/mongodb-partners/mongo/tree/singlecolumnfamily. I'm not sure what the status of 3.0 is, but it would be great if we could also get this patch in. Instead of creating a column family for a collection or an index, we assign a new prefix, which is a 32-bit integer. All key-values belonging to that collection will be written as <prefix>key-value. When a collection needs to read a RecordId, it will issue a Get(<prefix><record_id>) request to get the document. Is this a clearer description? Here's an example of an Iterator that automatically adds and removes prefixes from the underlying data: https://github.com/mongodb-partners/mongo/commit/5dbf6e32765cddb9173c3bd17c016680ec3fddc3#diff-a1ab21c960bbf18d99e36b078fb18718R51 |
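The prefixing scheme in the comment above can be sketched roughly as follows. This is an illustrative Python toy, not the code from the linked patch: the class `SingleKeyspaceStore` and its methods are hypothetical names, a sorted list stands in for RocksDB's ordered keyspace, and the big-endian encoding of the 32-bit prefix is an assumption made so that all keys of one collection sort next to each other.

```python
import struct
from bisect import bisect_left, insort

PREFIX_LEN = 4  # the 32-bit prefix described in the ticket


def encode_prefix(prefix: int) -> bytes:
    # Assumption: big-endian, so bytewise order groups keys by prefix.
    return struct.pack(">I", prefix)


class SingleKeyspaceStore:
    """Toy ordered key-value store standing in for one column family."""

    def __init__(self):
        self._keys = []    # full (prefixed) keys, kept sorted
        self._values = {}  # full key -> value

    def put(self, prefix: int, key: bytes, value: bytes) -> None:
        full = encode_prefix(prefix) + key
        if full not in self._values:
            insort(self._keys, full)
        self._values[full] = value

    def get(self, prefix: int, key: bytes):
        # Corresponds to Get(<prefix><record_id>) in the comment above.
        return self._values.get(encode_prefix(prefix) + key)

    def scan(self, prefix: int):
        """Iterate one collection's entries, stripping the prefix."""
        start = encode_prefix(prefix)
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i].startswith(start):
            full = self._keys[i]
            yield full[PREFIX_LEN:], self._values[full]
            i += 1
```

For example, after `store.put(1, b"a", b"doc1")` and `store.put(2, b"a", b"idx")`, the call `store.get(1, b"a")` returns only the entry written under prefix 1, and `store.scan(1)` never sees keys from prefix 2.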
| Comment by Daniel Pasette (Inactive) [ 09/Feb/15 ] |
|
This may be an issue for WiredTiger because all indexes and collections are stored in their own files, whereas mmap stored all collections and indexes in a single set of database files. Can you describe the proposed change a bit further? |