Type: New Feature
Priority: Minor - P4
Affects Version/s: None
Fix Version/s: None
Would it be feasible for mongo to be slightly more granular about its data files and separate data from indexes? For example, right now, everything for the "foursquare" database is stored in:
- foursquare.N where N is a number
Instead, the data and index could be separated into:
There are a couple advantages to this separation:
1. We would be able to use vmtouch or other tools to pin the index into memory. Given the choice between page faults on index reads and page faults on data reads, I'd much rather have the data reads fault. The kernel does a reasonable job of putting the right pages into the page cache, but at cold startup, or when not all data fits in RAM, it would help to give it a hint about what should be warm.
2. It would be easier to see how much index and how much data is cached. There are tools that show how much of a file is in the page cache, but they aren't helpful in the current setup: each file mixes index and data, so we get no index/data breakdown.
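The tools mentioned in point 2 (vmtouch among them) work by mapping a file and asking the kernel which of its pages are resident, via the mincore(2) system call. A minimal Python sketch of that check, assuming Linux or another Unix that provides mmap(2) and mincore(2) in its C library; the function name `resident_fraction` is hypothetical and not part of any MongoDB tooling:

```python
import ctypes
import ctypes.util
import mmap
import os

# Assumption: a Unix libc exposing mmap(2), munmap(2) and mincore(2).
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                         ctypes.POINTER(ctypes.c_ubyte)]

MAP_FAILED = ctypes.c_void_p(-1).value  # (void *) -1


def resident_fraction(path):
    """Return the fraction of path's pages currently in the page cache."""
    size = os.path.getsize(path)
    if size == 0:
        return 0.0
    pages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        # Map the file read-only; mapping alone does not read any pages.
        addr = libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr is None or addr == MAP_FAILED:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        try:
            # One status byte per page; bit 0 set means "resident in memory".
            vec = (ctypes.c_ubyte * pages)()
            if libc.mincore(addr, size, vec) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            return sum(b & 1 for b in vec) / pages
        finally:
            libc.munmap(addr, size)
    finally:
        os.close(fd)
```

Pointed at one of today's combined foursquare.N files, this can only report a single number per file, with no way to tell cached index pages from cached data pages, which is exactly the limitation described above.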
Generally, I'd feel much more confident operating a mongo that reads from disk if we had a bit more control over what was read from disk and what stayed in memory.
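To illustrate the kind of warm-up hint from point 1, here is a minimal Python sketch of pulling one file into the page cache, similar in spirit to `vmtouch -t`. It assumes a Unix system; `os.posix_fadvise` is only used where Python exposes it (e.g. Linux), and the name `warm_file` is hypothetical. Unlike `vmtouch -l`, which locks pages with mlock(), nothing here stops the kernel from evicting the pages later, and with today's layout it can only be aimed at a whole foursquare.N file, so index and data pages get warmed together.

```python
import os


def warm_file(path, chunk_size=1 << 20):
    """Read a file end to end so its pages land in the page cache.

    Returns the number of bytes read.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Where available, ask the kernel to start readahead for the whole
        # file (offset 0, length 0 means "through end of file").
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        # Buffered reads go through the page cache, so every page of the
        # file is brought into memory at least once.
        total = 0
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
        return total
    finally:
        os.close(fd)
```

With index and data in separate files, a call like this (or vmtouch itself) could target only the index files at startup.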
(Even better would be to have different data/index files at the per-collection level, but I could see how that would be difficult in degenerate cases with thousands of collections.)