Type: New Feature
Resolution: Won't Fix
Priority: Minor - P4
Affects Version/s: None
Storage Execution
Scenario:
- 4 GB of memory
- A single MongoDB instance on Amazon EC2
- An EBS volume with 1000 provisioned IOPS
- A large collection (tens of gigabytes) of ~4 KB documents, with only the default _id index
- Completely random queries by _id (uniformly distributed _id values)
In this scenario, I measured the following:
- Almost every query causes a page fault.
- A single query should cause at most two page faults, because the page size is 4 KB and each document is ~4 KB.
BUT the kernel uses readahead on page faults by default, reading surrounding pages as well, and reads ahead even more aggressively when MADV_SEQUENTIAL advice has been given.
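The "at most two page faults" bound follows from simple page arithmetic. Here is a minimal sketch, assuming 4096-byte pages and a document stored contiguously in the mapped file. The helper name pages_spanned is hypothetical, and the sketch counts only the data pages, not index pages:

```python
def pages_spanned(offset: int, length: int, page_size: int = 4096) -> int:
    """Number of distinct pages touched by the bytes [offset, offset + length)."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


# A 4 KB document aligned to a page boundary touches exactly one page.
print(pages_spanned(8192, 4096))
# The same document starting mid-page straddles two pages.
print(pages_spanned(10000, 4096))
# No placement of a document of at most 4096 bytes touches more than two pages.
print(max(pages_spanned(off, 4096) for off in range(4096)))
```

Readahead is what breaks this bound in practice: each fault may pull in many more pages than the one or two the document actually occupies.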
This means far more data is read from storage than is actually needed. On Amazon EBS, with a single connection and a single MongoDB instance, I was able to query documents at ~1.5 MB/s net, while the EBS volume was reading ~40 MB/s.
I made a small patch that makes it possible to set one of the MADV_SEQUENTIAL, MADV_NORMAL or MADV_RANDOM advice flags on a given database's memory-mapped MongoDataFiles.
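The patch itself is not reproduced here. The underlying mechanism is the madvise(2) call on a memory-mapped region. Below is a minimal sketch of that mechanism using Python's mmap module, assuming Linux and Python 3.8+ (where mmap.madvise and the MADV_* constants are available). The helper names and the demo file are hypothetical and unrelated to MongoDB's own code:

```python
import mmap
import os
import tempfile

# Map the advice names used in this report to the constants exposed by
# Python's mmap module (available on Linux with Python 3.8+).
ADVICE = {
    "normal": mmap.MADV_NORMAL,
    "random": mmap.MADV_RANDOM,
    "sequential": mmap.MADV_SEQUENTIAL,
}


def map_with_advice(path: str, advice: str) -> mmap.mmap:
    """Memory-map a whole file read-only and apply a madvise hint to it."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping stays valid after the file object is closed.
    # With no start/length arguments, the advice covers the whole mapping.
    mm.madvise(ADVICE[advice])
    return mm


def read_doc(mm: mmap.mmap, offset: int, length: int) -> bytes:
    """Read one fixed-size "document"; this access may fault in its page(s)."""
    return mm[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical demo data file: 16 documents of 4 KB each.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(16):
            f.write(bytes([i]) * 4096)
    mm = map_with_advice(path, "random")
    print(read_doc(mm, 5 * 4096, 4))
    mm.close()
    os.remove(path)
```

On Linux, MADV_RANDOM tells the kernel to expect random access and effectively disables readahead for that mapping. Each fault then reads only the page it needs, which matches the uniformly random _id lookups described above.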
When I set MADV_RANDOM with db.$cmd.findOne( ); I was able to query at ~3 MB/s net, while the EBS volume was reading only ~4 MB/s. That was a big improvement in my case.
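The two measurements can be compared as a read-amplification ratio: storage throughput divided by net document throughput. The worked calculation below uses only the figures reported here. The helper name read_amplification is hypothetical:

```python
def read_amplification(storage_mb_s: float, net_mb_s: float) -> float:
    """Bytes read from storage per byte of document data actually returned."""
    return storage_mb_s / net_mb_s


before = read_amplification(40.0, 1.5)  # default readahead
after = read_amplification(4.0, 3.0)    # with MADV_RANDOM
print(f"before: {before:.1f}x, after: {after:.2f}x")  # before: 26.7x, after: 1.33x
print(f"net throughput gain: {3.0 / 1.5:.1f}x")        # net throughput gain: 2.0x
```

With default readahead, the storage read roughly 27 bytes for every byte of useful data. With MADV_RANDOM, that drops to about 1.3, and net throughput doubles.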
I thought it was worth sharing this patch: if anybody ever has the same problem, it could help, or at least be a good starting point.
I tested it only on Ubuntu x86_64, for 1-2 hours, so use it with care.