[SERVER-9378] make server auto-read-ahead on capped collections Created: 17/Apr/13 Updated: 10/Dec/14 Resolved: 17/Mar/14
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 2.4.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Dwight Merriman | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Participants: |
| Description |
|
If the filesystem readahead is set to a low value, this can slow writing to capped collections (particularly the oplog), because the server must read each region before it overwrites it. The proposal here is to have the server, in some background fashion, do some prefetching (touching?) of the oplog region where it is going to be inserting new documents soon: not the position of the current write, but an upcoming block of data, for example 1 MB at a time. (Note an edge case: on a tiny capped collection, 1 MB may be a poor choice; perhaps just skip this logic entirely for tiny collections.) This touching could probably be done outside of the db locks, much like Record::touch().
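The windowing described above might look something like the following sketch (illustrative Python, not server code; the 1 MB window matches the suggestion above, while the cutoff for "tiny" collections is an assumed value):

```python
# Illustrative sketch of the proposal: compute which byte range(s) of a
# circular (capped) collection to pre-touch ahead of the insert position.
WINDOW = 1 << 20           # touch ~1 MB at a time, as suggested above
MIN_CAPPED = 4 * WINDOW    # assumed cutoff: skip the logic for tiny collections

def prefetch_ranges(insert_offset, capped_size, window=WINDOW):
    """Return the (start, end) byte ranges to touch just ahead of
    insert_offset, wrapping around the end of the capped collection."""
    if capped_size < MIN_CAPPED:
        return []                      # tiny collection: do nothing
    start = insert_offset % capped_size
    end = start + window
    if end <= capped_size:
        return [(start, end)]
    # the window wraps back to the head of the circular collection
    return [(start, capped_size), (0, end - capped_size)]
```

The returned ranges would then be touched by some background task, outside the db locks.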
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 17/Mar/14 ] |
|
| Comment by Ben Becker [ 18/Feb/14 ] |
|
Tested with the changes in
| Comment by Antoine Girbal [ 30/Apr/13 ] |
|
Here is a repro with benchRun.
Restart the server with readahead (RA) set to 16.
Then the benchRun results:
Now restart the node with RA 32; results:
| Comment by Antoine Girbal [ 23/Apr/13 ] |
|
eliot dwight_10gen see above for a quick repro.
| Comment by Antoine Girbal [ 23/Apr/13 ] |
|
Here is an easy way to reproduce the issue on any server regardless of RAM size; just make sure it is backed by a spinning disk (HDD), not an SSD. Create a 500 MB capped collection in the local db so that there is no replication for it.
Then set the readahead on all drives to 16, make sure to restart mongod, and drop the filesystem cache.
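For reference, RA values like these are counted in 512-byte sectors (the unit used by `blockdev --getra`/`--setra`), so both settings here are very small readahead windows:

```python
# RA values reported by `blockdev` are in 512-byte sectors.
SECTOR_BYTES = 512

def ra_sectors_to_kb(sectors):
    return sectors * SECTOR_BYTES // 1024

print(ra_sectors_to_kb(16))  # RA 16 -> 8 (KB)
print(ra_sectors_to_kb(32))  # RA 32 -> 16 (KB)
```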
Next, insert documents into the collection.
With RA 16, the result is about 5,000 inserts/sec and the disk is about 90% utilized.
Now repeat the above 3 commands to restart with RA 32. The result is now 30,000 inserts/sec and the disk is only about 30% utilized.
According to the above calculation, the disk is about 18 times more efficient with RA 32!
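The 18x figure follows from normalizing insert throughput by disk utilization for each setting:

```python
# Efficiency = inserts/sec per unit of disk utilization, from the numbers above.
eff_ra16 = 5_000 / 0.90      # RA 16: ~5,556 inserts/sec per unit utilization
eff_ra32 = 30_000 / 0.30     # RA 32: 100,000 inserts/sec per unit utilization
print(round(eff_ra32 / eff_ra16))  # -> 18
```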
| Comment by Dwight Merriman [ 19/Apr/13 ] |
|
fwiw:
| Comment by Antoine Girbal [ 17/Apr/13 ] |
|
I've been trying to test this by using the following function in oplog.cpp:
oplogCount is just a counter that sums the document sizes inserted into the oplog. I'm calling the readaheadOplog method from the updateObjects method in update.cpp, which I hope is outside of any lock. But right now it doesn't work: the speed is still slow, and in iostat the number of r/s is 8x that of w/s for the same amount of data transferred. |
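The function body above did not survive the export, so here is a minimal model of the touching idea in Python (illustrative only; the real change would be C++ in the server, along the lines of Record::touch() mentioned in the description):

```python
import mmap
import tempfile

def touch_region(mm, start, length, page=mmap.PAGESIZE):
    """Fault one byte per page into the OS cache so that a later
    in-place write does not stall on a synchronous read-before-write."""
    checksum = 0
    for off in range(start, min(start + length, len(mm)), page):
        checksum ^= mm[off]   # the read forces the page in
    return checksum

# Usage: pre-touch the next 1 MB of a file-backed "capped collection".
with tempfile.TemporaryFile() as f:
    f.truncate(4 * 1024 * 1024)            # 4 MB backing file
    mm = mmap.mmap(f.fileno(), 0)
    touch_region(mm, 1 * 1024 * 1024, 1024 * 1024)
    mm.close()
```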