Priority: Major - P3
Affects Version/s: None
Fix Version/s: 5.0 Required
Random cursors can be quite slow on a multi-GB table with very large records.
A customer has experienced slow startup times in 4.2 that they didn't see in 4.0. Based on their logs mongod is spending a lot of time iterating a random cursor though the oplog — in one case it takes 15 minutes to perform 993 cursor->next() calls on a ~26GB oplog. The oplog had only 62590 records, so the average record size is 100s of KB.
A simple python script, which I'll attach, shows similar behavior on a large file. I populate a table with 60K 500 KB records, assigning them increasing keys to mimic the oplog which adds records with increasing timestamps. When that's done, I create a random cursor and call next() 1000 times. It takes almost 3 minutes on my virtual workstation.
Oddly increasing the WT cache size from the default (100MB, I think) to 4GB made the cursor iteration increase from ~130 sec to ~170 sec.
This isn't nearly as slow on a larger machine. On ocelot-aws (16 cores, 64GB RAM) it only takes 20 sec.
I've been reproducing this on the develop branch and haven't tried running specifically against 4.2.
WT-5068 was a change to the random cursor implementation that went into 4.2. So it's possible this has something to do with that change. (UPDATE: See my comment, below. WT-5068 definitely introduced this.)
There is a lot of variability in the different cursor->next() calls. Some take a couple milliseconds. Some take half a second. I guess it's not surprising that a random cursor behaves unpredictably.