Priority: Major - P3
Affects Version/s: 2.4.4
Fix Version/s: None
This bug has been present in mongod since at least v2.2.
MR obtains a cursor to iterate over the input docs, and then checks that each doc really matches using cursor->currentMatches() (line 1208 of mr.cpp).
For each doc that matches, MR then runs the map/reduce logic and checks whether to yield the lock every 100 docs.
If the cursor returns many documents but most of them don't actually match, the loop can iterate over many thousands of objects without yielding, since the yield check lives only in the MR logic.
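To make the effect concrete, here is a minimal toy model of the loop. It is not the mongod code: longestRunWithoutYield, the matches vector and the countScannedDocs flag are hypothetical names used only for illustration. With countScannedDocs set to false it mimics the behaviour described above, where only docs that pass the matcher count toward the every-100-docs yield check. With it set to true it shows one possible remedy (an assumption on my part, not a confirmed fix): counting every doc the cursor advances over.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, NOT mongod code. Each entry of `matches` says whether the doc at
// that cursor position passes the full matcher (the currentMatches() check).
// Returns the longest run of cursor advances with no yield in between.
std::size_t longestRunWithoutYield(const std::vector<bool>& matches,
                                   std::size_t yieldEvery,
                                   bool countScannedDocs) {
    assert(yieldEvery > 0);
    std::size_t counter = 0;  // docs counted toward the next yield
    std::size_t run = 0;      // cursor advances since the last yield
    std::size_t longest = 0;
    for (bool m : matches) {
        ++run;
        // Reported behaviour: only matched docs reach the MR logic, so only
        // they are counted. Hypothetical remedy: count every scanned doc.
        if (countScannedDocs || m) {
            ++counter;
        }
        if (counter == yieldEvery) {
            // This is where the lock would be yielded.
            longest = std::max(longest, run);
            counter = 0;
            run = 0;
        }
    }
    // Account for a trailing run that never reached a yield.
    return std::max(longest, run);
}
```

With one matching doc in every 20 and a yield every 100 counted docs, the model goes 2000 cursor advances between yields when only matches are counted, versus 100 when every scanned doc is counted. If the non-matching docs are clustered together, the gap grows accordingly.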
I have noted the issue in the following case:
- query is on an indexed field + a complex $where query
- 1 million docs match the index and are iterated over
- 950k docs get filtered out in matcher by the $where
- due to the distribution of documents, the loop ends up not yielding for about 30s
It appears that even though no explicit writes are being issued, the absence of yielding results in both reads and writes being locked.
This is obviously a problem for operation latency, but what's worse is that it triggers a replica set re-election if authentication is used, since the secondaries cannot get a heartbeat from the primary.
This creates many further errors, and makes sharded MR unstable.
The following shows up in currentOp() when trying to issue a read: