Details
-
Bug
-
Resolution: Duplicate
-
Major - P3
-
None
-
2.4.1
-
None
-
None
-
MongoDB hosted on an Azure-based Windows Server 2012 machine; queries issued through the C# driver, map-reduce launched from a remote mongo.exe
-
Windows
-
Description
While running a long mapReduce (on a collection of ~2M records, 8GB on disk), MongoDB enters a hung state: the map-reduce operation stops using local resources (no CPU/disk activity) but never completes. When the MR started, CPU and disk usage were high.
db.currentOp() shows 129 ops in progress, all waiting for a global write lock, never completing and never yielding. Other queries work, but these ops are stuck and do not complete until the mongo service is restarted. The first op, which has the longest run time, is the map op; the rest are unrelated ops issued against the DB by other apps.
Notes:
- I waited 20 minutes, ran db.currentOp() again, and diffed the two outputs: no field changed except 'secs_running'. Most importantly, 'numYields' did not change.
- The server log shows nothing in this state (only MMS connections). There are no errors around the time the MR appears to have stopped; the last entry relating to the MR is: Mon Apr 29 08:14:36.921 [conn11315] M/R: (1/3) Emit Progress: 52700/87827 60%
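For anyone who wants to repeat the snapshot comparison described in the notes above without a manual diff, here is a minimal sketch. It assumes two saved copies of the `inprog` array from db.currentOp(); the function name `findStalledOps` is made up for illustration and is not part of the shell. It relies only on the `opid`, `secs_running` and `numYields` fields that appear in the currentOp output.

```javascript
// Hypothetical helper (not a MongoDB API): given the `inprog` arrays of two
// db.currentOp() snapshots, return the opids of operations that are present
// in both, whose secs_running increased, but whose numYields did not change.
function findStalledOps(before, after) {
  // Index the earlier snapshot by opid for quick lookup.
  var earlier = {};
  before.forEach(function (op) {
    earlier[op.opid] = op;
  });

  return after
    .filter(function (op) {
      var prev = earlier[op.opid];
      return prev !== undefined &&
        op.secs_running > prev.secs_running && // still counted as running
        op.numYields === prev.numYields;       // but never yielded in between
    })
    .map(function (op) {
      return op.opid;
    });
}
```

In the shell this could be used as `var a = db.currentOp().inprog;`, then some minutes later `var b = db.currentOp().inprog;` followed by `findStalledOps(a, b)`. An op that merely runs for a long time but keeps yielding will not be listed.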
Attached are:
- db.currentOp() output (anonymized with ***).
- db.serverStatus() output
- Screenshot of MMS
- mapReduce code (somewhat obfuscated)