[SERVER-2225] when multiple map reduce jobs are running and one is holding the js mutex indefinitely, it is not obvious which is hogging the js mutex Created: 14/Dec/10  Updated: 06/Dec/22  Resolved: 17/May/17

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Aaron Staple Assignee: Backlog - Query Team (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-387 currentOp() and killOp() don't work r... Closed
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Participants:

 Description   

Below are two map/reduce jobs, each with an infinite loop in its map function. I believe op 117 is running and holding the js mutex, while op 122 is not. If I kill 122, nothing happens (it does not stop running) until 117 is killed as well.

In this case it's not obvious from the currentOp output which op needs to be killed so that others can run.

> db.currentOp().inprog
[
{
"opid" : 117,
"active" : true,
"lockType" : "read",
"waitingForLock" : false,
"secs_running" : 12,
"op" : "query",
"ns" : "test.c",
"query" : {
"mapreduce" : "c",
"map" : function cf_47f() {
while (1) {
}
},
"reduce" : function cf_48f() {} },
"client" : "127.0.0.1:51528",
"desc" : "conn",
"msg" : "m/r: (1/3) emit phase 0/1 0%"
},
{
"opid" : 122,
"active" : true,
"waitingForLock" : false,
"secs_running" : 9,
"op" : "query",
"ns" : "?",
"query" : {
"mapreduce" : "c",
"map" : function cf_49f() {
while (1) {
}
},
"reduce" : function cf_50f() {} },
"client" : "127.0.0.1:51563",
"desc" : "conn"
}
]



 Comments   
Comment by Asya Kamsky [ 17/May/17 ]

This is no longer an issue. currentOp() now shows the time spent acquiring locks, whether an operation is waiting for a lock, and, for mapreduce, which phase it is in, so it is easy to tell which operation is running and which are waiting.

// working
		{
			"desc" : "conn7",
			"threadId" : "0x700009994000",
			"connectionId" : 7,
			"client" : "127.0.0.1:56469",
			"appName" : "MongoDB Shell",
			"active" : true,
			"opid" : 261629,
			"secs_running" : 82,
			"microsecs_running" : NumberLong(82471913),
			"op" : "command",
			"ns" : "test.upd",
			"query" : {
				"mapreduce" : "upd",
				"map" : {
					"code" : "function () { sleep(10000000); emit(1, 1); }"
				},
				"reduce" : {
					"code" : "function (k,v) { return 2;}"
				},
				"out" : "out"
			},
			"planSummary" : "COLLSCAN",
			"msg" : "m/r: (1/3) emit phase M/R: (1/3) Emit Progress: 0/1 0%",
			"progress" : {
				"done" : 0,
				"total" : 1
			},
			"numYields" : 0,
			"locks" : {
				"Global" : "r",
				"Database" : "R"
			},
			"waitingForLock" : false,
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(15),
						"w" : NumberLong(5)
					}
				},
				"Database" : {
					"acquireCount" : {
						"r" : NumberLong(3),
						"w" : NumberLong(3),
						"R" : NumberLong(2),
						"W" : NumberLong(5)
					}
				},
				"Collection" : {
					"acquireCount" : {
						"r" : NumberLong(3),
						"w" : NumberLong(6)
					}
				}
			}
		},

// waiting
		{
			"desc" : "conn6",
			"threadId" : "0x700009911000",
			"connectionId" : 6,
			"client" : "127.0.0.1:54466",
			"appName" : "MongoDB Shell",
			"active" : true,
			"opid" : 261664,
			"secs_running" : 63,
			"microsecs_running" : NumberLong(63901015),
			"op" : "command",
			"ns" : "test.upd",
			"query" : {
				"mapreduce" : "upd",
				"map" : {
					"code" : "function () { sleep(10000000); emit(1, 1); }"
				},
				"reduce" : {
					"code" : "function (k,v) { return 2;}"
				},
				"out" : "out2"
			},
			"numYields" : 0,
			"locks" : {
				"Global" : "w",
				"Database" : "W"
			},
			"waitingForLock" : true,
			"lockStats" : {
				"Global" : {
					"acquireCount" : {
						"r" : NumberLong(7),
						"w" : NumberLong(1)
					}
				},
				"Database" : {
					"acquireCount" : {
						"r" : NumberLong(2),
						"R" : NumberLong(1),
						"W" : NumberLong(1)
					},
					"acquireWaitCount" : {
						"W" : NumberLong(1)
					},
					"timeAcquiringMicros" : {
						"W" : NumberLong(63784823)
					}
				},
				"Collection" : {
					"acquireCount" : {
						"r" : NumberLong(2)
					}
				}
			}
		}
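
The distinction between the two entries above can also be checked programmatically. The following is a minimal sketch, not part of the server or shell: the helper names (toNum, lockWaitMicros, classifyMapReduceOps) are hypothetical, and it assumes only the fields visible in the output above (waitingForLock, lockStats.*.timeAcquiringMicros, msg, query). The sample data is a trimmed copy of the two entries, with NumberLong values written as plain numbers so the snippet also runs outside the shell.

```javascript
// Convert a lock-stat counter to a plain number. In the shell these values
// are NumberLong objects, which expose toNumber(); plain numbers pass through.
function toNum(v) {
  if (v === undefined || v === null) return 0;
  return typeof v.toNumber === "function" ? v.toNumber() : Number(v);
}

// Sum timeAcquiringMicros over every lock level and mode reported for one op.
function lockWaitMicros(op) {
  let total = 0;
  const stats = op.lockStats || {};
  for (const level of Object.keys(stats)) {
    const waits = stats[level].timeAcquiringMicros || {};
    for (const mode of Object.keys(waits)) {
      total += toNum(waits[mode]);
    }
  }
  return total;
}

// Split mapreduce ops into those that hold their locks and those waiting.
function classifyMapReduceOps(inprog) {
  const result = { running: [], waiting: [] };
  for (const op of inprog) {
    // Assumption: the command document is under "query" as in the output
    // above; "command" is checked as a fallback for other server versions.
    const cmd = op.query || op.command || {};
    if (!("mapreduce" in cmd) && !("mapReduce" in cmd)) continue;
    const summary = {
      opid: op.opid,
      msg: op.msg || null,
      lockWaitMicros: lockWaitMicros(op)
    };
    (op.waitingForLock ? result.waiting : result.running).push(summary);
  }
  return result;
}

// Trimmed copies of the two currentOp entries shown above.
const sample = [
  {
    opid: 261629,
    query: { mapreduce: "upd", out: "out" },
    msg: "m/r: (1/3) emit phase M/R: (1/3) Emit Progress: 0/1 0%",
    waitingForLock: false,
    lockStats: { Database: { acquireCount: { r: 3, w: 3, R: 2, W: 5 } } }
  },
  {
    opid: 261664,
    query: { mapreduce: "upd", out: "out2" },
    waitingForLock: true,
    lockStats: {
      Database: {
        acquireCount: { r: 2, R: 1, W: 1 },
        acquireWaitCount: { W: 1 },
        timeAcquiringMicros: { W: 63784823 }
      }
    }
  }
];

const split = classifyMapReduceOps(sample);
console.log(JSON.stringify(split, null, 2));
```

In the shell, the same helper could be called as classifyMapReduceOps(db.currentOp().inprog). An op listed under "running" is the candidate for db.killOp(opid) when the others need to proceed.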
