[SERVER-12710] Map-Reduce reports incorrect stats in db.currentOp
Created: 13/Feb/14  Updated: 06/Dec/22  Resolved: 04/Feb/22

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 2.4.9, 2.5.5
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Alexander Komyagin
Assignee: Backlog - Query Execution
Resolution: Done
Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-9907 Allow to skip initial count() in mapr... Closed
Assigned Teams:
Query Execution
Operating System: ALL
Steps To Reproduce:

> for (var i = 0; i < 16000; i++) { db.test.insert({val: i}); }
> var mapFunction1 = function() {
      sleep(1);  // slow down the emit phase so it can be observed
      emit(1, this.val);
  };

> var reduceFunction1 = function(id, val) {
      return Array.sum(val);
  };
> // Check db.currentOp() in another shell while this is running.
> db.test.mapReduce(
      mapFunction1,
      reduceFunction1,
      { out: "map_reduce_example", query: { val: { $gt: 1 } } }
  )
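
To make the check in the second shell easier, the running map-reduce operation can be picked out of the db.currentOp() result. The following is a minimal sketch, not part of the original report: findMapReduceOps is a hypothetical helper name, and it assumes the result has the shape shown in the description below (an "inprog" array whose entries carry "query", "msg" and "progress" fields, with the command key spelled "mapreduce").

```javascript
// Hypothetical helper (not a MongoDB API): extract map-reduce operations
// from a document shaped like the db.currentOp() output in this ticket.
function findMapReduceOps(currentOpResult) {
    var inprog = (currentOpResult && currentOpResult.inprog) || [];
    return inprog.filter(function (op) {
        // Assumption: the command document is reported under "query"
        // and its first key is "mapreduce", as in the sample output.
        return op.query !== null &&
               typeof op.query === "object" &&
               "mapreduce" in op.query;
    }).map(function (op) {
        return {
            opid: op.opid,
            msg: op.msg,
            done: op.progress ? op.progress.done : undefined,
            total: op.progress ? op.progress.total : undefined
        };
    });
}
```

In the second shell this would be called as printjson(findMapReduceOps(db.currentOp())) while the job is in its emit phase.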

 Description   

In db.currentOp() we support status output for MR jobs. However, if a query filter is used, the status is output incorrectly:

> db.currentOp()
{
	"inprog" : [
		{
			"opid" : 16406,
			"active" : true,
			"secs_running" : 6,
			"op" : "query",
			"ns" : "test.test",
			"query" : {
				"mapreduce" : "test",
				"map" : function () { sleep(1);                        emit(1,this.val);                    },
				"reduce" : function (id, val) {
                          return Array.sum(val);
                      },
				"out" : "map_reduce_example",
				"query" : {
					"val" : {
						"$gt" : 1
					}
				}
			},
			"client" : "127.0.0.1:57129",
			"desc" : "conn7",
			"threadId" : "0x106a87000",
			"connectionId" : 7,
			"locks" : {
				"^" : "r",
				"^test" : "R"
			},
			"waitingForLock" : false,
			"msg" : "m/r: (1/3) emit phase M/R: (1/3) Emit Progress: 5173/1 517300%",
			"progress" : {
				"done" : 5173,
				"total" : 1
			},
			"numYields" : 57,
			"lockStats" : {
				"timeLockedMicros" : {
					"r" : NumberLong(13163605),
					"w" : NumberLong(1055)
				},
				"timeAcquiringMicros" : {
					"r" : NumberLong(6595364),
					"w" : NumberLong(7)
				}
			}
		}
	]
}

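In the output above, "progress" reports 5173 documents done against a total of 1, so "msg" shows 517300%. We need to handle filtered jobs correctly and not output misleading percentages in db.currentOp(). (The output in the logs is fine.)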
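
As an illustration of the requested behavior only, the sketch below formats a progress message that omits the percentage whenever the total is not trustworthy. It is written in JavaScript for readability and is not the server's code; formatProgress and the "(total unknown)" wording are assumptions made for this example.

```javascript
// Hypothetical formatter: report a percentage only when the total is usable.
function formatProgress(done, total) {
    // Treat the total as unknown when it is missing, non-positive,
    // or already exceeded by the number of documents processed.
    if (typeof total !== "number" || total <= 0 || done > total) {
        return "Emit Progress: " + done + " (total unknown)";
    }
    var pct = Math.floor((done * 100) / total);
    return "Emit Progress: " + done + "/" + total + " " + pct + "%";
}
```

With the values from the output above, formatProgress(5173, 1) would report the count without a percentage instead of 517300%.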



 Comments   
Comment by Esha Bhargava [ 04/Feb/22 ]

Closing these tickets as part of the deprecation of mapReduce.

Comment by David Storch [ 16/Aug/19 ]

I've confirmed that this bug still exists in the master branch. CC james.wahlin charlie.swanson

Comment by Alexander Komyagin [ 18/Feb/14 ]

Thanks Dan, that makes sense. In this case we just need to fix the db.currentOp output to handle filtered jobs correctly and not report awkward and confusing percentages.

Comment by Daniel Pasette (Inactive) [ 18/Feb/14 ]

Not sure I understand the rest of the description, but the reason the total number of documents is not set when there is a query filter is that setting it would cause the query to be run twice: once to get the total number of documents for reporting and the progress meter, and once to actually process the documents. See SERVER-9907. If there is no query filter, grabbing the total number of documents in the collection is free, so the stats are correct in that case.

Is there something else you're implying?
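
The trade-off described in this comment can be sketched with an in-memory array standing in for the collection. This is an illustrative model under stated assumptions, not the server's implementation: runWithProgress, its parameters, and the filterEvaluations counter are all hypothetical names introduced here.

```javascript
// Hypothetical model: counting matching documents up front gives a correct
// total for the progress meter, but evaluates the filter over the data twice.
function runWithProgress(docs, matches, precount) {
    var evaluations = 0;
    function test(doc) {
        evaluations++;
        return matches(doc);
    }

    var total = null;  // unknown unless the extra counting pass is run
    if (precount) {
        total = docs.filter(test).length;  // pass 1: count only
    }

    var done = 0;
    docs.forEach(function (doc) {          // pass 2: actual processing
        if (test(doc)) {
            done++;
        }
    });

    return { done: done, total: total, filterEvaluations: evaluations };
}

// Same data as the reproduction steps: 16000 documents, filter val > 1.
var docs = [];
for (var i = 0; i < 16000; i++) {
    docs.push({ val: i });
}
var isMatch = function (doc) { return doc.val > 1; };
var withCount = runWithProgress(docs, isMatch, true);
var withoutCount = runWithProgress(docs, isMatch, false);
```

In this model withCount has a usable total at the cost of twice as many filter evaluations, while withoutCount leaves the total unknown, which is the case where a percentage should not be shown.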
