[SERVER-23456] high latency on simple mongo queries during map-reduce operations Created: 31/Mar/16  Updated: 01/Apr/16  Resolved: 01/Apr/16

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: pavan Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

MongoDB, despite having adequate resources (a 4-core box), seems to be choking other operations while a single map-reduce operation is running. In our simple tests, a select on a small collection of 400 records takes more than 16 seconds, even though the map-reduce is running on an entirely different collection. We also noticed that the Node client is unable to connect (the connection gets reset), and on a new connection a query that normally takes a fraction of a second now takes nearly 16 seconds to come back with a count of fewer than 300 records.

Conn5865: the count command takes 16509 ms while the map-reduce is running; under normal load conditions it takes less than 100 milliseconds.
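For context, the kind of test we ran looks roughly like the sketch below; the collection names collA/collB and the map/reduce bodies are illustrative placeholders, not the exact workload from the logs.

// Shell 1: start a map-reduce on one collection (placeholder map/reduce functions)
db.collA.mapReduce(
    function () { emit(this._id, { colors: this.colors }); },
    function (key, values) { return values[0]; },
    { out: { reduce: "collA_stats" } }
);

// Shell 2: while the map-reduce runs, time a simple count on an unrelated collection
var start = new Date();
db.collB.count({ scan_status: "Filter Domains" });
print("count took " + (new Date() - start) + " ms");  // ~16 s while the M/R is running, <100 ms otherwise

The relevant mongod log excerpt follows.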

, acquireWaitCount: { r: 3 }, timeAcquiringMicros: { r: 7055 } }, Collection: { acquireCount: { r: 11 } } } protocol:op_query 135ms
2016-03-31T14:58:11.322-0700 I COMMAND [conn6097] CMD: drop dim3.tmp.mr.whois.tech_postalcode_56cc44bbf2e5301119d51e37s_3924
2016-03-31T14:58:11.371-0700 I NETWORK [initandlisten] connection accepted from 172.31.32.189:37316 #6167 (363 connections now open)
2016-03-31T14:58:16.007-0700 I - [conn6097] M/R: (3/3) Final Reduce Progress: 50700/54200 93%
2016-03-31T14:58:19.030-0700 I - [conn6097] M/R Reduce Post Processing Progress: 8100/54200 14%
2016-03-31T14:58:22.045-0700 I - [conn6097] M/R Reduce Post Processing Progress: 16600/54200 30%
2016-03-31T14:58:25.028-0700 I - [conn6097] M/R Reduce Post Processing Progress: 24900/54200 45%
2016-03-31T14:58:26.190-0700 I NETWORK [initandlisten] connection accepted from 172.31.32.189:37317 #6168 (364 connections now open)
2016-03-31T14:58:26.191-0700 I NETWORK [conn6166] end connection 172.31.32.189:37315 (363 connections now open)
2016-03-31T14:58:28.004-0700 I - [conn6097] M/R Reduce Post Processing Progress: 32800/54200 60%
2016-03-31T14:58:31.015-0700 I - [conn6097] M/R Reduce Post Processing Progress: 41300/54200 76%
2016-03-31T14:58:34.011-0700 I - [conn6097] M/R Reduce Post Processing Progress: 50000/54200 92%
2016-03-31T14:58:35.490-0700 I COMMAND [conn6097] CMD: drop dim3.tmp.mr.whois.tech_postalcode_56cc44bbf2e5301119d51e37s_3924
2016-03-31T14:58:35.490-0700 I COMMAND [conn6167] command local.oplog.rs command: getMore { getMore: 116290011101, collection: "oplog.rs", maxTimeMS: 5000, term: 17, lastKnownCommittedOpTime: { ts: Timestamp 1459461496000|3513, t: 17 } } cursorid:116290011101 keyUpdates:0 writeConflicts:0 exception: operation exceeded time limit code:50 numYields:0 reslen:74 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 19300914 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } protocol:op_command 19301ms
2016-03-31T14:58:35.490-0700 I NETWORK [conn6167] end connection 172.31.32.189:37316 (362 connections now open)
2016-03-31T14:58:35.492-0700 I NETWORK [conn6168] end connection 172.31.32.189:37317 (361 connections now open)
2016-03-31T14:58:35.493-0700 I NETWORK [initandlisten] connection accepted from 172.31.32.189:37318 #6169 (362 connections now open)
2016-03-31T14:58:35.496-0700 I COMMAND [conn5865] command dim3.scans command: count { count: "scans", query: { $and: [ { scan_status: "Filter Domains" } ] } } planSummary: COUNT_SCAN { scan_status: 1.0 } keyUpdates:0 writeConflicts:0 numYields:1 reslen:62 locks:{ Global: { acquireCount: { r: 4 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 16503711 } }, Database: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 5971 } }, Collection: { acquireCount: { r: 2 } } } protocol:op_query 16509ms
2016-03-31T14:58:35.499-0700 I COMMAND [conn6097] CMD: drop dim3.tmp.mr.whois.tech_postalcode_56cc44bbf2e5301119d51e37s_3924
2016-03-31T14:58:35.530-0700 I COMMAND [conn6097] command dim3.domainstats command: mapReduce { mapreduce: "whois.tech_postalcode_56cc44bbf2e5301119d51e37s", map: "function () {
var key = this._id;
var value = {'colors':this.colors};
if(value.colors){
//coming from aggregate collections
va...", reduce: "function (key, values) {
//printjson(values)
var reduce ={colors:{whois:{}}};
values.forEach(function(value){
var properties = Ob...", verbose: true, out: { reduce: "domainstats" }, jsMode: true, sort: { _id: 1 }, readPreference: "primary" } planSummary: COUNT keyUpdates:0 writeConflicts:0 numYields:850 reslen:218 locks:{ Global: { acquireCount: { r: 382226, w: 216813, W: 54201 } }, Database: { acquireCount: { r: 54635, w: 216807, R: 971, W: 8 }, acquireWaitCount: { W: 2 }, timeAcquiringMicros: { W: 24771 } }, Collection: { acquireCount: { r: 54635, w: 108406 } }, Metadata: { acquireCount: { w: 108403 } }, oplog: { acquireCount: { w: 108403 } } } protocol:op_query 24225ms



 Comments   
Comment by Ramon Fernandez Marina [ 01/Apr/16 ]

Closing this issue as per your latest comment, ppeddada.

Comment by pavan [ 31/Mar/16 ]

Looks like map-reduce by default locks the entire database rather than just the collections it is working on. Using the 'nonAtomic' option seems to address the issue reported here, so this is not an issue anymore.
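Roughly, the change is just adding nonAtomic: true to the out document; a sketch below, where mapFn and reduceFn stand in for the actual (abbreviated) map and reduce functions:

db.getCollection("whois.tech_postalcode_56cc44bbf2e5301119d51e37s").mapReduce(
    mapFn,     // same map function as before (omitted here)
    reduceFn,  // same reduce function as before (omitted here)
    {
        out: { reduce: "domainstats", nonAtomic: true },  // post-processing no longer holds the exclusive database lock for the whole step
        jsMode: true,
        sort: { _id: 1 }
    }
);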

Comment by pavan [ 31/Mar/16 ]

mongotop output clearly shows that the only two operations running are the map-reduce and the query.

ns total read write 2016-03-31T14:58:36-07:00
local.oplog.rs 68924ms 47146ms 21778ms
dim3.scans 16609ms 16609ms 0ms
dim3.tmp.mr.whois.tech_postalcode_56cc44bbf2e5301119d51e37s_3924 14638ms 16311ms -1672ms
dim3.domainstats 129ms 0ms 129ms
admin.system.roles 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
config.locks 0ms 0ms 0ms
dim3.Loading... 0ms 0ms 0ms
dim3.accounts 0ms 0ms 0ms
dim3.activity 0ms 0ms 0ms
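As a complementary check from the mongo shell, the active operations and whether they are waiting on a lock can be listed with a snippet like this (a sketch; field names as in the 3.2 currentOp output):

db.currentOp({ active: true }).inprog.forEach(function (op) {
    print(op.opid + "  " + op.op + "  " + op.ns + "  waitingForLock=" + op.waitingForLock);
});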
