- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- None
- Affects Version/s: 3.2.3
- Component/s: WiredTiger
- None
- ALL
- (copied to CRM)
I'm seeing a number of queries that are experiencing very high execution times, almost entirely due to timeAcquiringMicros against the Global Intent Shared lock.
Some examples of the entries from the logs (server, db and collection names anonymised):
Mar 30 15:17:33 SERVER mongod.27017[1461]: [conn2254368] command DB.COLLECTION command: count { count: "requestsByUserState", query: { timeGroup: "2016-03-24" } } planSummary: COUNT_SCAN { timeGroup: -1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:62 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 32914737 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 32915ms

Mar 30 15:25:06 SERVER mongod.27017[1461]: [conn1899344] command DB.COLLECTION command: aggregate { aggregate: "scriptLogLevelsCount.DAILY", pipeline: [ { $match: { _id: { $gte: "2016-03-30", $lt: "2016-03-31" } } }, { $group: { _id: null, FATAL: { $sum: "$scriptLogLevelsCount.FATAL" }, ERROR: { $sum: "$scriptLogLevelsCount.ERROR" }, WARN: { $sum: "$scriptLogLevelsCount.WARN" }, INFO: { $sum: "$scriptLogLevelsCount.INFO" }, DEBUG: { $sum: "$scriptLogLevelsCount.DEBUG" } } } ] } keyUpdates:0 writeConflicts:0 numYields:1 reslen:138 locks:{ Global: { acquireCount: { r: 8 }, acquireWaitCount: { r: 4 }, timeAcquiringMicros: { r: 125415124 } }, Database: { acquireCount: { r: 4 } }, Collection: { acquireCount: { r: 4 } } } protocol:op_query 125416ms

Mar 30 16:03:07 SERVER mongod.27017[1461]: [conn2261655] command DB.COLLECTION command: getMore { getMore: 266905304107253, collection: "COLLECTION" } cursorid:266905304107253 keyUpdates:0 writeConflicts:0 numYields:36 nreturned:4724 reslen:4195106 locks:{ Global: { acquireCount: { r: 74 }, acquireWaitCount: { r: 8 }, timeAcquiringMicros: { r: 88559517 } }, Database: { acquireCount: { r: 37 } }, Collection: { acquireCount: { r: 37 } } } protocol:op_query 88570ms
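To show that lock acquisition accounts for nearly all of the execution time, the Global wait time and the total duration can be pulled straight out of these lines and compared. A minimal parsing sketch (not part of the original report; the regexes assume the 3.2-style log format shown above and take the first timeAcquiringMicros value, which in these examples is the Global lock):

```python
import re

# Wait time on the (first) lock section, in microseconds, and total op time in ms.
LOCK_WAIT_RE = re.compile(r"timeAcquiringMicros: \{ r: (\d+)")
DURATION_RE = re.compile(r"(\d+)ms$")

def lock_wait_fraction(line: str):
    """Return the fraction of the operation's total time spent acquiring the lock."""
    wait = LOCK_WAIT_RE.search(line)
    total = DURATION_RE.search(line.strip())
    if not (wait and total):
        return None
    wait_ms = int(wait.group(1)) / 1000.0   # microseconds -> milliseconds
    total_ms = int(total.group(1))
    return wait_ms / total_ms if total_ms else None

# For the first entry above: 32914737 us waiting vs 32915 ms total,
# i.e. essentially 100% of the time was spent acquiring the Global IS lock.
```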
We're seeing thousands of these per day, from what I can see always on secondary servers, but in this environment we perform the majority of our queries against the secondaries. (It's an analytics processing platform, so it is write-heavy on the master and it doesn't matter if queries are slightly stale.)
When the slow queries happen, there are often a number of them clustered together in the same or subsequent seconds, which makes me wonder whether each cluster corresponds to a WiredTiger checkpoint. There are a lot of databases in this replica set, meaning a significant number of files to checkpoint.
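One way to test that theory is to poll serverStatus on the affected secondary while the slow queries are occurring and watch the WiredTiger checkpoint counters. A rough sketch using pymongo (the host name is a placeholder, and the stat names are taken from 3.2-era WiredTiger serverStatus output and may differ in other versions):

```python
import time
from pymongo import MongoClient

# Hypothetical host; connect directly to the secondary being investigated.
client = MongoClient("mongodb://SECONDARY_HOST:27017")

while True:
    # serverStatus().wiredTiger.transaction holds the checkpoint statistics.
    txn = client.admin.command("serverStatus")["wiredTiger"]["transaction"]
    print(
        time.strftime("%H:%M:%S"),
        "checkpoint running:", txn.get("transaction checkpoint currently running"),
        "last checkpoint ms:", txn.get("transaction checkpoint most recent time (msecs)"),
    )
    time.sleep(5)
```

If the timestamps where "checkpoint running" is 1 (or where the most recent checkpoint time is very large) line up with the clusters of slow queries in the log, that would support the checkpoint explanation.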
Environment details:
3 node replica set
MongoDB 3.2.3 64-bit
WiredTiger storage engine (default config apart from "directoryForIndexes: true")
Ubuntu 14.04.3 LTS
Azure VMs - Standard DS13 (8 Cores, 56 GB memory)
~5,700 databases
~293,000 WiredTiger data and index files
Duplicates:
- WT-1598 Remove the schema, table locks (Closed)