[SERVER-23433] High acquire time for Global Intent Shared lock using WiredTiger Created: 30/Mar/16  Updated: 24/Aug/17  Resolved: 17/May/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.3
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Greg Murphy
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: perf_report.txt.gz, s23433.png
Issue Links:
  duplicates WT-1598 - Remove the schema, table locks (Closed)
Operating System: ALL

 Description   

I'm seeing a number of queries that are experiencing very high execution times, almost entirely due to timeAcquiringMicros against the Global Intent Shared lock.

Some examples of the entries from the logs (server, db and collection names anonymised):

Mar 30 15:17:33 SERVER mongod.27017[1461]: [conn2254368] command DB.COLLECTION command: count { count: "requestsByUserState", query: { timeGroup: "2016-03-24" } } planSummary: COUNT_SCAN { timeGroup: -1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:62 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 32914737 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 32915ms
 
Mar 30 15:25:06 SERVER mongod.27017[1461]: [conn1899344] command DB.COLLECTION command: aggregate { aggregate: "scriptLogLevelsCount.DAILY", pipeline: [ { $match: { _id: { $gte: "2016-03-30", $lt: "2016-03-31" } } }, { $group: { _id: null, FATAL: { $sum: "$scriptLogLevelsCount.FATAL" }, ERROR: { $sum: "$scriptLogLevelsCount.ERROR" }, WARN: { $sum: "$scriptLogLevelsCount.WARN" }, INFO: { $sum: "$scriptLogLevelsCount.INFO" }, DEBUG: { $sum: "$scriptLogLevelsCount.DEBUG" } } } ] } keyUpdates:0 writeConflicts:0 numYields:1 reslen:138 locks:{ Global: { acquireCount: { r: 8 }, acquireWaitCount: { r: 4 }, timeAcquiringMicros: { r: 125415124 } }, Database: { acquireCount: { r: 4 } }, Collection: { acquireCount: { r: 4 } } } protocol:op_query 125416ms
 
Mar 30 16:03:07 SERVER mongod.27017[1461]: [conn2261655] command DB.COLLECTION command: getMore { getMore: 266905304107253, collection: "COLLECTION" } cursorid:266905304107253 keyUpdates:0 writeConflicts:0 numYields:36 nreturned:4724 reslen:4195106 locks:{ Global: { acquireCount: { r: 74 }, acquireWaitCount: { r: 8 }, timeAcquiringMicros: { r: 88559517 } }, Database: { acquireCount: { r: 37 } }, Collection: { acquireCount: { r: 37 } } } protocol:op_query 88570ms

We're seeing thousands of these per day, and from what I can see always on secondary servers, but in this environment we perform the majority of our queries against the secondaries. (It's an analytics processing platform, so it is write-heavy on the master and it doesn't matter if queries are slightly stale.)

When the slow queries happen, there are often a number of them clustered together during the same or subsequent seconds, which makes me wonder whether this is caused by a WiredTiger checkpoint. There are a lot of databases in this replica set, meaning a significant number of files to checkpoint.
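A rough way to check this (untested, and assuming the serverStatus() field names below exist on this build) would be to poll the secondary every few seconds and compare checkpoint activity with the global-lock read queue:

    # Poll every 5 seconds; the wiredTiger.transaction / globalLock field names
    # are assumptions based on the 3.2 serverStatus output.
    while true; do
      mongo --port 27017 --quiet --eval '
        var s = db.serverStatus();
        print(new Date().toISOString(),
              "checkpoint running:", s.wiredTiger.transaction["transaction checkpoint currently running"],
              "last checkpoint ms:", s.wiredTiger.transaction["transaction checkpoint most recent time (msecs)"],
              "readers queued:", s.globalLock.currentQueue.readers);'
      sleep 5
    done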

Environment details:
3 node replica set
MongoDB 3.2.3 64-bit
WiredTiger storage engine (default config apart from "directoryForIndexes: true")
Ubuntu 14.04.3 LTS
Azure VMs - Standard DS13 (8 Cores, 56 GB memory)
~5,700 databases
~293,000 WiredTiger data and index files



 Comments   
Comment by Kelsey Schubert [ 17/May/16 ]

Hi gregmurphy,

Thank you for the detailed workload description. The issue here is that the checkpoint handle acquire phase will block opening a cursor on any table if it is the first time that table is being opened in a particular WiredTiger session. When there is a smaller number of collections/indexes, MongoDB caches cursors to all tables, so operations generally do not need to open tables. However, the size of those caches is limited (to keep them efficient and avoid memory overhead), which leads to apply ops needing to open tables. We are working on WT-1598 to improve this behavior.
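To get a sense of how many distinct tables (one per collection and one per index) those caches would need to cover, something along the lines of the following untested sketch can count them across all databases. With several thousand databases it will take a while, so treat it as a one-off check:

    # Count collections and indexes across all databases (approximate table count).
    mongo --quiet --eval '
      rs.slaveOk();  // allow reads on a secondary
      var colls = 0, idxs = 0;
      db.adminCommand({listDatabases: 1}).databases.forEach(function(d) {
        var sib = db.getSiblingDB(d.name);
        sib.getCollectionNames().forEach(function(c) {
          colls += 1;
          idxs += sib.getCollection(c).getIndexes().length;
        });
      });
      print("collections:", colls, "indexes:", idxs, "approx. WiredTiger tables:", colls + idxs);'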

Please feel free to vote for WT-1598 and watch it for updates.

Kind regards,
Thomas

Comment by Greg Murphy [ 18/Apr/16 ]

Sure - it's easier to describe this narratively rather than answering the questions in order, but hopefully this covers everything:

This particular MongoDB cluster contains analytics data for our multi-tenant platform, with a database per customer (4,728 of those at the moment). Depending on the data generated by that customer, each database can contain up to 55 collections and 58 indexes. The data is generated every 10 minutes and ingested as close to generation time as possible. Not all customers continually generate data, so some databases are more active than others.

During the ingestion process (which is parallelised, with 7 processes running concurrently, each one working on a subset of databases), the current data state is read into memory from the secondary MongoDB nodes. The ingestion process then merges this with the incoming data and persists the results back to MongoDB before performing some aggregations and persisting their results. This persistence results in updates/inserts against the majority of the collections in the database being processed, but as these are aggregated results the number of documents affected is low.

New databases, collections and indexes are only created when new customers arrive. In the last 8 hours there have been 58 new databases created, so not a high frequency of object creations.

It's difficult to say precisely how many collections and indexes are actively in use at any time, as it depends on where each ingestion process is in its run. End-user queries (from a web portal) can also hit this cluster.

Hope that helps, please let me know if I've missed anything.

Comment by Kelsey Schubert [ 18/Apr/16 ]

Hi gregmurphy,

The number of connection data handles that are currently active (around 200 thousand) is very large, which likely explains the issue you are experiencing. When WiredTiger creates a checkpoint it needs to look at each open table (collection or index). To ensure an unchanging set of tables, it takes a lock during this process. Consequently, this lock can block other operations while a checkpoint is running. It is worth noting that the code that gathers these handles was modified in the 3.2.5 release, which explains the difference in the behavior you observed.
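For reference, this handle count can be read directly from serverStatus(); a minimal sketch, assuming the WiredTiger statistic names below are present on your build:

    # Read WiredTiger handle/cursor counters from serverStatus.
    mongo --quiet --eval '
      var wt = db.serverStatus().wiredTiger;
      print("data handles currently active:",
            wt["data-handle"]["connection data handles currently active"]);
      print("open cursors:", wt.session["open cursor count"],
            "open sessions:", wt.session["open session count"]);'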

Please see the attached FTDC graph from your node on MongoDB 3.2.5. We can see that the spikes in read lock acquisition time and the FTDC dropouts are aligned with when checkpoints start.

To continue investigating, can you please describe some details about your workload? In particular, please be sure to answer the following questions:

  1. How many collections and indexes are actively in use at a particular point in time?
  2. How often are collections created? Is there a pattern to their creation (e.g. one every second, or a thousand once per minute)?
  3. How are updates spread across the collections?

Thank you,
Thomas

Comment by Greg Murphy [ 18/Apr/16 ]

Report attached as requested. For the purpose of this test, we upgraded a single secondary to 3.2.5 again and ran perf against that secondary.

Comment by Kelsey Schubert [ 15/Apr/16 ]

Hi gregmurphy,

While you are experiencing this issue, can you please capture and upload a perf report by following the steps below?

  1. run the following command for a couple of seconds, then stop it with Ctrl-C:

    sudo perf record -a -g -s -p <pid> -o <output.data.filename>
    

  2. change the ownership of the output file:

    sudo chown <user> <output.data.filename>
    

  3. generate the report:

    perf report -i <output.data.filename> >> perf_report.txt
    

  4. attach perf_report.txt to this ticket.
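If it is more convenient, the steps above can be combined into a single command. This is only a sketch: it assumes a single mongod process on the host and records for roughly 10 seconds, so adjust the PID lookup and duration as needed:

    # Record ~10s of samples, fix ownership, and write the report in one go.
    sudo perf record -a -g -s -p "$(pgrep -x mongod)" -o perf.data -- sleep 10 && \
      sudo chown "$USER" perf.data && \
      perf report -i perf.data > perf_report.txt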

Thank you,
Thomas

Comment by Greg Murphy [ 15/Apr/16 ]

Additionally, performance actually seems worse under 3.2.5 - initial analysis of the slow-query log shows a more predictable mean response time for very slow queries (>60s), but a significant increase in their frequency. In all the examples I've looked at, almost all of the response time is still spent waiting for the global lock.

Interestingly, average checkpoint time appears to have gone down.

We will revert to 3.2.4 to stabilise the system, but I will capture all of the diagnostic.data files from each node beforehand and upload them. Please let me know if there is any further debugging I can enable.

Comment by Greg Murphy [ 14/Apr/16 ]

OK - the file is uploading to the same secure URL you provided earlier in the ticket.

Thanks

Greg

Comment by Kelsey Schubert [ 14/Apr/16 ]

Hi gregmurphy,

Sorry you are still experiencing issues; please upload the latest diagnostic.data and we will continue to investigate.

Thank you,
Thomas

Comment by Greg Murphy [ 14/Apr/16 ]

We upgraded one of the secondaries today, but unfortunately are still seeing the same behaviour on it. Examples:

Apr 14 13:19:07 SERVER mongod.27017[33669]: [conn11317] command DB.COLLECTION command: getMore { getMore: 21283805213298, collection: "COLLECTION" } cursorid:21283805213298 keyUpdates:0 writeConflicts:0 numYields:28 nreturned:3603 reslen:4195489 locks:{ Global: { acquireCount: { r: 58 }, acquireWaitCount: { r: 8 }, timeAcquiringMicros: { r: 50331051 } }, Database: { acquireCount: { r: 29 } }, Collection: { acquireCount: { r: 29 } } } protocol:op_query 50343ms
 
Apr 14 13:19:06 SERVER mongod.27017[33669]: [conn12233] command DB.COLLECTION command: count { count: "COLLECTION", query: { timeGroup: "2016-04-12" } } planSummary: COUNT_SCAN { timeGroup: -1 } keyUpdates:0 writeConflicts:0 numYields:59 reslen:62 locks:{ Global: { acquireCount: { r: 120 }, acquireWaitCount: { r: 3 }, timeAcquiringMicros: { r: 46770101 } }, Database: { acquireCount: { r: 60 } }, Collection: { acquireCount: { r: 60 } } } protocol:op_query 46772ms
 
Apr 14 13:19:06 SERVER mongod.27017[33669]: [conn12256] command DB.COLLECTION command: find { find: "COLLECTION", filter: { _id: "20160414T1240-gsp-aeu000-mo04.gsp-aeu-1460637943500.gz" }, limit: 1, singleBatch: true } planSummary: IDHACK keysExamined:0 docsExamined:0 idhack:1 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:0 nreturned:0 reslen:143 locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 49799814 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 49938ms

Would the updated diagnostic.data files help?

Comment by Greg Murphy [ 13/Apr/16 ]

Thanks - much appreciated - we'll upgrade as soon as the release is out and confirm.

Comment by Kelsey Schubert [ 12/Apr/16 ]

Hi gregmurphy,

I have examined the diagnostic data you have provided. You are correct that this issue is correlated with WiredTiger checkpoints. In particular, it may be related to a lock WiredTiger takes during checkpoints, which will block workloads that create or drop collections or indexes. WT-2346 improves this behavior and will be included in the upcoming 3.2.5 release.

Please upgrade to MongoDB 3.2.5 when it is released and confirm whether the issue is resolved.

Thank you,
Thomas

Comment by Greg Murphy [ 31/Mar/16 ]

Thanks Thomas, I've just uploaded the archives from both secondaries.

Comment by Kelsey Schubert [ 31/Mar/16 ]

Hi gregmurphy,

I have created a secure upload portal for you to use. Files you upload to this portal will only be visible to MongoDB staff investigating this ticket.

Kind regards,
Thomas

Comment by Greg Murphy [ 31/Mar/16 ]

Thanks for the quick reply - can you give me a private upload link so that I don't have to attach all the data to the ticket please? (I've got it ready to send when you do).

Comment by Ramon Fernandez Marina [ 30/Mar/16 ]

gregmurphy, it is possible that the large number of files is indeed slowing things down. Can you please archive and upload the contents of the diagnostic.data directory for the affected secondaries? Seeing the full logs would also help determine if the issue is related to the number of files or if it's something else.
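Something like the following should work for the archive (the path is an assumption: diagnostic.data lives under your dbPath, which is /var/lib/mongodb by default for the Ubuntu packages):

    # Adjust the path if your dbPath is non-default.
    tar czf diagnostic-$(hostname).tar.gz -C /var/lib/mongodb diagnostic.data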

Thanks,
Ramón.
