[SERVER-23627] Randomly locking in WiredTiger storage engine Created: 09/Apr/16  Updated: 16/Nov/21  Resolved: 18/Apr/16

Status: Closed
Project: Core Server
Component/s: Performance, WiredTiger
Affects Version/s: 3.0.10
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Rakesh Kumar Assignee: Unassigned
Resolution: Done Votes: 0
Labels: query, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

production


Operating System: ALL
Steps To Reproduce:

Unable to reproduce in the staging environment.

Participants:

Description

Hi,

Last week we upgraded from MongoDB 3.0.6 to 3.0.10, and since the upgrade the server has been hitting random locking stalls (sometimes lasting a minute or more).
These queries normally take ~70 ms, but during a stall they take up to 70 seconds, which for us is an outage.
We tried switching the primary to another secondary, but we still face the same issue.

Below are the logs from when it starts happening:

2016-04-09T08:15:02.106+0530 I NETWORK  [conn222315] end connection 10.14.4.20:51763 (287 connections now open)
2016-04-09T08:15:02.107+0530 I NETWORK  [initandlisten] connection accepted from 10.14.4.20:51770 #222318 (288 connections now open)
2016-04-09T08:15:03.223+0530 I NETWORK  [initandlisten] connection accepted from 10.14.1.108:40449 #222319 (289 connections now open)
2016-04-09T08:15:03.223+0530 I NETWORK  [conn222319] end connection 10.14.1.108:40449 (288 connections now open)
2016-04-09T08:15:03.223+0530 I NETWORK  [initandlisten] connection accepted from 10.14.1.108:40450 #222320 (289 connections now open)
2016-04-09T08:15:03.224+0530 I NETWORK  [initandlisten] connection accepted from 10.14.1.108:40451 #222321 (290 connections now open)
2016-04-09T08:15:03.225+0530 I NETWORK  [conn222321] end connection 10.14.1.108:40451 (289 connections now open)
2016-04-09T08:15:03.225+0530 I NETWORK  [initandlisten] connection accepted from 10.14.1.108:40452 #222322 (290 connections now open)
2016-04-09T08:15:03.232+0530 I NETWORK  [initandlisten] connection accepted from 10.14.1.108:40457 #222323 (291 connections now open)
2016-04-09T08:15:03.340+0530 I COMMAND  [conn222323] command roster_production.$cmd command: count { count: "inventories", query: { current_location.fix_time: { $lte: new Date(1460166903224) }, status: { $in: [ "idle" ] } } } planSummary: COLLSCAN keyUpdates:0 writeConflicts:0 numYields:786 reslen:44 locks:{ Global: { acquireCount: { r: 1574 } }, Database: { acquireCount: { r: 787 } }, Collection: { acquireCount: { r: 787 } } } 107ms
2016-04-09T08:15:03.449+0530 I COMMAND  [conn222323] command roster_production.$cmd command: count { count: "inventories", query: { current_location.fix_time: { $lte: new Date(1460166903224) }, status: { $in: [ "idle" ] } } } planSummary: COLLSCAN keyUpdates:0 writeConflicts:0 numYields:786 reslen:44 locks:{ Global: { acquireCount: { r: 1574 } }, Database: { acquireCount: { r: 787 } }, Collection: { acquireCount: { r: 787 } } } 107ms
2016-04-09T08:15:03.710+0530 I COMMAND  [conn222323] command roster_production.$cmd command: count { count: "inventories", query: { current_location.fix_time: { $lte: new Date(1460166903224) }, status: { $in: [ "idle" ] } } } planSummary: COLLSCAN keyUpdates:0 writeConflicts:0 numYields:786 reslen:44 locks:{ Global: { acquireCount: { r: 1574 } }, Database: { acquireCount: { r: 787 } }, Collection: { acquireCount: { r: 787 } } } 107ms
2016-04-09T08:15:06.767+0530 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:50024 #222324 (292 connections now open)
2016-04-09T08:15:13.233+0530 I NETWORK  [conn222320] end connection 10.14.1.108:40450 (291 connections now open)
2016-04-09T08:15:14.758+0530 I NETWORK  [conn222317] end connection 10.14.1.207:46537 (290 connections now open)
2016-04-09T08:15:14.758+0530 I NETWORK  [initandlisten] connection accepted from 10.14.1.207:46545 #222325 (291 connections now open)
2016-04-09T08:15:23.438+0530 I COMMAND  [conn222184] command local.$cmd command: dbStats { dbStats: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:188 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { R: 1 }, acquireWaitCount: { R: 1 }, timeAcquiringMicros: { R: 76628917 } } } 76629ms
2016-04-09T08:15:23.438+0530 I WRITE    [conn222239] update roster_production.inventories query: { _id: ObjectId('56c6b540af1df3a27122eae6') } update: { $set: { last_request_device_time: new Date(1460169847843), updated_at: new Date(1460169847848), current_location._id: ObjectId('57086c773b6f82232e000000') } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:1 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 75590590 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 75590ms
2016-04-09T08:15:23.438+0530 I WRITE    [conn222296] update roster_production.inventories query: { _id: ObjectId('56a3181aaf1df3a27122c482') } update: { $set: { last_request_device_time: new Date(1460169848325), updated_at: new Date(1460169848330), current_location._id: ObjectId('57086c78706d2756aa000000') } } nscanned:1 nscannedObjects:1 nMatched:1 nModified:1 keyUpdates:1 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 75106091 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } 75106ms
2016-04-09T08:15:23.438+0530 I NETWORK  [conn222184] end connection 127.0.0.1:50008 (290 connections now open)
2016-04-09T08:15:23.438+0530 I WRITE    [conn222231] insert roster_production.legs query: { _id: ObjectId('57086c78706d27547b000004'), session_id: ObjectId('5708621d706d272159000850'), engaged_by: 114135077, coverage: 1471870, wait_time: 686.0, request_time: new Date(1460169847000), lazy: false, source: "fixed", previous_status: "idle", current_status: "engaged", next_status: "idle", inventory_id: ObjectId('56cedb47af1df3a27122ef0d'), updated_at: new Date(1460169848132), created_at: new Date(1460169848132), start: { _id: ObjectId('57086c78706d27547b000002'), time: new Date(1460168370000), location: { _id: ObjectId('57086c78706d27547b000005'), alt: null, accuracy: 20.0, fix_time: new Date(1460168367000), provider: "fused", bearing: 197.0 } }, end: { _id: ObjectId('57086c78706d27547b000003'), time: new Date(1460169846000), location: { _id: ObjectId('57086c78706d27547b000006'), alt: null, accuracy: 20.0, fix_time: new Date(1460169842000), altitude: 570.4, provider: "fused", bearing: 1.0 } } } ninserted:1 keyUpdates:0 writeConflicts:0 numYields:0 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 75305189 } }, Collection: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 75305ms

We are seeing lots of slow operations like the ones above.
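
For reference, a minimal mongo shell sketch (diagnostic only; the 5-second threshold is an arbitrary choice) that we can run during one of these stalls to list operations that have been running for a while:

  // Diagnostic sketch: print any operation running longer than 5 seconds.
  db.currentOp({ "secs_running": { "$gt": 5 } }).inprog.forEach(function (op) {
      print(op.opid + " " + op.op + " " + op.ns + " " + op.secs_running + "s");
  });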



Comments
Comment by Ramon Fernandez Marina [ 18/Apr/16 ]

Thanks for the update, rakesh.mib.j.

Comment by Rakesh Kumar [ 17/Apr/16 ]

Thanks. After a long discussion with AWS we found the exact cause: there is an issue with the EBS volume on the AWS side that makes the disk unavailable for short periods, and during those periods we hit these stalls.
Please close the ticket, and sorry for the delayed response.

Comment by Ramon Fernandez Marina [ 11/Apr/16 ]

rakesh.mib.j, I've created a private upload portal so you can securely share the information Alexander requested above.

Comment by Alexander Gorrod [ 10/Apr/16 ]

rakesh.kumar, it would be useful if you could gather:

  • iostat and cpustat output for a time period that spans the operation stalls (a sample iostat invocation is sketched after the command below).
  • A mongod log at log level 1.
  • Periodic serverStatus output at 1-second intervals for a time period that spans the operation stalls. You can do that by running the following command:

  mongo --eval "while(true) {print(JSON.stringify(db.serverStatus())); sleep(1000)}" >ss.log &
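
For iostat, one possible invocation (assuming the sysstat version of iostat; the exact flags are a suggestion, not a requirement) that records extended per-device statistics in megabytes, with timestamps, once per second:

  iostat -xmt 1 > iostat.log &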

It would also be very valuable to know whether the same issue occurs on the latest stable release of MongoDB, which is 3.2.4.

Comment by Rakesh Kumar [ 10/Apr/16 ]

@Dan Pasette: Please let me know what information you need from my end. We are using MMS monitoring for these servers and can provide that data if needed (in a private ticket that can't be seen publicly).

Server hardware (EC2 instance type): c4.4xlarge with a 500 GB gp2 EBS volume.
Below is the mongod configuration we use in mongod.conf (the same on all nodes of the replica set cluster).

oplogSize=40000
replSet=PordCluster
storageEngine=wiredTiger
wiredTigerCollectionBlockCompressor=none
wiredTigerIndexPrefixCompression=false

To reiterate, we are facing this issue on MongoDB 3.0.10 (after upgrading from 3.0.6), and after some time (once the locking clears) things return to normal with no performance degradation.
We are also doing lots of bulk updates on this cluster, with a batch size of 100.
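
For illustration, a minimal sketch of what one such batch looks like (placeholder query and values; the real updates come from our application, and the field names are taken from the log lines above). It uses the unordered bulk API available in the 3.0 shell:

  // Illustrative sketch of a 100-document bulk update batch (placeholder data).
  var ids = db.inventories.find({}, { _id: 1 }).limit(100).toArray()
      .map(function (d) { return d._id; });
  var bulk = db.inventories.initializeUnorderedBulkOp();
  ids.forEach(function (id) {
      bulk.find({ _id: id }).updateOne({
          $set: {
              last_request_device_time: new Date(),
              updated_at: new Date(),
              "current_location._id": new ObjectId()
          }
      });
  });
  bulk.execute();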

Please let me know if you need any other details to debug this issue.

Comment by Daniel Pasette (Inactive) [ 10/Apr/16 ]

Hi Rakesh, we don't have enough information to say for sure, but the log messages you've provided are consistent with symptoms reported in SERVER-21782.
