[SERVER-56098] global lock on secondary Created: 14/Apr/21  Updated: 01/Jun/21  Resolved: 01/Jun/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: h jy Assignee: Edwin Zhou
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2021-04-21 at 11.28.34 AM.png     PNG File Screen Shot 2021-04-22 at 1.08.50 PM.png     PNG File Screen Shot 2021-04-28 at 9.51.47 AM.png     File metrics.2021-04-15T05-10-37Z-00000     File metrics.2021-04-21T06-11-43Z-000002021-04-21-1618992977     Zip Archive mongo_logs.zip    
Operating System: ALL
Participants:

 Description   

Hi, support!

We found slow log entries like the following:

2021-04-14T16:54:06.865+0800 I COMMAND [conn671031] command db.table command: getMore { getMore: 8060710266719536787, collection: "bik4bos", batchSize: 101, lsid: { id: UUID("6d8a691e-5bff-4699-bfe7-5596cdca38ee"), uid: BinData(0, E12CB5D7782FE6D81309E5B284A1AB6AE43E3C7B01D00039E2B3A012AADA2312) }, $clusterTime: { clusterTime: Timestamp(1618390445, 1169), signature: { hash: BinData(0, 31188007A6BAE8EF4B3739C8E7A9D3912FBBE3D0), keyId: 6914165982348967966 } }, $client: { driver: { name: "mongo-java-driver|sync", version: "3.10.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "2.6.32-754.23.1.el6.x86_64" }, platform: "Java/Oracle Corporation/1.8.0_66-b17", mongos: { host: "VECS018588:30000", client: "10.111.225.126:43716", version: "4.0.12" } }, $configServerState: { opTime: { ts: Timestamp(1618390445, 591), t: 6 } }, $db: "db" } originatingCommand: { aggregate: "table", pipeline: [ { $match: { cityGuid.value: "16a0d780eeff499e8a348fb29b5cb10b", pos.loc: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ [ [ 110.3369011480655, 22.66012923621993 ], [ 110.3369011480655, 22.70930431482262 ], [ 110.3637487187408, 22.70930431482262 ], [ 110.3637487187408, 22.66012923621993 ], [ 110.3369011480655, 22.66012923621993 ] ] ] } } }, bikeType.value: 1, opStatus.value: { $in: [ 2, 3, 4 ] }, bizType.value: { $in: [ 0 ] }, $or: [ { alert: { $in: [ "1124", "1122", "103", "101", "1123", "2042", "1125", "1126" ] } }, { opFault.value: 1 }, { opFaultCheck.value: 1 }, { lastScanUser.timestamp: { $gte: 1618131245701, $lte: 1618390445701 }, alert: { $in: [ "104" ] } }, { alertMsg.104: { $gte: 1531990445701, $lte: 1618368845701 }, alert: { $in: [ "104" ] } }, { alert: { $in: [ "7777" ] }, fault: { $in: [ "1-1013", "1-1002", "1-1001", "1-1000", "1-1004", "1-1015", "1-5", "1-1003", "1-1005", "1-1006", "1-1007", "1-1009", "1-1012", "1-1014", "1-1008", "1-10112", "1-10113", "1-11111" ] } } ] } }, { $project: { _id: true, bikeNo: true, pos: { lng: true, lat: true, geo: { geo6: true } } } }, { $group: { _id: "$pos.geo.geo6", count: { $sum: { $const: 1 } }, first: { $first: "$bikeNo" }, maxLat: { $max: "$pos.lat" }, minLat: { $min: "$pos.lat" }, maxLng: { $max: "$pos.lng" }, minLng: { $min: "$pos.lng" } } } ], fromMongos: true, needsMerge: true, mergeByPBRT: false, collation: { locale: "simple" }, cursor: { batchSize: 0 }, allowImplicitCollectionCreation: true, shardVersion: [ Timestamp(997, 1), ObjectId('5ff4312c8299c11b5bd1abaa') ], lsid: { id: UUID("6d8a691e-5bff-4699-bfe7-5596cdca38ee"), uid: BinData(0, E12CB5D7782FE6D81309E5B284A1AB6AE43E3C7B01D00039E2B3A012AADA2312) }, $readPreference: { mode: "secondaryPreferred" }, $clusterTime: { clusterTime: Timestamp(1618390445, 1150), signature: { hash: BinData(0, 31188007A6BAE8EF4B3739C8E7A9D3912FBBE3D0), keyId: 6914165982348967966 } }, $client: { driver: { name: "mongo-java-driver|sync", version: "3.10.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "2.6.32-754.23.1.el6.x86_64" }, platform: "Java/Oracle Corporation/1.8.0_66-b17", mongos: { host: "VECS018588:30000", client: "10.111.225.126:43716", version: "4.0.12" } }, $configServerState: { opTime: { ts: Timestamp(1618390445, 591), t: 6 } }, $db: "db" } planSummary: IXSCAN { cityGuid.value: 1, pos.loc: "2dsphere" } cursorid:8060710266719536787 keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:381 locks:{ Global: { acquireCount: { r: 3 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 1148383 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 1 } } } storage:{} protocol:op_msg 1148ms

We noticed that the global lock was said to have been resolved in version 3.6, but our MongoDB version is 4.0.12.
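
For reference, one way to see what is holding things up while these slow getMore commands run is db.currentOp() on the affected secondary; a minimal shell sketch (the filter is illustrative, not taken from our deployment):

    // List in-progress operations that are currently blocked waiting for a lock.
    db.currentOp({ waitingForLock: true })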



 Comments   
Comment by Edwin Zhou [ 01/Jun/21 ]

Hi 18621325186@163.com,

Thank you for the clarification. Since we're unable to diagnose this further without the additional lock metrics that become available after upgrading to MongoDB v4.2, I'll close this ticket as Incomplete. When you're able to upgrade, please let us know and we will be happy to reopen this ticket and investigate this behavior further.

Best,
Edwin

Comment by h jy [ 31/May/21 ]

Hi, support!

Sorry, I meant read preference before.

Comment by Edwin Zhou [ 29/May/21 ]

Hi 18621325186@163.com,

Thank you for following up and letting us know that you saw a reduction in slow logs after modifying your setting.

Could you clarify what you mean by setting readConcern to primary? Do you mean read preference?
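
For clarity, a minimal shell sketch of the distinction (the collection name bik4bos is taken from your logs and is only illustrative here):

    // Read preference controls WHICH replica set member serves the read:
    db.getMongo().setReadPref("primary")
    // Read concern controls WHAT consistency level the returned data has:
    db.bik4bos.find().readConcern("local")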

Best,
Edwin

Comment by h jy [ 21/May/21 ]

Hi, support!

We haven't upgraded the MongoDB version because it would take us a long time to deal with. Instead, we set readConcern to primary, and the slow logs have decreased.

Comment by Edwin Zhou [ 20/May/21 ]

Hi 18621325186@163.com,

We still need additional information to diagnose the problem. If this is still an issue for you, would you be able to upgrade to MongoDB v4.2 and see if this issue persists? If it does, can you provide the logs and FTDC that cover the behavior?
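
In case it helps, a rough sketch for locating the FTDC files to attach (assuming the diagnostic data directory has not been moved from its default location):

    // 1. Find the node's dbPath (parsed.storage.dbPath assumes dbPath is set in the configuration):
    db.adminCommand({ getCmdLineOpts: 1 }).parsed.storage.dbPath
    // 2. The FTDC files (metrics.*) are written to the diagnostic.data subdirectory under that dbPath.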

Best,
Edwin

Comment by Edwin Zhou [ 06/May/21 ]

Hi 18621325186@163.com,

Thank you for your patience while we further investigate this issue. From these metrics, we're not able to tell what behavior is causing these slow logs.

In MongoDB v4.2, we introduced finer granularity in the Global lock metrics. We'll be able to further our understanding of this issue if we can see those metrics. If possible, would you be able to upgrade to MongoDB v4.2 and see if you come across this issue again?

If so, would you be able to provide the slow log line and the FTDC that covers the slow log line?
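
In the meantime, a minimal sketch of the Global lock counters that are already exposed on v4.0 (cumulative since the node started; which lock modes appear may vary):

    // Snapshot the Global lock section of serverStatus on the affected secondary.
    var g = db.serverStatus().locks.Global;
    printjson({ acquireCount: g.acquireCount,
                acquireWaitCount: g.acquireWaitCount,         // acquisitions that had to wait
                timeAcquiringMicros: g.timeAcquiringMicros }); // total wait time, per lock mode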

Best,
Edwin

Comment by h jy [ 06/May/21 ]

Is there any progress?

Comment by h jy [ 23/Apr/21 ]

1.

{
  "_id" : "dmsBik4Bos_0",
  "version" : 1,
  "protocolVersion" : NumberLong(1),
  "writeConcernMajorityJournalDefault" : true,
  "members" : [
    { "_id" : 0, "host" : "VECS018582:27000", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 2, "tags" : { }, "slaveDelay" : NumberLong(0), "votes" : 1 },
    { "_id" : 1, "host" : "VECS018589:27000", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 1, "tags" : { }, "slaveDelay" : NumberLong(0), "votes" : 1 },
    { "_id" : 2, "host" : "VECS018597:27000", "arbiterOnly" : false, "buildIndexes" : true, "hidden" : false, "priority" : 1, "tags" : { }, "slaveDelay" : NumberLong(0), "votes" : 1 }
  ],
  "settings" : {
    "chainingAllowed" : true,
    "heartbeatIntervalMillis" : 2000,
    "heartbeatTimeoutSecs" : 10,
    "electionTimeoutMillis" : 10000,
    "catchUpTimeoutMillis" : -1,
    "catchUpTakeoverDelayMillis" : 30000,
    "getLastErrorModes" : { },
    "getLastErrorDefaults" : { "w" : 1, "wtimeout" : 0 },
    "replicaSetId" : ObjectId("5ff40d48408c7c7d4e87d41d")
  }
}

2.

{ "featureCompatibilityVersion" : { "version" : "4.0" }, "ok" : 1, "operationTime" : Timestamp(1619147264, 1704), "$gleStats" : { "lastOpTime" : Timestamp(0, 0), "electionId" : ObjectId("7fffffff0000000000000005") }, "lastCommittedOpTime" : Timestamp(1619147264, 1704), "$configServerState" : { "opTime" : { "ts" : Timestamp(1619147264, 800), "t" : NumberLong(6) } }, "$clusterTime" : { "clusterTime" : Timestamp(1619147264, 1704), "signature" : { "hash" : BinData(0,"7/W8VTEfrUFuYoHBT9eqUNFNixs="), "keyId" : NumberLong("6914165982348967966") } } }

3. The readConcern level is local (the default).
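
For reference, a minimal shell sketch of how the read concern could also be passed explicitly per operation ("pipeline" is a placeholder for the aggregation shown in the logs):

    // Explicitly request the (default) local read concern on the aggregation:
    db.bik4bos.aggregate(pipeline, { readConcern: { level: "local" } })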

Comment by Edwin Zhou [ 22/Apr/21 ]

Hi 18621325186@163.com,

After taking a closer look at the diagnostics and the logs you've provided, we found it interesting that a Global W lock is acquired roughly 150 times per second leading up to the time the slow op is logged.

To help us further investigate our hypotheses, can you provide us with the following information?

  1. Your replica set configuration, which you can get by running rs.conf() on one of the replica set nodes.
  2. Your FeatureCompatibilityVersion, which you can find by running

    db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )

  3. What readConcern level are you using?

Best,
Edwin

Comment by h jy [ 22/Apr/21 ]

All nodes have these slow logs; I will provide another node's slow logs. We want to solve the issue of these slow queries.

Thanks.

Comment by Edwin Zhou [ 21/Apr/21 ]

Hi 18621325186@163.com,

Thanks for following up with some more information. Could you provide the complete logs from all nodes in the cluster? Additionally, is this behavior causing adverse issues such as performance regressions? Do all the nodes experience the same issue or only this specific node?

Best,
Edwin

Comment by h jy [ 21/Apr/21 ]

We are seeing a lot of slow queries that spend a long time waiting on the global 'r' lock, and we don't know what is happening to make these queries so slow.

2021-04-21T15:33:53.421+0800 I COMMAND  [conn832132] command dmsBik4Bos.bik4bos command: getMore { getMore: 7282293897026165244, collection: "bik4bos", batchSize: 101, lsid: { id: UUID("e6fc871e-fa36-4a58-9a0d-379d67da9da9"), uid: BinData(0, E12CB5D7782FE6D81309E5B284A1AB6AE43E3C7B01D00039E2B3A012AADA2312) }, $clusterTime: { clusterTime: Timestamp(1618990431, 2142), signature: { hash: BinData(0, 9061BBAB4470E7C6802F757BA46901089F046DA8), keyId: 6914165982348967966 } }, $client: { driver: { name: "mongo-java-driver|sync", version: "3.10.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "2.6.32-754.23.1.el6.x86_64" }, platform: "Java/Oracle Corporation/1.8.0_66-b17", mongos: { host: "VECS018596:30000", client: "10.111.225.121:59088", version: "4.0.12" } }, $configServerState: { opTime: { ts: Timestamp(1618990430, 3733), t: 6 } }, $db: "dmsBik4Bos" } originatingCommand: { aggregate: "bik4bos", pipeline: [ { $match: { cityGuid.value: "bb1d85669e9144938a549d8d2447a4d1", pos.loc: { $geoWithin: { $geometry: { type: "Polygon", coordinates: [ [ [ 116.3973536078009, 39.89739003241792 ], [ 116.3973536078009, 39.90888160244508 ], [ 116.4057811425808, 39.90888160244508 ], [ 116.4057811425808, 39.89739003241792 ], [ 116.3973536078009, 39.89739003241792 ] ] ] } } }, bikeType.value: 1, opStatus.value: { $in: [ 2, 3, 4 ] }, bizType.value: { $in: [ 0 ] } } }, { $project: { _id: true, bikeNo: true, pos: { lng: true, lat: true, geo: { geo6: true } } } }, { $group: { _id: "$pos.geo.geo6", count: { $sum: { $const: 1 } }, first: { $first: "$bikeNo" }, maxLat: { $max: "$pos.lat" }, minLat: { $min: "$pos.lat" }, maxLng: { $max: "$pos.lng" }, minLng: { $min: "$pos.lng" } } } ], fromMongos: true, needsMerge: true, mergeByPBRT: false, collation: { locale: "simple" }, cursor: { batchSize: 0 }, allowImplicitCollectionCreation: true, shardVersion: [ Timestamp(996, 1), ObjectId('5ff4312c8299c11b5bd1abaa') ], lsid: { id: UUID("e6fc871e-fa36-4a58-9a0d-379d67da9da9"), uid: BinData(0, E12CB5D7782FE6D81309E5B284A1AB6AE43E3C7B01D00039E2B3A012AADA2312) }, $readPreference: { mode: "secondaryPreferred" }, $clusterTime: { clusterTime: Timestamp(1618990431, 2049), signature: { hash: BinData(0, 9061BBAB4470E7C6802F757BA46901089F046DA8), keyId: 6914165982348967966 } }, $client: { driver: { name: "mongo-java-driver|sync", version: "3.10.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "2.6.32-754.23.1.el6.x86_64" }, platform: "Java/Oracle Corporation/1.8.0_66-b17", mongos: { host: "VECS018596:30000", client: "10.111.225.121:59088", version: "4.0.12" } }, $configServerState: { opTime: { ts: Timestamp(1618990430, 3733), t: 6 } }, $db: "dmsBik4Bos" } planSummary: IXSCAN { cityGuid.value: 1, pos.loc: "2dsphere" } cursorid:7282293897026165244 keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0 reslen:381 locks:{ Global: { acquireCount: { r: 3 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 2151651 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_msg 2151ms

This is a new slow query showing the issue; I will upload the new FTDC.

Comment by Edwin Zhou [ 20/Apr/21 ]

Hi 18621325186@163.com,

Thanks for your report. I'm not sure what issue you're seeing; slow query log lines are expected whenever an operation takes longer than 100ms (the default slowms threshold). The FTDC data you provided also does not cover the log line in the description: the metrics begin at 2021-04-15T05:10:37, and the log line falls outside that range. Could you help us by clarifying the issue you're seeing?
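
For reference, a minimal shell sketch for checking and, if desired, raising that threshold (the 200ms value is only illustrative):

    db.getProfilingStatus()        // shows the current slowms threshold (100 by default)
    db.setProfilingLevel(0, 200)   // keep the profiler off, but only log operations slower than 200ms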

Best,
Edwin
