[SERVER-31359] Large inserts into MongoDB cause heavy global lock contention Created: 03/Oct/17  Updated: 08/Feb/23  Resolved: 07/Nov/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: shawn Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: MAREF
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive diagnostic.data.zip     PNG File ftdc.png     PNG File ftdc_locks.png     File mongolog_0926.tar.gz    
Issue Links:
Related
related to SERVER-21858 A high throughput update workload in ... Closed
related to SERVER-20328 Allow secondary reads while applying ... Closed
is related to SERVER-31399 repl.apply.batches.totalMillis does n... Closed
Participants:

 Description   

Hi,

My application reads from the secondary members of a MongoDB replica set (read preference: secondary).
When I import a large number of documents, developers tell me that their queries time out.
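(The driver configuration is not shown in this ticket; as a hypothetical sketch, reads are typically routed to secondaries with a connection string like the following, where the host names, database name, and replica set name are placeholders:)

---------------------------------------------------------------------------------------------------------
mongodb://host1:27017,host2:27017,host3:27017/mydb?replicaSet=rs0&readPreference=secondary
---------------------------------------------------------------------------------------------------------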
Here is one of the slow-query entries from the mongod log:

-----------------------------------------------------------------------------------------------------
2017-09-26T09:02:09.722+0800 I COMMAND  [conn1680898] command 
db.table command: find { find: "table", filter: { sid: { $in: [ "x", "y", "z"] } }, projection: { _id: 0, bssid: 1, loc: 1, accuracy: 1 } } 
planSummary: IXSCAN { sid: 1 } keysExamined:6,docsExamined:3 cursorExhausted:1 numYields:0 nreturned:3 reslen:959 
locks:{ Global: { acquireCount: { r: 2 }, acquireWaitCount: { r: 1 }, timeAcquiringMicros: { r: 12923929 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_query 12933ms
--------------------------------------------------------------------------------------------------------

In the locks section, timeAcquiringMicros is 12923929, i.e. roughly 12.9 seconds spent waiting to acquire the global lock, which accounts for nearly all of the 12933 ms the command took.

As far as we know, WiredTiger uses document-level locking, so why does a read wait this long on the global lock?

Here is the relevant output from db.currentOp():

---------------------------------------------------------------------------------------------------------
                        "desc" : "rsSync",
                        "threadId" : "140284498736896",
                        "active" : true,
                        "opid" : 7,
                        "secs_running" : 3118,
                        "microsecs_running" : NumberLong("3118657970"),
                        "op" : "none",
                        "ns" : "local.replset.minvalid",
                        "query" : {
 
                        },
                        "numYields" : 0,
                        "locks" : {
                                "Global" : "W"
                        },
                        "waitingForLock" : true,
                        "lockStats" : {
                                "Global" : {
                                        "acquireCount" : {
                                                "r" : NumberLong(258933),
                                                "w" : NumberLong(518376),
                                                "R" : NumberLong(1),
                                                "W" : NumberLong(129595)
                                        },
                                        "acquireWaitCount" : {
                                                "W" : NumberLong(68687)
                                        },
                                        "timeAcquiringMicros" : {
                                                "W" : NumberLong(51701628)
                                        }
                                },
-----------------------------------------------------------------------------------------------
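(For reference, a minimal mongo-shell sketch to narrow db.currentOp() output to the operations of interest instead of scanning the full list; the filter fields are the same ones shown in the output above:)

---------------------------------------------------------------------------------------------------------
// show only operations that are currently waiting on a lock
db.currentOp({ waitingForLock: true })

// show only the replication batch applier thread
db.currentOp({ desc: "rsSync" })
---------------------------------------------------------------------------------------------------------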

The rsSync thread holds the global lock, and lockStats.Global.acquireCount.W keeps growing.

So the root cause appears to be that the rsSync thread is holding the global lock.

I used YCSB to benchmark a fresh replica set and reproduced this case; the global lock metrics in monitoring are high.
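(A minimal sketch of how global lock pressure can be watched from the mongo shell during such a benchmark; both fields below are part of the standard serverStatus output:)

---------------------------------------------------------------------------------------------------------
// operations currently queued behind the global lock (readers / writers)
db.serverStatus().globalLock.currentQueue

// cumulative global lock statistics: acquireCount, acquireWaitCount, timeAcquiringMicros
db.serverStatus().locks.Global
---------------------------------------------------------------------------------------------------------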



 Comments   
Comment by Mark Agarunov [ 07/Nov/17 ]

Hello shawn_001,

Thank you for the additional information. After investigating, I believe this behavior is due to the same underlying issue as SERVER-21858, where an update-heavy workload causes the secondary to perform applyOps continuously due to a bottleneck, which blocks reads on the secondary. According to the data you've provided, the bottleneck on the secondary is likely the CPU, as it appears to be running at maximum utilization. This behavior should be fixed by the changes detailed in SERVER-20328, so I have closed this ticket. Please follow SERVER-20328 for updates on this issue.
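(One way to see how much work the secondary spends applying oplog batches is to sample the replication apply counters over an interval; a minimal mongo-shell sketch, assuming it is run against the affected secondary:)

---------------------------------------------------------------------------------------------------------
// sample the batch-application counters one minute apart
var a = db.serverStatus().metrics.repl.apply;
sleep(60 * 1000);
var b = db.serverStatus().metrics.repl.apply;

print("oplog entries applied during the interval: " + (b.ops - a.ops));
print("batches applied during the interval:       " + (b.batches.num - a.batches.num));
---------------------------------------------------------------------------------------------------------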

Thanks,
Mark

Comment by shawn [ 04/Oct/17 ]

Hi
The complete mongod log is quite large, so I have attached only the section from 2017-09-26, which is when the problem occurred.

I hope we can resolve this issue.

Thanks.

Comment by Mark Agarunov [ 03/Oct/17 ]

Hello shawn_001,

Thank you for the report. To get a better idea of what may be causing this, could you please provide the following:

  • The complete logs from all mongod nodes affected by this issue,
  • Please archive (tar or zip) the $dbpath/diagnostic.data directory from all affected mongod nodes

This should give us a better look into what may be causing this lock to be held open.
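For the diagnostic.data archive, a minimal sketch of the command (assuming a dbpath of /data/db; adjust the path to your configuration):

---------------------------------------------------------------------------------------------------------
tar czf diagnostic.data.tar.gz /data/db/diagnostic.data
---------------------------------------------------------------------------------------------------------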

Thanks,
Mark
