[SERVER-47032] Long running exclusive locks can end up blocking all operations if all tickets are exhausted Created: 20/Mar/20  Updated: 27/Oct/23  Resolved: 01/Apr/20

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication, Sharding, Stability
Affects Version/s: 3.6.13
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kay Agahd Assignee: Carl Champain (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File CPU_mongo-hotel01-01.png     PNG File opcounters_mongo-hotel01-01.png     PNG File rwTickets_mongo-hotel01-01.png     PNG File t2_screenshot.png     PNG File traffic_mongo-hotel01-01.png    
Issue Links:
Related
related to DOCS-13557 add warnings to all the commands that... Closed
is related to SERVER-45899 createIndex in foreground blocked *al... Closed
Operating System: ALL
Steps To Reproduce:

Here are some chronological traces:

The convertToCapped command started at 11:54:13 and terminated at 13:15:34:

PRO [21:39][root@mongo-hotel01-01.db00.pro07:/home/kay.agahd]# ✔ grep "convertToCapped" /data/sharedSystem01/log/sharedSystem01.log
2020-03-20T11:54:13.749+0100 I STORAGE [conn39183053] createCollection: mongodba.tmphDRLk.convertToCapped.slowops with generated UUID: 90759c88-dcce-463a-9e95-e29c54ab1d00
2020-03-20T13:15:08.832+0100 I COMMAND [conn39183053] renameCollection: renaming collection no UUID from mongodba.tmphDRLk.convertToCapped.slowops to mongodba.slowops
2020-03-20T13:15:34.323+0100 I COMMAND [conn39183053] command mongodba.tmphDRLk.convertToCapped.slowops appName: "MongoDB Shell" command: convertToCapped { convertToCapped: "slowops", size: 69793218560.0, shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("fbe556af-08f9-46af-89fc-09fbde81142a"), uid: BinData(0, 3B408CB48548B5037822C10EB0976B3CBF2CEE3BF9C708796BF03941FBECD80F) }, $clusterTime: { clusterTime: Timestamp(1584701653, 8698), signature: { hash: BinData(0, BC39D22EAB7430F38A7C3701C473DD02FCCB3D2B), keyId: 6796240667325497611 } }, $client: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.6.13" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 9 (stretch)"", architecture: "x86_64", version: "Kernel 4.9.0-11-amd64" }, mongos: { host: "mongo-hotel-01:27017", client: "127.0.0.1:42800", version: "3.6.13" } }, $configServerState: { opTime: { ts: Timestamp(1584701649, 3098), t: 14 } }, $db: "mongodba" } numYields:0 reslen:288 locks:{ Global: { acquireCount: { r: 223917487, w: 223917487 } }, Database: { acquireCount: { w: 223917484, W: 3 } }, Metadata: { acquireCount: { W: 223917482 } }, oplog: { acquireCount: { w: 223917484 } } } protocol:op_msg 4880614ms
PRO [22:23][root@mongo-hotel01-01.db00.pro07:/home/kay.agahd]
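
For reference, the operation in the log above corresponds roughly to the following shell invocation (a sketch reconstructed from the logged command document; the database name and size are taken from the log, everything else is assumed):

// Issued from the MongoDB shell against the mongos (sketch).
// 69793218560 bytes (~65 GB) is the target capped size recorded in the log.
db.getSiblingDB("mongodba").runCommand({
    convertToCapped: "slowops",
    size: 69793218560
})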

Index Build started at 13:15:38 and finished at 17:58:25.

2020-03-20T13:15:38.654+0100 I - [conn39189395] Index Build (background): 2200/223918168 0%
...
2020-03-20T14:45:33.113+0100 I - [conn39232652] Index Build (background): 5407600/224961851 2%
2020-03-20T17:58:15.033+0100 I - [conn39266887] Index Build (background): 230800/1292208 17%
2020-03-20T17:58:18.003+0100 I - [conn39266887] Index Build (background): 529900/1292208 41%
2020-03-20T17:58:21.002+0100 I - [conn39266887] Index Build (background): 858200/1292208 66%
2020-03-20T17:58:24.006+0100 I - [conn39266887] Index Build (background): 1142800/1292208 88%
2020-03-20T17:58:25.451+0100 I INDEX [conn39266887] build index done. scanned 1292208 total records. 13 secs

The first 17% of the index build took nearly 5 hours, while the rest took only a few seconds. Or was the index build eventually killed by my killOp or killSession command, executed at 14:53:25?

2020-03-20T14:53:25.208+0100 I COMMAND [conn39249255] command admin.$cmd appName: "MongoDB Shell" command: killSessions { killSessions: [ { id: UUID("81914921-f985-464d-ad3a-7ab007c2b895"), uid: BinData(0, 4E5C22C0742A0A942E4361737D42A397DBF0E56F30DC26CB617A186C91FF4A48) } ], lsid: { id: UUID("be3079b7-ffb7-435c-938e-262ca10da9a6") }, $clusterTime: { clusterTime: Timestamp(1584712403, 84), signature: { hash: BinData(0, A820811A96105BC4B5D4ADF5FBACAD96714937D2), keyId: 6796240667325497611 } }, $db: "admin" } numYields:0 reslen:268 locks:{} protocol:op_msg 231ms
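
The kill attempts looked roughly like the following (a sketch; the currentOp filter is illustrative, and only the killSessions call with the UUID shown above appears in the log):

// Find the in-progress convertToCapped operation and try to kill it.
// In our case neither approach interrupted the command.
var ops = db.currentOp({ "command.convertToCapped": { $exists: true } }).inprog;
ops.forEach(function (op) { db.killOp(op.opid); });

// Alternatively, kill the session that issued the command (UUID from the log above).
db.runCommand({ killSessions: [ { id: UUID("81914921-f985-464d-ad3a-7ab007c2b895") } ] });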

At around 14:20 we notice that the read and write tickets are exhausted, so we increase them from 256 to 512:

2020-03-20T14:24:08.435+0100 I COMMAND [conn39247774] successfully set parameter wiredTigerConcurrentWriteTransactions to 512.0 (was 256)
2020-03-20T14:24:27.886+0100 I COMMAND [conn39247774] successfully set parameter wiredTigerConcurrentReadTransactions to 512.0 (was 256)
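
Those two log lines come from setParameter commands issued from the shell, roughly as follows (a sketch):

// Raise the WiredTiger ticket limits at runtime (requires admin privileges).
db.adminCommand({ setParameter: 1, wiredTigerConcurrentWriteTransactions: 512 });
db.adminCommand({ setParameter: 1, wiredTigerConcurrentReadTransactions: 512 });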

At around 15:00 we notice that replication is stuck on both secondaries.
At 15:19:41 we restart one secondary to see if replication will recover, but without success.

2020-03-20T15:19:41.790+0100 I CONTROL [main] ***** SERVER RESTARTED *****

At around 15:25 we shut down the 3 most heavily loaded routers (mongo-hotel-01, mongo-hotel-02, mongo-hotel-03) so that the database no longer receives load from their clients.

At about 15:50 the first secondary is back in sync, so we step down the primary because we (wrongly) suspect a hardware problem:

2020-03-20T15:52:11.954+0100 I COMMAND [conn39249255] Attempting to step down in response to replSetStepDown command
2020-03-20T15:52:11.973+0100 I COMMAND [conn39249255] command admin.$cmd appName: "MongoDB Shell" command: replSetStepDown { replSetStepDown: 60.0, lsid: { id: UUID("be3079b7-ffb7-435c-938e-262ca10da9a6") }, $clusterTime: { clusterTime: Timestamp(1584715911, 226), signature: { hash: BinData(0, 0A644227C73C9075B61DF7053FB6C4B6A467F789), keyId: 6796240667325497611 } }, $db: "admin" } numYields:0 reslen:268 locks:{ Global: { acquireCount: { r: 2, W: 2 }, acquireWaitCount: { W: 2 }, timeAcquiringMicros: { W: 10861 } } } protocol:op_msg 148ms
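
The step-down in the log above corresponds to the shell helper, roughly (a sketch; 60 is the step-down period in seconds, as logged):

// rs.stepDown() wraps the replSetStepDown command shown in the log line above.
rs.stepDown(60);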

At about 17:20, we promote the stepped-down server back to primary because we no longer believe it is a hardware problem.


Description

We are running a cluster consisting of:

  • 1 replica set of 3 mongod hardware nodes (each with 378 GB RAM and 5.8 TB of SSD in RAID 10)
  • 4 config servers running as a replica set on VMs, each with 8 CPUs and 4 GB RAM
  • 6 mongos routers on VMs, of which only 3 are heavily used, each with 8 CPUs and 4 GB RAM

While the convertToCapped command was running, all databases were blocked. Neither killOp nor killSessions could kill the operation, so clients of all other databases had to wait until the command terminated.
Furthermore, because of convertToCapped, index builds on the primary slowed down massively even though they were built in the background. Again, neither killOp nor killSessions could kill these operations.

Last but not least, replication got stuck during convertToCapped, so both secondaries fell further and further behind the primary.

Please see the attached screenshots of our dashboards for the primary replica set member. They show clearly when the primary was massively or even completely blocked.

I'll also upload log and diagnostic data files from all mongod and mongos nodes.

This ticket is related to SERVER-45899 where a createIndex command in foreground blocked all databases for the same cluster.



Comments
Comment by Kay Agahd [ 02/Apr/20 ]

Hi carl.champain and milkie,

Many thanks for the analysis!
It's scary to learn that a MongoDB system which usually processes more than 70k ops/sec continuously, 24/7, was paralyzed within only 30 minutes by a total of only 5745 queries (≈ 3 ops/sec) against a single collection that was just being converted, and that this, moreover, qualifies as "works as designed". Why let these queries compromise the entire DBS, when the DBS knows that this collection is being converted and is therefore not accessible?

Furthermore, according to the log file the convertToCapped command was only active from 11:54 to 13:15, yet the whole DBS was effectively blocked until 17:30.
From 13:15 to 17:58 the indexes of the new capped collection were built. Although they were built in the background, the whole DBS was still blocked.

Last but not least, the killOp commands executed at 12:56, 13:00, 13:03, 14:42 and 14:43 killed nothing but my nerves.
Only shutting down 99% of all database clients at 15:25 allowed the DBS to return to normal after 2 hours.
Is this really the expected behaviour of a DBS?

Comment by Carl Champain (Inactive) [ 01/Apr/20 ]

kay.agahd@idealo.de,

Here is what happened:

  1. convertToCapped held a database exclusive lock for its entire duration
  2. Each new connection consumed resources
  3. All resources were exhausted (connections, tickets, etc)
  4. All subsequent operations were blocked
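
The exhaustion in steps 2-4 can be observed while such a command runs with a check like the following (a sketch; the serverStatus fields shown exist in 3.6):

// Free WiredTiger read/write tickets and connection usage - both head toward
// zero as operations queue up behind the database-exclusive lock.
var s = db.serverStatus();
printjson(s.wiredTiger.concurrentTransactions);  // { write: { out, available, totalTickets }, read: { ... } }
printjson(s.connections);                        // { current, available, totalCreated }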

DOCS-13557 was opened to add warnings to the documentation about this undesired behavior.

We recommend running convertToCapped during a maintenance window, or reducing the volume of reads and writes directed at the locked database.

I'm going to close this ticket as this is not a bug.

Kind regards,
Carl

Comment by Eric Milkie [ 01/Apr/20 ]

Note that a viable workaround for this situation is to do what Kay tried: temporarily raise the number of read tickets. This will work until the incoming connection limit (or some other resource limit) is hit.
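
A minimal sketch of that workaround, assuming the default limit of 256 and checking the free-ticket count first:

// If no read tickets are free, raise the limit temporarily; lower it again
// once the exclusive-lock holder has finished.
var read = db.serverStatus().wiredTiger.concurrentTransactions.read;
if (read.available === 0) {
    db.adminCommand({ setParameter: 1, wiredTigerConcurrentReadTransactions: 512 });
}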

Comment by Eric Milkie [ 01/Apr/20 ]

The convertToCapped command holds a Database exclusive lock for its entire duration. The creation of the temp collection and the final rename also take Database exclusive locks, but those are nested (recursive) and thus have no bearing on the state of the lock queue for the Database.
What appears to have happened here is that reads to the Database were not stopped during the run of the command, and the blocked connections did not deter further connection attempts (the client application was probably a web server or something similar). Each blocked connection consumed a ticket, until finally all the free read tickets were consumed. At that point, all incoming readers for any database were blocked. It looks like the same did not happen for writers, which means fewer than 256 incoming connections attempted to write to the Database prior to point B.
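
A quick way to see this queueing while it happens is to list the operations waiting for locks (a sketch; waitingForLock is a standard currentOp field):

// Operations queued behind the database-exclusive lock held by convertToCapped.
db.currentOp({ waitingForLock: true }).inprog.forEach(function (op) {
    print(op.opid + "  " + op.op + "  " + op.ns);
});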
The documentation has a brief warning about this behavior, but it doesn't go so far as to mention that this could affect the entire instance:

This holds a database exclusive lock for the duration of the operation. Other operations which lock the same database will be blocked until the operation completes. See What locks are taken by some common client operations? for operations that lock the database.

We haven't made any changes to this behavior of the command even up to version 4.4.

Comment by Carl Champain (Inactive) [ 23/Mar/20 ]

Hi kay.agahd@idealo.de,

We have received a total of 26 files. Thank you!

Comment by Kay Agahd [ 22/Mar/20 ]

Thanks dmitry.agranat for the upload location.
All files but the last one have been uploaded. Can you confirm that you received them? It should be 26 files in total. The last file is quite huge (>600 MB compressed) and is still uploading.

Comment by Dmitry Agranat [ 22/Mar/20 ]

Hi kay.agahd@idealo.de,

I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Comment by Kay Agahd [ 20/Mar/20 ]

Could you please give me the link to your upload portal for uploading the log and diagnostic files? Thanks!

I tried this one but it did not work:

curl -F "key=uploads_valid_180_days/SERVER-47032/\${filename}" \
-F "AWSAccessKeyId=AKIAIU2OWL4F5XDSYVJQ" \
-F "acl=private" \
-F "x-amz-server-side-encryption=AES256" \
-F "policy=eyJleHBpcmF0aW9uIjogIjIwMjAtMDctMjlUMDA6MDA6MDBaIiwKICAgICAgICAgICAgICAgICAgImNvbmRpdGlvbnMiOiBbCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgeyAiYnVja2V0IiA6ICIxMGdlbi1odHRwc3VwbG9hZCIgfSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBbICJzdGFydHMtd2l0aCIsICIka2V5IiwgInVwbG9hZHNfdmFsaWRfMTgwX2RheXMvU0VSVkVSLTQ1ODk5LyJdLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHsgImFjbCIgOiAicHJpdmF0ZSJ9LAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHsgIngtYW16LXNlcnZlci1zaWRlLWVuY3J5cHRpb24iIDogIkFFUzI1NiIgfSwKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBbICJjb250ZW50LWxlbmd0aC1yYW5nZSIsIDAsIDUzNjg3MDkxMjAgXQogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBdCiAgICAgICAgICAgICAgICAgIH0=" \
-F "signature=W2+z1k21YqB6nF2/VeQdEfnUIq4=" \
-F "file=@configserver01.log.tgz" \
https://10gen-httpsupload.s3.amazonaws.com

 
