[SERVER-45899] createIndex in foreground blocked *all* databases Created: 31/Jan/20  Updated: 20/Mar/20  Resolved: 18/Feb/20

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication, Sharding, Stability
Affects Version/s: 3.6.13
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kay Agahd Assignee: Carl Champain (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File mtools_analysis.png    
Issue Links:
Related
related to SERVER-47032 Long running exclusive locks can end ... Closed
Operating System: ALL
Participants:

 Description   

We created an index in the foreground on our cluster, and it blocked not only the database on which the index was being built but also all other databases for the whole duration of the index build.
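For reference, the createIndexes call shown in the logs below corresponds to the shell command sketched here; as far as we know, adding background: true to the options is the 3.6 way to avoid holding the exclusive lock for the whole build, but we expected even the foreground build to block only the price_alert database:

mongo mongo-hotel-01:27017/price_alert --eval '
  // as run (foreground build); with { background: true } added to the options,
  // the build would presumably not hold the exclusive database lock throughout
  db.priceAlert.createIndex({ priceAlertId: 1 }, { name: "priceAlertId" })
'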

Here are the related logs from the primary during index build:

2020-01-30T17:09:25.975+0100 I INDEX    [conn26758520] build index on: price_alert.priceAlert properties: { v: 2, key: { priceAlertId: 1 }, name: "priceAlertId", ns: "price_alert.priceAlert" }
2020-01-30T17:09:25.975+0100 I INDEX    [conn26758520] 	 building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2020-01-30T17:09:28.004+0100 I -        [conn26758520]   Index Build: 74000/22140565 0%
...
2020-01-30T17:16:06.002+0100 I -        [conn26758520]   Index Build: 21622800/22140565 97%
2020-01-30T17:16:18.046+0100 I INDEX    [conn26758520] build index done.  scanned 22140565 total records. 412 secs
2020-01-30T17:16:18.048+0100 I COMMAND  [conn26758520] command price_alert.$cmd command: createIndexes { createIndexes: "priceAlert", indexes: [ { key: { priceAlertId: 1 }, name: "priceAlertId", ns: "price_alert.priceAlert" } ], shardVersion: [ Timestamp(0, 0), ObjectId('000000000000000000000000') ], lsid: { id: UUID("1e225ae9-0a74-4a71-99ca-dcca84b62fb5"), uid: BinData(0, CA28DAE5564B626F92E2D21DD43742531AE0FED5604FFFEB8AB31BB5F101B7EC) }, $clusterTime: { clusterTime: Timestamp(1580400565, 19066), signature: { hash: BinData(0, 738B9B6A8BEB897ADDAB8B5D9E04DF77626A574E), keyId: 6729445335938105350 } }, $client: { driver: { name: "mongo-java-driver", version: "3.8.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "3.16.0-10-amd64" }, platform: "Java/Oracle Corporation/1.8.0_171-8u171-b11-1~bpo8+1-b11", mongos: { host: "mongo-hotel-01:27017", client: "10.142.145.194:44050", version: "3.6.13" } }, $configServerState: { opTime: { ts: Timestamp(1580400565, 18247), t: 14 } }, $db: "price_alert" } numYields:0 reslen:364 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3, W: 1 } }, Collection: { acquireCount: { w: 2 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_msg 412115ms

As you can see, the index was built from 17:09:25 to 17:16:18.
Our log file analysis with mtools (see attached) shows that everything was blocked during this time period. Many requests were queued and were executed only after the index build had finished.
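The queued operations could presumably have been listed on the primary while the build was running with something like the following (a sketch only; <primary-host> is a placeholder):

# show operations that are queued waiting for a lock during the index build
mongo <primary-host>:27017 --eval 'printjson(db.currentOp({ waitingForLock: true }))'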

It might be a coincidence, but one of the 6 routers even crashed during the index build:

2020-01-30T17:14:21.324+0100 I -        [listener] pthread_create failed: Resource temporarily unavailable
2020-01-30T17:14:21.324+0100 W EXECUTOR [conn332411] Terminating session due to error: InternalError: failed to create service entry worker thread
2020-01-30T17:14:21.329+0100 F -        [LogicalSessionCacheRefresh] std::exception::what(): Resource temporarily unavailable
Actual exception type: std::system_error
 
 0x5581643011b1 0x558164300b95 0x5581643f5276 0x5581643f52c1 0x558163ac8c6f 0x558163ac94c8 0x558163abd042 0x558163ab1f8a 0x558163ab6d22 0x558163ab7562 0x5581637a8994 0x5581637a8ca3 0x558163bbed54 0x558163bc0238 0x55816378e02c 0x5581644104d0 0x7fe04e1c14a4 0x7fe04df03d0f
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55816320E000","o":"10F31B1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55816320E000","o":"10F2B95"},{"b":"55816320E000","o":"11E7276","s":"_ZN10__cxxabiv111__terminateEPFvvE"},{"b":"55816320E000","o":"11E72C1"},{"b":"55816320E000","o":"8BAC6F","s":"_ZN5mongo10ThreadPool25_startWorkerThread_inlockEv"},{"b":"55816320E000","o":"8BB4C8","s":"_ZN5mongo10ThreadPool8scheduleESt8functionIFvvEE"},{"b":"55816320E000","o":"8AF042","s":"_ZN5mongo30ConfigServerCatalogCacheLoader14getChunksSinceERKNS_15NamespaceStringENS_12ChunkVersionESt8functionIFvPNS_16OperationContextENS_10StatusWithINS_18CatalogCacheLoader26CollectionAndChangedChunksEEEEE"},{"b":"55816320E000","o":"8A3F8A","s":"_ZN5mongo12CatalogCache26_scheduleCollectionRefreshENS_8WithLockESt10shared_ptrINS0_17DatabaseInfoEntryEES2_INS_12ChunkManagerEERKNS_15NamespaceStringEi"},{"b":"55816320E000","o":"8A8D22","s":"_ZN5mongo12CatalogCache24getCollectionRoutingInfoEPNS_16OperationContextERKNS_15NamespaceStringE"},{"b":"55816320E000","o":"8A9562","s":"_ZN5mongo12CatalogCache42getShardedCollectionRoutingInfoWithRefreshEPNS_16OperationContextERKNS_15NamespaceStringE"},{"b":"55816320E000","o":"59A994","s":"_ZN5mongo25SessionsCollectionSharded32_checkCacheForSessionsCollectionEPNS_16OperationContextE"},{"b":"55816320E000","o":"59ACA3","s":"_ZN5mongo25SessionsCollectionSharded23setupSessionsCollectionEPNS_16OperationContextE"},{"b":"55816320E000","o":"9B0D54","s":"_ZN5mongo23LogicalSessionCacheImpl8_refreshEPNS_6ClientE"},{"b":"55816320E000","o":"9B2238","s":"_ZN5mongo23LogicalSessionCacheImpl16_periodicRefreshEPNS_6ClientE"},{"b":"55816320E000","o":"58002C"},{"b":"55816320E000","o":"12024D0"},{"b":"7FE04E1BA000","o":"74A4"},{"b":"7FE04DE1B000","o":"E8D0F","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.13", "gitVersion" : "db3c76679b7a3d9b443a0e1b3e45ed02b88c539f", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.0-11-amd64", "version" : "#1 SMP Debian 4.9.189-3+deb9u1 (2019-09-20)", "machine" : "x86_64" }, "somap" : [ { "b" : "55816320E000", "elfType" : 3, "buildId" : "072ABC16ACA2D355506A39C14C0655EBD3C10559" }, { "b" : "7FFEDF5D4000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "E4710F184535B7465A293C8D84196F236A12D2CF" }, { "b" : "7FE04F404000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "EAD5FD817712E63C1212D1EE7D7EE1B9C29F93A7" }, { "b" : "7FE04EF6A000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1", "elfType" : 3, "buildId" : "BC8265C9A7EB28F1A9677B7437E4561F383C44D9" }, { "b" : "7FE04ECFE000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.1", "elfType" : 3, "buildId" : "4EAE4DD6DA20F6CECBEA6688C8ED8E9987CD800F" }, { "b" : "7FE04EAFA000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "DB2CAEEEC37482A98AB1416D0A9AFE2944930DE9" }, { "b" : "7FE04E8F2000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "86B35D63FACD97D22973E99EE9863F7714C4F53A" }, { "b" : "7FE04E5EE000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "4E49714C557CE0472C798F39365CA10F9C0E1933" }, { "b" : "7FE04E3D7000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "51AD5FD294CD6C813BED40717347A53434B80B7A" }, { "b" : "7FE04E1BA000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "16D609487BCC4ACBAC29A4EAA2DDA0D2F56211EC" }, { "b" : "7FE04DE1B000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : 
"775143E680FF0CD4CD51CCE1CE8CA216E635A1D6" }, { "b" : "7FE04F61B000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "606DF9C355103E82140D513BC7A25A635591C153" }, { "b" : "7FE04DC09000", "path" : "/lib/x86_64-linux-gnu/libnss_files.so.2", "elfType" : 3, "buildId" : "8ADB8C8CDDE0E4C3C3BABB8874AE69DCA15A666A" }, { "b" : "7FE04DA03000", "path" : "/lib/x86_64-linux-gnu/libnss_dns.so.2", "elfType" : 3, "buildId" : "B65AA56F4CECC441FCE7F85F0B4AA0F2815E4E34" } ] }}
 mongos(_ZN5mongo15printStackTraceERSo+0x41) [0x5581643011b1]
 mongos(+0x10F2B95) [0x558164300b95]
 mongos(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x5581643f5276]
 mongos(+0x11E72C1) [0x5581643f52c1]
 mongos(_ZN5mongo10ThreadPool25_startWorkerThread_inlockEv+0x99F) [0x558163ac8c6f]
 mongos(_ZN5mongo10ThreadPool8scheduleESt8functionIFvvEE+0x398) [0x558163ac94c8]
 mongos(_ZN5mongo30ConfigServerCatalogCacheLoader14getChunksSinceERKNS_15NamespaceStringENS_12ChunkVersionESt8functionIFvPNS_16OperationContextENS_10StatusWithINS_18CatalogCacheLoader26CollectionAndChangedChunksEEEEE+0x222) [0x558163abd042]
 mongos(_ZN5mongo12CatalogCache26_scheduleCollectionRefreshENS_8WithLockESt10shared_ptrINS0_17DatabaseInfoEntryEES2_INS_12ChunkManagerEERKNS_15NamespaceStringEi+0xB2A) [0x558163ab1f8a]
 mongos(_ZN5mongo12CatalogCache24getCollectionRoutingInfoEPNS_16OperationContextERKNS_15NamespaceStringE+0x952) [0x558163ab6d22]
 mongos(_ZN5mongo12CatalogCache42getShardedCollectionRoutingInfoWithRefreshEPNS_16OperationContextERKNS_15NamespaceStringE+0x52) [0x558163ab7562]
 mongos(_ZN5mongo25SessionsCollectionSharded32_checkCacheForSessionsCollectionEPNS_16OperationContextE+0x64) [0x5581637a8994]
 mongos(_ZN5mongo25SessionsCollectionSharded23setupSessionsCollectionEPNS_16OperationContextE+0x33) [0x5581637a8ca3]
 mongos(_ZN5mongo23LogicalSessionCacheImpl8_refreshEPNS_6ClientE+0x164) [0x558163bbed54]
 mongos(_ZN5mongo23LogicalSessionCacheImpl16_periodicRefreshEPNS_6ClientE+0x28) [0x558163bc0238]
 mongos(+0x58002C) [0x55816378e02c]
 mongos(+0x12024D0) [0x5581644104d0]
 libpthread.so.0(+0x74A4) [0x7fe04e1c14a4]
 libc.so.6(clone+0x3F) [0x7fe04df03d0f]
-----  END BACKTRACE  -----

The other 5 routers survived.

I can provide you with the mongod and mongos logs.



 Comments   
Comment by Carl Champain (Inactive) [ 18/Feb/20 ]

kay.agahd@idealo.de,

Apologies for the delayed response.
Unfortunately, there is no diagnostic data covering the index build for the primary and the mongos that crashed. This makes it difficult for us to investigate this issue in greater detail, since we can’t correlate the events in the log with other diagnostic indicators. If this behavior happens again, please reopen this ticket and provide the diagnostic data and logs covering the event so we can continue to investigate.

Kind regards,
Carl

Comment by Carl Champain (Inactive) [ 07/Feb/20 ]

kay.agahd@idealo.de,

Yes, the upload succeeded. Thank you!
We are investigating the issue and will let you know.

Comment by Kay Agahd [ 07/Feb/20 ]

Anyone?

Comment by Kay Agahd [ 05/Feb/20 ]

Hi carl.champain, I've uploaded the router and configserver logs again and also the diagnostic data. Did the upload succeed this time?

Comment by Carl Champain (Inactive) [ 04/Feb/20 ]

Sorry, I misspoke in my previous comment: I can open sharedSystem01.primary.log.mongo-hotel01-01.db00.pro07.eu.idealo.com.gz.

Comment by Carl Champain (Inactive) [ 04/Feb/20 ]

kay.agahd@idealo.de,

 
Unfortunately, the upload was not successful. There is just sharedSystem01.primary.log.mongo-hotel01-01.db00.pro07.eu.idealo.com.gz, but, once downloaded, I can't open it.
You only need to replace file.name and keep the "@":

-F "file=@file.name"

Can you also please archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) and upload them?
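For example, something along these lines (just a sketch; /var/lib/mongodb stands in for your actual dbpath, and the archive is then uploaded with the portal's curl command, replacing file.name as above):

# one archive per node, named after the host so the uploads don't collide
tar czf diagnostic.data.$(hostname -f).tar.gz -C /var/lib/mongodb diagnostic.data

# then upload with the portal's curl command, e.g.
#   -F "key=uploads_valid_180_days/SERVER-45899/diagnostic.data.$(hostname -f).tar.gz" ... -F "file=@diagnostic.data.$(hostname -f).tar.gz"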

Thanks,
Carl 

Comment by Kay Agahd [ 02/Feb/20 ]

Hi carl.champain, I've uploaded the router, configserver and mongod log files, but I doubt that the upload succeeded because it terminated very quickly. However, no error message was thrown.
The curl upload instructions said to replace file.name with the file to upload.
To prevent the uploaded files from overwriting each other (they all have identical file names), I added their hostname to the file name:

hn=`hostname -f`
mv router.log.gz router.log.${hn}.gz
filename="router.log.${hn}.gz"

The lines I replaced in the curl statement are:

curl -F "key=uploads_valid_180_days/SERVER-45899/\${filename}"  
...
-F "file=@\${filename}" 

Is this correct?
I also removed the @:

-F "file=\${filename}" 

What's the correct way to use the curl statement?

Comment by Carl Champain (Inactive) [ 31/Jan/20 ]

Hi kay.agahd@idealo.de,

Thank you for the report.
Can you please share with us the logs for each of the mongos and primary nodes? I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Kind regards,
Carl
 
