[SERVER-46388] mongod instance crash after taking successful backup Created: 25/Feb/20  Updated: 15/Sep/20  Resolved: 15/Sep/20

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Samarth Goyal
Assignee: Eric Sedor
Resolution: Incomplete
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: diagnostic.data.tar
Operating System: ALL
Participants:

 Description   

MongoDB version: 4.2.2

OS: Debian 9

It's a 3-node replica set. We back up data from a secondary node. After the backup completes, the mongod instance crashes with the errors below.

Please let me know if any other data is required.

mongod logs:

2020-02-25T13:35:01.433+0530 I  NETWORK  [listener] connection accepted from 10.33.145.162:57000 #61 (11 connections now open)
2020-02-25T13:35:01.434+0530 I  NETWORK  [conn61] received client metadata from 10.33.145.162:57000 conn61: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "debian 9.9", architecture: "x86_64", version: "4.9.0-9-amd64" }, platform: "CPython 2.7.13.final.0" }
2020-02-25T13:35:01.435+0530 I  NETWORK  [listener] connection accepted from 10.33.145.162:57002 #62 (12 connections now open)
2020-02-25T13:35:01.435+0530 I  NETWORK  [conn62] received client metadata from 10.33.145.162:57002 conn62: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "debian 9.9", architecture: "x86_64", version: "4.9.0-9-amd64" }, platform: "CPython 2.7.13.final.0" }
2020-02-25T13:35:01.955+0530 I  NETWORK  [conn62] end connection 10.33.145.162:57002 (11 connections now open)
2020-02-25T13:35:01.955+0530 I  NETWORK  [conn61] end connection 10.33.145.162:57000 (10 connections now open)
2020-02-25T13:36:00.546+0530 I  NETWORK  [listener] connection accepted from 127.0.0.1:47012 #63 (11 connections now open)
2020-02-25T13:36:00.547+0530 I  NETWORK  [listener] connection accepted from 127.0.0.1:47014 #64 (12 connections now open)
2020-02-25T13:36:00.567+0530 I  COMMAND  [conn64] CMD fsync: sync:1 lock:1
2020-02-25T13:36:00.597+0530 W  COMMAND  [fsyncLockWorker] WARNING: instance is locked, blocking all writes. The fsync command has finished execution, remember to unlock the instance using fsyncUnlock().
2020-02-25T13:36:00.597+0530 I  COMMAND  [conn64] mongod is locked and no writes are allowed. db.fsyncUnlock() to unlock
2020-02-25T13:36:00.597+0530 I  COMMAND  [conn64] Lock count is 1
2020-02-25T13:36:00.597+0530 I  COMMAND  [conn64]     For more info see http://dochub.mongodb.org/core/fsynccommand
2020-02-25T13:36:00.954+0530 I  COMMAND  [conn64] command: unlock requested
2020-02-25T13:36:00.954+0530 I  COMMAND  [conn64] fsyncUnlock completed. mongod is now unlocked and free to accept writes
2020-02-25T13:36:01.000+0530 E  STORAGE  [ftdc] WiredTiger error (2) [1582617961:760][21181:0x7f331691a700], WT_SESSION.open_cursor: __posix_fs_size, 297: /var/mongo/data/shard1/local/collection-16--4379623760652785123.wt: file-size: stat: No such file or directory Raw: [1582617961:760][21181:0x7f331691a700], WT_SESSION.open_cursor: __posix_fs_size, 297: /var/mongo/data/shard1/local/collection-16--4379623760652785123.wt: file-size: stat: No such file or directory
2020-02-25T13:36:01.722+0530 E  STORAGE  [thread59] WiredTiger error (2) [1582617961:722427][21181:0x7f331e129700], log-server: __directory_list_worker, 46: /var/mongo/data/shard1/journal: directory-list: opendir: No such file or directory Raw: [1582617961:722427][21181:0x7f331e129700], log-server: __directory_list_worker, 46: /var/mongo/data/shard1/journal: directory-list: opendir: No such file or directory
2020-02-25T13:36:01.722+0530 E  STORAGE  [thread59] WiredTiger error (2) [1582617961:722510][21181:0x7f331e129700], log-server: __log_prealloc_once, 467: log pre-alloc server error: No such file or directory Raw: [1582617961:722510][21181:0x7f331e129700], log-server: __log_prealloc_once, 467: log pre-alloc server error: No such file or directory
2020-02-25T13:36:01.722+0530 E  STORAGE  [thread59] WiredTiger error (2) [1582617961:722522][21181:0x7f331e129700], log-server: __log_server, 924: log server error: No such file or directory Raw: [1582617961:722522][21181:0x7f331e129700], log-server: __log_server, 924: log server error: No such file or directory
2020-02-25T13:36:01.722+0530 E  STORAGE  [thread59] WiredTiger error (-31804) [1582617961:722529][21181:0x7f331e129700], log-server: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1582617961:722529][21181:0x7f331e129700], log-server: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
2020-02-25T13:36:01.722+0530 F  -        [thread59] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 414
2020-02-25T13:36:01.722+0530 F  -        [thread59]
 
***aborting after fassert() failure
 
2020-02-25T13:36:01.729+0530 F  -        [thread59] Got signal: 6 (Aborted).
0x556ba0a6c431 0x556ba0a6bc2e 0x556ba0a6bcc6 0x7f33257c80e0 0x7f332544afff 0x7f332544c42a 0x556b9ef43397 0x556b9ec8eb60 0x556b9f0c194b 0x556b9ec9b920 0x556b9ec9bd84 0x556b9f14773f 0x7f33257be4a4 0x7f3325500d0f
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"556B9E26C000","o":"2800431","s":"_ZN5mongo15printStackTraceERSo"},{"b":"556B9E26C000","o":"27FFC2E"},{"b":"556B9E26C000","o":"27FFCC6"},{"b":"7F33257B7000","o":"110E0"},{"b":"7F3325418000","o":"32FFF","s":"gsignal"},{"b":"7F3325418000","o":"3442A","s":"abort"},{"b":"556B9E26C000","o":"CD7397","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"556B9E26C000","o":"A22B60"},{"b":"556B9E26C000","o":"E5594B"},{"b":"556B9E26C000","o":"A2F920","s":"__wt_err_func"},{"b":"556B9E26C000","o":"A2FD84","s":"__wt_panic"},{"b":"556B9E26C000","o":"EDB73F"},{"b":"7F33257B7000","o":"74A4"},{"b":"7F3325418000","o":"E8D0F","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.2.2", "gitVersion" : "a0bbbff6ada159e19298d37946ac8dc4b497eadf", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.0-9-amd64", "version" : "#1 SMP Debian 4.9.168-1 (2019-04-12)", "machine" : "x86_64" }, "somap" : [ { "b" : "556B9E26C000", "elfType" : 3, "buildId" : "16D21155485F2D07AE3FE9CB6ECBA69ABAD58B02" }, { "b" : "7FFECFD5B000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "F7DE400836F149C7FB0DB1F818360A2433E67CB1" }, { "b" : "7F3326C17000", "path" : "/usr/lib/x86_64-linux-gnu/libcurl.so.4", "elfType" : 3, "buildId" : "816839E99AF235E30CC31450E2ABFABDAA257D24" }, { "b" : "7F3326A00000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "EAD5FD817712E63C1212D1EE7D7EE1B9C29F93A7" }, { "b" : "7F3326567000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1", "elfType" : 3, "buildId" : "A214BCD55713CB1E0B9AA61C07319C9A83A2268C" }, { "b" : "7F33262FB000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.1", "elfType" : 3, "buildId" : "2BEF491D3EF8E727DF943799D1309AA357BA7D4C" }, { "b" : "7F33260F7000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "DB2CAEEEC37482A98AB1416D0A9AFE2944930DE9" }, { "b" : "7F3325EEF000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "86B35D63FACD97D22973E99EE9863F7714C4F53A" }, { "b" : "7F3325BEB000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "4E49714C557CE0472C798F39365CA10F9C0E1933" }, { "b" : "7F33259D4000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "51AD5FD294CD6C813BED40717347A53434B80B7A" }, { "b" : "7F33257B7000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "16D609487BCC4ACBAC29A4EAA2DDA0D2F56211EC" }, { "b" : "7F3325418000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "775143E680FF0CD4CD51CCE1CE8CA216E635A1D6" }, { "b" : "7F3326E97000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "606DF9C355103E82140D513BC7A25A635591C153" }, { "b" : "7F33251F2000", "path" : "/usr/lib/x86_64-linux-gnu/libnghttp2.so.14", "elfType" : 3, "buildId" : "57FE530E3C6E81FD243F02556CDC09142D176A2E" }, { "b" : "7F3324FD0000", "path" : "/usr/lib/x86_64-linux-gnu/libidn2.so.0", "elfType" : 3, "buildId" : "52F90A61AFD6B0605DAC537C5D1B8713E8E93889" }, { "b" : "7F3324DB3000", "path" : "/usr/lib/x86_64-linux-gnu/librtmp.so.1", "elfType" : 3, "buildId" : "82864DDD2632F14010AD7740D09B7270901D418D" }, { "b" : "7F3324B86000", "path" : "/usr/lib/x86_64-linux-gnu/libssh2.so.1", "elfType" : 3, "buildId" : "E12F1273FAC9E2BE7526C7C60D64CF80F846385D" }, { "b" : "7F3324978000", "path" : "/usr/lib/x86_64-linux-gnu/libpsl.so.5", "elfType" : 3, "buildId" : "1667EE4ED5224694326899E760722B7B366CEB41" }, { "b" : "7F332470F000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.2", "elfType" : 3, "buildId" : "F365E3485410A0833832DC04313E2318637E6A37" }, { "b" : "7F33242A9000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.2", "elfType" : 3, "buildId" : "D153794665C673EF207DC199FC6A36C3BB59A8C3" }, { "b" : "7F332405E000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "4986F4E8DB61C236489DDC53213B04DB65A2EAA0" }, { "b" : "7F3323D84000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.3", "elfType" : 3, "buildId" : "811575446A67638D151C4829E7040205D92F9C9B" }, { "b" : "7F3323B51000", "path" : "/usr/lib/x86_64-linux-gnu/libk5crypto.so.3", "elfType" : 3, "buildId" : "19CE7A9BC33E0910065BDFE299DCACFF638BF06E" }, { "b" : "7F332394D000", "path" : "/lib/x86_64-linux-gnu/libcom_err.so.2", "elfType" : 3, "buildId" : "2EB9256EE03E4D411C25715BB6EC484BF9B09E66" }, { "b" : "7F332373E000", "path" : "/usr/lib/x86_64-linux-gnu/liblber-2.4.so.2", "elfType" : 3, "buildId" : "EDE2EA44C0B018BBDB20D71A1C8AC99F0CC3F99F" }, { "b" : "7F33234ED000", "path" : "/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2", "elfType" : 3, "buildId" : "EB45F0CC6A96D38B78D97C87D5D4A3E0706B2079" }, { "b" : "7F33232D3000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "908B5A955D0A73FB8D31E0F927D0CDBA810CB300" }, { "b" : "7F3322FBC000", "path" : "/usr/lib/x86_64-linux-gnu/libunistring.so.0", "elfType" : 3, "buildId" : "2E457FF72C4E6A267C0B10E06C3FB8C4F32487EE" }, { "b" : "7F3322C23000", "path" : "/usr/lib/x86_64-linux-gnu/libgnutls.so.30", "elfType" : 3, "buildId" : "1C1BC93C559CFE2EBD1B5676FA4B355118EDF38E" }, { "b" : "7F33229EE000", "path" : "/usr/lib/x86_64-linux-gnu/libhogweed.so.4", "elfType" : 3, "buildId" : "1D3666D2FA45541887E96DED01529116996812AD" }, { "b" : "7F33227B7000", "path" : "/usr/lib/x86_64-linux-gnu/libnettle.so.6", "elfType" : 3, "buildId" : "43D18C6AB6EDE083BE2C5FAA857E379389819ACB" }, { "b" : "7F3322534000", "path" : "/usr/lib/x86_64-linux-gnu/libgmp.so.10", "elfType" : 3, "buildId" : "45ACF9508A033A2AE2672156491BC524A3BF20CD" }, { "b" : "7F3322224000", "path" : "/lib/x86_64-linux-gnu/libgcrypt.so.20", "elfType" : 3, "buildId" : "917AB7D78C8C49FE3095ABFF95FAB28575D704BB" }, { "b" : "7F3322018000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5support.so.0", "elfType" : 3, "buildId" : "932297A42269A54BCDB88198BA06BD63B13E1996" }, { "b" : "7F3321E14000", "path" : "/lib/x86_64-linux-gnu/libkeyutils.so.1", "elfType" : 3, "buildId" : "3CFF3CE519A16305A617D8885EA5D3AE3D965461" }, { "b" : "7F3321BF9000", "path" : "/usr/lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "A54D193AB95897B4BFE387E6578064711115AB75" }, { "b" : "7F3321994000", "path" : "/usr/lib/x86_64-linux-gnu/libp11-kit.so.0", "elfType" : 3, "buildId" : "86F00B032B270ED5297EB393B30EDEF76B890573" }, { "b" : "7F3321760000", "path" : "/lib/x86_64-linux-gnu/libidn.so.11", "elfType" : 3, "buildId" : "CCC0C44563E10F70FCF98D0C7AFABC9801F7159B" }, { "b" : "7F332154D000", "path" : "/usr/lib/x86_64-linux-gnu/libtasn1.so.6", "elfType" : 3, "buildId" : "D03612373D33091A4678A032C5D7341FB56FE7DC" }, { "b" : "7F3321339000", "path" : "/lib/x86_64-linux-gnu/libgpg-error.so.0", "elfType" : 3, "buildId" : "8B9D1F17D242A08FEA23AF32055037569A714209" }, { "b" : "7F3321130000", "path" : "/usr/lib/x86_64-linux-gnu/libffi.so.6", "elfType" : 3, "buildId" : "AA1401F42D517693444B96C5774A62D4E8C84A35" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x556ba0a6c431]
mongod(+0x27FFC2E) [0x556ba0a6bc2e]
mongod(+0x27FFCC6) [0x556ba0a6bcc6]
libpthread.so.0(+0x110E0) [0x7f33257c80e0]
libc.so.6(gsignal+0xCF) [0x7f332544afff]
libc.so.6(abort+0x16A) [0x7f332544c42a]
mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x556b9ef43397]
mongod(+0xA22B60) [0x556b9ec8eb60]
mongod(+0xE5594B) [0x556b9f0c194b]
mongod(__wt_err_func+0x90) [0x556b9ec9b920]
mongod(__wt_panic+0x39) [0x556b9ec9bd84]
mongod(+0xEDB73F) [0x556b9f14773f]
libpthread.so.0(+0x74A4) [0x7f33257be4a4]
libc.so.6(clone+0x3F) [0x7f3325500d0f]
-----  END BACKTRACE  -----
2020-02-25T14:47:56.283+0530 I  CONTROL  [main] ***** SERVER RESTARTED *****
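
To read the sequence above as a timeline, the log can be reduced to the COMMAND entries plus anything logged at error (E) or fatal (F) severity. The following is a minimal sketch, assuming the plain-text 4.2 log layout seen in this excerpt (timestamp, severity, component, bracketed context, message); the names `LogEntry`, `parse_line` and `timeline` are illustrative and not part of any MongoDB tooling.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: datetime
    severity: str
    component: str
    context: str
    message: str


# Matches lines such as:
# 2020-02-25T13:36:00.567+0530 I  COMMAND  [conn64] CMD fsync: sync:1 lock:1
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})\s+"
    r"(?P<sev>[DIWEF])\s+(?P<comp>\S+)\s+\[(?P<ctx>[^\]]+)\]\s*(?P<msg>.*)$"
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for continuation or backtrace lines."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
    return LogEntry(ts, match.group("sev"), match.group("comp"),
                    match.group("ctx"), match.group("msg"))


def timeline(lines: Iterable[str],
             components=("COMMAND",),
             severities=("E", "F")) -> List[LogEntry]:
    """Keep entries from the given components or with the given severities."""
    kept = []
    for line in lines:
        entry = parse_line(line.rstrip("\n"))
        if entry and (entry.component in components or entry.severity in severities):
            kept.append(entry)
    return kept
```

Subtracting the timestamps of two kept entries gives the gap between them, for example between the `fsyncUnlock completed` entry and the first STORAGE error.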

 



 Comments   
Comment by Eric Sedor [ 15/Sep/20 ]

Hi,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Sincerely,
Eric

Comment by Eric Sedor [ 21/Apr/20 ]

sankush04@gmail.com, to help us continue investigating here, can you please provide a detailed timeline of what was run and when?

Comment by Eric Sedor [ 09/Mar/20 ]

sankush04@gmail.com sorry if I was unclear. We are asking for a specific timeline of exactly what was run.

Comment by Ankush Sinha [ 06/Mar/20 ]

Hi Eric,
Please help us out here with some information that might help resolve this issue.

Thanks,
Ankush

Comment by Samarth Goyal [ 03/Mar/20 ]

Hi Eric,
For backups, we follow the LVM snapshot strategy described in
https://docs.mongodb.com/manual/tutorial/backup-with-filesystem-snapshots/
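
For reference, the lock, snapshot, unlock sequence that corresponds to the `fsync` and `fsyncUnlock` entries in the mongod log can be sketched as below. This is a minimal sketch under stated assumptions, not the backup script actually used here: the function names are hypothetical, and the volume path, snapshot name and snapshot size passed to `lvcreate` are placeholders that would need to match the real LVM layout.

```python
import subprocess
from typing import Callable, List


def lvm_snapshot_command(volume: str, snapshot_name: str, size: str) -> List[str]:
    """Build the lvcreate invocation for a snapshot of `volume`."""
    return ["lvcreate", "--size", size, "--snapshot", "--name", snapshot_name, volume]


def run_lvm_snapshot(volume: str, snapshot_name: str, size: str = "100M") -> None:
    """Create the snapshot; raises CalledProcessError if lvcreate fails."""
    subprocess.run(lvm_snapshot_command(volume, snapshot_name, size), check=True)


def locked_snapshot(admin_command: Callable[..., object],
                    take_snapshot: Callable[[], None]) -> None:
    """Flush and block writes, take the snapshot, then always unlock.

    `admin_command` is any callable that runs a command against the admin
    database, accepting the command name plus keyword options.
    """
    admin_command("fsync", lock=True)
    try:
        take_snapshot()
    finally:
        # Unlock even if the snapshot step fails, so mongod is not left locked.
        admin_command("fsyncUnlock")
```

With PyMongo, `admin_command` could be `MongoClient(host).admin.command`, and `take_snapshot` could be `lambda: run_lvm_snapshot("/dev/vg0/mongodb", "mdb-snap01")`, where the host, volume and snapshot name are placeholders. Keeping the unlock in a `finally` block means a failed snapshot cannot leave the secondary locked.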
 
The version upgrades from 3.2 to 4.2 were done incrementally in a rolling fashion. This process was completed a couple of days prior to starting backups.
 
I have attached the diagnostic.data archive.
 
Thanks,
Samarth

Comment by Eric Sedor [ 25/Feb/20 ]

Hi goyalsamarth1995@gmail.com,

Thanks for the information so far. Can you please describe your backup method in detail?

Please also provide a timeline of version changes for each version from 3.2 to 4.2.

Finally, would you please also archive (tar or zip) the $dbpath/diagnostic.data directory (the contents are described here) and attach it to this ticket?
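
One way to produce that archive is sketched below, using only the Python standard library. The function name `archive_diagnostic_data` and the output file name are illustrative assumptions; a plain `tar` or `zip` command works equally well.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath: str, output: str) -> str:
    """Write a gzip-compressed tar of <dbpath>/diagnostic.data to `output`."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} is not a directory")
    with tarfile.open(output, "w:gz") as tar:
        # arcname stores only the directory name, not the full dbpath.
        tar.add(str(source), arcname="diagnostic.data")
    return output
```

For example, `archive_diagnostic_data("/var/mongo/data/shard1", "diagnostic.data.tar.gz")`, assuming the dbpath is the directory that appears in the WiredTiger error paths in the description.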

Gratefully,
Eric

Comment by Samarth Goyal [ 25/Feb/20 ]

Earlier we were using MongoDB 3.2.8, where the backup worked correctly. We recently upgraded to 4.2.2, where we are facing this problem.

Generated at Thu Feb 08 05:11:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.