[SERVER-49317] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 409 Created: 05/Jul/20  Updated: 27/Oct/23  Resolved: 19/Jul/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: firstName lastName Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: FA_50853
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS: Ubuntu (18.04 lts), mongodb 4.10.


Attachments: Text File mongodbError.txt    
Issue Links:
Duplicate
duplicates SERVER-50069 Fatal Assertion 50853 at src/mongo/db... Closed
Operating System: ALL
Steps To Reproduce:

STACK TRACE

2020-07-04T22:11:23.774+0200 I NETWORK [conn600] end connection 127.0.0.1:39212 (14 connections now open)
2020-07-04T22:11:23.774+0200 I NETWORK [conn599] end connection 127.0.0.1:39210 (13 connections now open)
2020-07-04T22:42:52.743+0200 I NETWORK [listener] connection accepted from 127.0.0.1:39806 #601 (14 connections now open)
2020-07-04T22:42:52.744+0200 I NETWORK [conn601] received client metadata from 127.0.0.1:39806 conn601: \{ driver: { name: "PyMongo", version: "3.10.1" }, os: { type: $
2020-07-04T22:42:52.744+0200 I NETWORK [listener] connection accepted from 127.0.0.1:39808 #602 (15 connections now open)
2020-07-04T22:42:52.745+0200 I NETWORK [conn602] received client metadata from 127.0.0.1:39808 conn602: \{ driver: { name: "PyMongo", version: "3.10.1" }, os: { type: $
2020-07-04T22:42:57.252+0200 I NETWORK [conn602] end connection 127.0.0.1:39808 (14 connections now open)
2020-07-04T22:42:57.252+0200 I NETWORK [conn601] end connection 127.0.0.1:398^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^$
2020-07-05T00:53:30.611+0200 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] MongoDB starting : pid=930 port=27017 dbpath=/var/lib/mongodb 64-bit host=mojtiketServerComputerName
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] db version v4.0.10
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] git version: c389e7f69f637f7a1ac3cc9fae843b635f20b766
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] allocator: tcmalloc
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] modules: none
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] build environment:
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] distmod: ubuntu1804
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] distarch: x86_64
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] target_arch: x86_64
2020-07-05T00:53:30.703+0200 I CONTROL [initandlisten] options: \{ config: "/etc/mongod.conf", net: { bindIp: "0.0.0.0", port: 27017 }, processManagement: { timeZoneIn$
2020-07-05T00:53:30.711+0200 W STORAGE [initandlisten] Detected unclean shutdown - /var/lib/mongodb/mongod.lock is not empty.
2020-07-05T00:53:30.711+0200 I STORAGE [initandlisten] Detected data files in /var/lib/mongodb created by the 'wiredTiger' storage engine, so setting the active stora$
2020-07-05T00:53:30.711+0200 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2020-07-05T00:53:30.711+0200 I STORAGE [initandlisten]
2020-07-05T00:53:30.711+0200 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2020-07-05T00:53:30.711+0200 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2020-07-05T00:53:30.711+0200 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=7467M,session_max=20000,eviction=(threads_min=4,threads_max=4),config$
2020-07-05T00:53:31.365+0200 I STORAGE [initandlisten] WiredTiger message [1593903211:365943][930:0x7fe8953e7a40], txn-recover: Main recovery loop: starting at 128/77$
2020-07-05T00:53:31.366+0200 I STORAGE [initandlisten] WiredTiger message [1593903211:366246][930:0x7fe8953e7a40], txn-recover: Recovering log 128 through 129
2020-07-05T00:53:31.386+0200 I STORAGE [initandlisten] WiredTiger message [1593903211:386983][930:0x7fe8953e7a40], txn-recover: Recovering log 129 through 129
2020-07-05T00:53:31.432+0200 I STORAGE [initandlisten] WiredTiger message [1593903211:432096][930:0x7fe8953e7a40], txn-recover: Set global recovery timestamp: 0
2020-07-05T00:53:31.461+0200 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2020-07-05T00:53:31.488+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:488201][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.488+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:488339][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.488+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:488472][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.488+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:488583][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.488+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:488695][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.488+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:488812][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.488+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:488920][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.489+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:489030][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.489+0200 E STORAGE [initandlisten] WiredTiger error (0) [1593903211:489154][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, WT_CU$
2020-07-05T00:53:31.489+0200 E STORAGE [initandlisten] WiredTiger error (-31802) [1593903211:489171][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, $
2020-07-05T00:53:31.489+0200 E STORAGE [initandlisten] WiredTiger error (-31804) [1593903211:489181][930:0x7fe8953e7a40], file:collection-720-7600515672542947522.wt, $
2020-07-05T00:53:31.489+0200 F - [initandlisten] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 409
2020-07-05T00:53:31.489+0200 F - [initandlisten]
 
***aborting after fassert() failure
 
2020-07-05T00:53:31.513+0200 F - [initandlisten] Got signal: 6 (Aborted).
 0x55efcc791c51 0x55efcc790e69 0x55efcc79134d 0x7fe893a2c890 0x7fe893667e97 0x7fe893669801 0x55efcad8ca91 0x55efcae8b356 0x55efcaefdab9 0x55efcad13e82 0x55efcad1429c 0x55efcafbf645 0x55efcafbf797 0x55efcaf32ff1 0x55efcaf3877c 0x55efcaf53d8d 0x55efcafc946c 0x55efcaf765fc 0x55efcae6ce3c 0x55efcae770dd 0x55efcae5d12c 0x55efcafeb698 x55efcafbf645 0x55efcafbf797 0x55efcaf32ff1 0x55efcaf3877c 0x55efcaf53d8d 0x55efcafc946c 0x55efcaf765fc 0x55efcae6ce3c 0x55efcae770dd 0x55efcae5d12c 0x55efcafeb
----- BEGIN BACKTRACE -----
x55efcafbf645 0x55efcafbf797 0x55efcaf32ff1 0x55efcaf3877c 0x55efcaf53d8d 0x55efcafc946c 0x55efcaf765fc 0x55efcae6ce3c 0x55efcae770dd 0x55efcae5d12c 0x55efcafeb\{"backtrace":[{"b":"55EFCA370000","o":"2421C51","s":"_ZN5mongo15printStackTraceERSo"},\{"b":"55EFCA370000","o":"2420E69"},\{"b":"55EFCA370000","o":"242134D"},{"b":"7FE89$
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55efcc791c51]
 mongod(+0x2420E69) [0x55efcc790e69]
 mongod(+0x242134D) [0x55efcc79134d]
 libpthread.so.0(+0x12890) [0x7fe893a2c890]
 libc.so.6(gsignal+0xC7) [0x7fe893667e97]
 libc.so.6(abort+0x141) [0x7fe893669801]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x55efcad8ca91]
 mongod(+0xB1B356) [0x55efcae8b356]
 mongod(+0xB8DAB9) [0x55efcaefdab9]
 mongod(__wt_err_func+0x90) [0x55efcad13e82]
 mongod(__wt_panic+0x39) [0x55efcad1429c]
 mongod(__wt_block_read_off+0x575) [0x55efcafbf645]
 mongod(__wt_bm_read+0x147) [0x55efcafbf797]
 mongod(__wt_bt_read+0xB1) [0x55efcaf32ff1]
 mongod(__wt_page_in_func+0x89C) [0x55efcaf3877c]
 mongod(+0xBE3D8D) [0x55efcaf53d8d]
 mongod(__wt_btcur_prev+0xCFC) [0x55efcafc946c]
 mongod(+0xC065FC) [0x55efcaf765fc]
 mongod(_ZN5mongo31WiredTigerRecordStoreCursorBase4nextEv+0x22C) [0x55efcae6ce3c]
 mongod(_ZN5mongo21WiredTigerRecordStore19postConstructorInitEPNS_16OperationContextE+0xCD) [0x55efcae770dd]
 mongod(_ZN5mongo18WiredTigerKVEngine21getGroupedRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsENS_8KVPrefixE+0x27C) [0x55efcae5d12c]
 mongod(_ZN5mongo26KVDatabaseCatalogEntryBase14initCollectionEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb+0x568) [0x55efcafeb698]
 mongod(_ZN5mongo15KVStorageEngine11loadCatalogEPNS_16OperationContextE+0x142F) [0x55efcafe538f]
 mongod(_ZN5mongo15KVStorageEngineC2EPNS_8KVEngineERKNS_22KVStorageEngineOptionsESt8functionIFSt10unique_ptrINS_26KVDatabaseCatalogEntryBaseESt14default_deleteIS8_EEN6_10StringDataEPS0_EE+0x22D) [0x55efcafe6bed]
 mongod(+0xAD778D) [0x55efcae4778d]
 mongod(_ZN5mongo23initializeStorageEngineEPNS_14ServiceContextENS_22StorageEngineInitFlagsE+0x552) [0x55efcb548e92]
 mongod(+0xA88F78) [0x55efcadf8f78]
 mongod(_ZN5mongo11mongoDbMainEiPPcS1_+0xEDF) [0x55efcadfcf7f]
 mongod(main+0x9) [0x55efcad8e629]
 libc.so.6(__libc_start_main+0xE7) [0x7fe89364ab97]
 mongod(_start+0x2A) [0x55efcadf792a]

Participants:

 Description   

I launched server 4 weeks ago and this is  issue happened three times. First two times I didnt see anything in logs, presumably because Ubuntu enters read only mode, because of inconsistency with sector values.  First 2 times fix was to run 'fsck' command during reboot which fixes some inconsistent sector values, and after this everything works fine.

This time I ran 'fsck' directly from terminal, it fixed some inconsistent sectors and I rebooted server. Currently I cant start/restart mongod with any command (eg. 'sudo systemctl restart mongod') and it prints same error and stack in logs like the one before reboot that I mansion earlier in this text. Currently I am not able to make mongod live again.

From logs I see that last connection before stack trace has characters ^@ ^@/

 

Can you helping by telling, am I doing something wrong, how to get server live again.

Thanks, Nikola

 

 

 



 Comments   
Comment by Dmitry Agranat [ 19/Jul/20 ]

Sounds good nikolarancic@yahoo.com, I will go ahead and close this issue.

Regards,
Dima

Comment by firstName lastName [ 16/Jul/20 ]

Hi Dmitry,

No I haven't run --rapair command. I will try it next time. Thank you Dmitry.

Bye, firstName

Comment by Dmitry Agranat [ 16/Jul/20 ]

Hi nikolarancic@yahoo.com,

The command that I've mentioned in my earlier comment is --repair, as described in our documentation: https://docs.mongodb.com/manual/reference/program/mongod/#cmdoption-mongod-repair

Thanks,
Dima

Comment by firstName lastName [ 16/Jul/20 ]

@Dmitry Agranat I don't understand you. 'after fixing the reported disk issues' log says 'Completed without error' every time. What command as a precaution do you want me to execute?

Comment by Dmitry Agranat [ 14/Jul/20 ]

nikolarancic@yahoo.com, after fixing the reported disk issues, did you successfully executed --repair?

Comment by firstName lastName [ 13/Jul/20 ]

Hi @Dmitry Agranat,
Ssd is a new one and I also run smartctl multiple times and it finishes with no error. So I am not that convince that new ssd will help.

Short:

 # 1 Extended offline Completed without error 00% 1050 -

  1. 2 Extended offline Completed without error 00% 1043 -
  2. 3 Short offline Completed without error 00% 1042 -

Long:
'''

smartctl 6.6 2016-05-31 r4324 [x86_64-linux-5.3.0-53-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: KINGSTON SA400S37480G
Serial Number: 50026B7682672C22
LU WWN Device Id: 5 0026b7 682672c22
Firmware Version: SBFKB1C2
User Capacity: 480,103,981,056 bytes [480 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 10 05:29:42 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (65535) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0032 000 100 000 Old_age Always - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 1163
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 25
148 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
149 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
168 Unknown_Attribute 0x0012 100 100 000 Old_age Always - 0
169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 14
170 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 10
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
173 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 131087
181 Program_Fail_Cnt_Total 0x0032 100 100 000 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age Always - 14
194 Temperature_Celsius 0x0022 069 066 000 Old_age Always - 31 (Min/Max 22/34)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
218 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
231 Temperature_Celsius 0x0000 001 001 000 Old_age Offline - 99
233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always - 586
241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always - 108
242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always - 36
244 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 2
245 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 15
246 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 38240
246 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 38240

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

  1. 1 Extended offline Completed without error 00% 1050 -
  2. 2 Extended offline Completed without error 00% 1043 -
  3. 3 Short offline Completed without error 00% 1042 -

Selective Self-tests/Logging not supported

'''

Comment by Dmitry Agranat [ 13/Jul/20 ]

Hi nikolarancic@yahoo.com,

Callstack indicates a checksum failure during a file read, which indicates a disk failure. Given the long history of issues with the disk you've mentioned, we recommend fixing or replacing the disk first and after that trying to recover data by using the repair procedure.

Regards,
Dima

Generated at Thu Feb 08 05:19:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.