[SERVER-43247] crash sigsegv Created: 10/Sep/19  Updated: 11/Nov/19  Resolved: 11/Nov/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: krzysztof osmulski Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive mongod.log.zip    
Participants:

 Description   

I play around with mongo a bit, on a standalone server.

I got this crash recently.

2019-09-10T04:00:17.953+0200 I NETWORK [conn2326] end connection 127.0.0.1:52482 (2 connections now open)
2019-09-10T04:00:17.955+0200 I NETWORK [conn2325] end connection 127.0.0.1:52480 (1 connection now open)
2019-09-10T04:00:58.144+0200 F - [WTCheckpointThread] Invalid access at address: 0x56e1
2019-09-10T04:00:58.180+0200 F - [WTCheckpointThread] Got signal: 11 (Segmentation fault).
 0x5610123b7e11 0x5610123b7029 0x5610123b7696 0x7ff8360eb330 0x7ff835daca50 0x561010bc031d 0x561010ae2d3e 0x561010b4cf94 0x561010b0fd4a 0x561010b1213f 0x561010b131bb 0x561010af8c4a 0x561010a72c77 0x561011e0a341 0x5610124c7790 0x7ff8360e3184 0x7ff835e1003d
----- BEGIN BACKTRACE -----
{"backtrace":[\{"b":"56100FF73000","o":"2444E11","s":"_ZN5mongo15printStackTraceERSo"},\{"b":"56100FF73000","o":"2444029"},\{"b":"56100FF73000","o":"2444696"},\{"b":"7FF8360DB000","o":"10330"},\{"b":"7FF835D12000","o":"9AA50"},\{"b":"56100FF73000","o":"C4D31D","s":"__wt_rec_row_int"},\{"b":"56100FF73000","o":"B6FD3E","s":"__wt_reconcile"},\{"b":"56100FF73000","o":"BD9F94","s":"__wt_cache_op"},\{"b":"56100FF73000","o":"B9CD4A"},\{"b":"56100FF73000","o":"B9F13F"},\{"b":"56100FF73000","o":"BA01BB","s":"__wt_txn_checkpoint"},\{"b":"56100FF73000","o":"B85C4A"},\{"b":"56100FF73000","o":"AFFC77","s":"_ZN5mongo18WiredTigerKVEngine26WiredTigerCheckpointThread3runEv"},\{"b":"56100FF73000","o":"1E97341","s":"_ZN5mongo13BackgroundJob7jobBodyEv"},\{"b":"56100FF73000","o":"2554790"},\{"b":"7FF8360DB000","o":"8184"},\{"b":"7FF835D12000","o":"FE03D","s":"clone"}],"processInfo":\{ "mongodbVersion" : "4.0.12", "gitVersion" : "5776e3cbf9e7afe86e6b29e22520ffb6766e95d4", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-148-generic", "version" : "#174~14.04.1-Ubuntu SMP Thu May 9 08:17:37 UTC 2019", "machine" : "x86_64" }, "somap" : [ \{ "b" : "56100FF73000", "elfType" : 3, "buildId" : "ACFBD82DF77AF9F7CD5CB76D063D54C6DA10CA66" }, \{ "b" : "7FFEDEFF1000", "elfType" : 3, "buildId" : "4B58D7AD35853E2819C0E58904BEE449220DBFB2" }, \{ "b" : "7FF837478000", "path" : "/usr/lib/x86_64-linux-gnu/libcurl.so.4", "elfType" : 3, "buildId" : "4ACB08147817E6291B181CEF491FB4724336AC04" }, \{ "b" : "7FF83725D000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "9FCED6C1BB3F783375497F9C98FF2CF025ABBEBB" }, \{ "b" : "7FF836E80000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "0430E61DA2B4291F7CE5512101F7AE23C93236D4" }, \{ "b" : "7FF836C21000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "5BB10EACF0B497C21806AACAAF45C36328E831A3" }, \{ "b" : "7FF836A1D000", "path" : 
"/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "1B38A86853776548628FA4090913C7A12C8F3F4D" }, \{ "b" : "7FF836815000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "D27A253ACFC83E639AE80A606BBA2C058302D07A" }, \{ "b" : "7FF83650F000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1B288F3B11CB908F03FA568752126AD1AE3C6D1E" }, \{ "b" : "7FF8362F9000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, \{ "b" : "7FF8360DB000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "C4D728AC02A328301C070F5C220B826492273FCD" }, \{ "b" : "7FF835D12000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "5A49BF8DEF435AC3FE9208DF3C6B5622FE347A97" }, \{ "b" : "7FF8376DF000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2C5922FE5D8F6A77F42349579B5D9AF51E17C591" }, \{ "b" : "7FF835ADF000", "path" : "/usr/lib/x86_64-linux-gnu/libidn.so.11", "elfType" : 3, "buildId" : "A4CF3D2F3AD65050A8199AFC54BD29893EE88902" }, \{ "b" : "7FF8358C5000", "path" : "/usr/lib/x86_64-linux-gnu/librtmp.so.0", "elfType" : 3, "buildId" : "B194D58FAD21CCFA9B4321CA687678D82B712994" }, \{ "b" : "7FF83567E000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "F53E78DECA2C22259B2FD54DC32C9E3B010BBBF8" }, \{ "b" : "7FF83546F000", "path" : "/usr/lib/x86_64-linux-gnu/liblber-2.4.so.2", "elfType" : 3, "buildId" : "B39BBBBA44739593151523E5F03459BC3E2D3205" }, \{ "b" : "7FF83521E000", "path" : "/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2", "elfType" : 3, "buildId" : "372822D0E17BF7C615B6345E5ECEFB1B27BCA57B" }, \{ "b" : "7FF835005000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "61ECB1C9E746126B3CCCC7E82705E539ECAEC3AB" }, \{ "b" : "7FF834D46000", "path" : "/usr/lib/x86_64-linux-gnu/libgnutls.so.26", "elfType" : 3, "buildId" : 
"31C8FF1B8CFAA077ECE92C00C11FCAB72272C5E2" }, \{ "b" : "7FF834AC6000", "path" : "/lib/x86_64-linux-gnu/libgcrypt.so.11", "elfType" : 3, "buildId" : "75E1DDBDFDD5DB837EC6E83928DB65A0A3CE4084" }, \{ "b" : "7FF8347FB000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.3", "elfType" : 3, "buildId" : "35C054BECC0C5FB1AE9CDF7CDFA4F54089878BF0" }, \{ "b" : "7FF8345CC000", "path" : "/usr/lib/x86_64-linux-gnu/libk5crypto.so.3", "elfType" : 3, "buildId" : "0788D6F3B7675F5373F8D2F66D8284ADB7D2B7B7" }, \{ "b" : "7FF8343C8000", "path" : "/lib/x86_64-linux-gnu/libcom_err.so.2", "elfType" : 3, "buildId" : "8D56938ABD6462C4C29822D8E48A131BE1C61F6A" }, \{ "b" : "7FF8341BD000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5support.so.0", "elfType" : 3, "buildId" : "3D4F9028A10CC566F8CDC9FEFB09683B8A20FB92" }, \{ "b" : "7FF833FA2000", "path" : "/usr/lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "666B276BD134B0E9579B67D4EE333F2D0FB813CD" }, \{ "b" : "7FF833D64000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi.so.3", "elfType" : 3, "buildId" : "3DD3615C50982A067E390FC2443D7EF749ADAA4D" }, \{ "b" : "7FF833B50000", "path" : "/usr/lib/x86_64-linux-gnu/libtasn1.so.6", "elfType" : 3, "buildId" : "1477FEC6F18A279343616F89650A2737E83358C0" }, \{ "b" : "7FF83390E000", "path" : "/usr/lib/x86_64-linux-gnu/libp11-kit.so.0", "elfType" : 3, "buildId" : "D4B5C925023E4142D335EEFB6106F47245A3F97C" }, \{ "b" : "7FF833709000", "path" : "/lib/x86_64-linux-gnu/libgpg-error.so.0", "elfType" : 3, "buildId" : "38CA3EE1AE3847D38BF2F3ED9CA1A17FAC217CF7" }, \{ "b" : "7FF833505000", "path" : "/lib/x86_64-linux-gnu/libkeyutils.so.1", "elfType" : 3, "buildId" : "0F03635F97B93D3DACD84F0ED363C56BD266044F" }, \{ "b" : "7FF8332FC000", "path" : "/usr/lib/x86_64-linux-gnu/libheimntlm.so.0", "elfType" : 3, "buildId" : "F284B367B83FC07B7309FA086DC6634C9CC8A005" }, \{ "b" : "7FF833074000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.26", "elfType" : 3, "buildId" : 
"7CC32240A00456FA57B74BFB922E31BA8EEF57C2" }, \{ "b" : "7FF832DD3000", "path" : "/usr/lib/x86_64-linux-gnu/libasn1.so.8", "elfType" : 3, "buildId" : "7CF4C34552B60E44902EA2DFCC4EE4906A90DE3C" }, \{ "b" : "7FF832BA0000", "path" : "/usr/lib/x86_64-linux-gnu/libhcrypto.so.4", "elfType" : 3, "buildId" : "5F0EF0E1DDE5070F686668B93E6A9BEC44D83220" }, \{ "b" : "7FF83298B000", "path" : "/usr/lib/x86_64-linux-gnu/libroken.so.18", "elfType" : 3, "buildId" : "DF1229739A9F5E6A9850B519C95D8A811B63B8EF" }, \{ "b" : "7FF832783000", "path" : "/usr/lib/x86_64-linux-gnu/libffi.so.6", "elfType" : 3, "buildId" : "C114D2C23BD2F3B1705F37FBF9CA06163C8B89A6" }, \{ "b" : "7FF83255A000", "path" : "/usr/lib/x86_64-linux-gnu/libwind.so.0", "elfType" : 3, "buildId" : "DBCF291C6CF70F0D0BF62F07347AEF28E040E1A5" }, \{ "b" : "7FF83234C000", "path" : "/usr/lib/x86_64-linux-gnu/libheimbase.so.1", "elfType" : 3, "buildId" : "F4FDBD38788250E843523FFEA869A4DA933B6BBC" }, \{ "b" : "7FF832103000", "path" : "/usr/lib/x86_64-linux-gnu/libhx509.so.5", "elfType" : 3, "buildId" : "C03FF77D3A35A40589C712C74A8597FF532C8ED7" }, \{ "b" : "7FF831E4A000", "path" : "/usr/lib/x86_64-linux-gnu/libsqlite3.so.0", "elfType" : 3, "buildId" : "183703AF20E0C5BC50D86864CF0FA578F79564DB" }, \{ "b" : "7FF831C11000", "path" : "/lib/x86_64-linux-gnu/libcrypt.so.1", "elfType" : 3, "buildId" : "8C9454C84B882A57EAA28D8F8D92F8D2C8B21A79" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x5610123b7e11]
 mongod(+0x2444029) [0x5610123b7029]
 mongod(+0x2444696) [0x5610123b7696]
 libpthread.so.0(+0x10330) [0x7ff8360eb330]
 libc.so.6(+0x9AA50) [0x7ff835daca50]
 mongod(__wt_rec_row_int+0xCBD) [0x561010bc031d]
 mongod(__wt_reconcile+0x12DE) [0x561010ae2d3e]
 mongod(__wt_cache_op+0x5B4) [0x561010b4cf94]
 mongod(+0xB9CD4A) [0x561010b0fd4a]
 mongod(+0xB9F13F) [0x561010b1213f]
 mongod(__wt_txn_checkpoint+0x1DB) [0x561010b131bb]
 mongod(+0xB85C4A) [0x561010af8c4a]
 mongod(_ZN5mongo18WiredTigerKVEngine26WiredTigerCheckpointThread3runEv+0x6A7) [0x561010a72c77]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x131) [0x561011e0a341]
 mongod(+0x2554790) [0x5610124c7790]
 libpthread.so.0(+0x8184) [0x7ff8360e3184]
 libc.so.6(clone+0x6D) [0x7ff835e1003d]
----- END BACKTRACE -----
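Each entry in the JSON backtrace pairs a module base address `b` with an offset `o`, and the raw addresses printed just before BEGIN BACKTRACE are `b + o`. Below is a minimal Python sketch of that arithmetic. `resolve_frames` is a hypothetical helper, not part of MongoDB's tooling. It recomputes the addresses, which makes it easy to confirm which frames belong to the mongod image (base 56100FF73000) and to hand the offsets to a symbolizer along with the binary that matches the listed buildId.

```python
import json

def resolve_frames(backtrace_json: str) -> list:
    """Return (absolute_address, offset, symbol) for each backtrace frame.

    Assumes the {"backtrace": [...]} object mongod prints between the
    BEGIN/END BACKTRACE markers; "b" and "o" are hex strings.
    """
    doc = json.loads(backtrace_json)
    frames = []
    for frame in doc["backtrace"]:
        base = int(frame["b"], 16)
        offset = int(frame["o"], 16)
        # Frames without an "s" field have no exported symbol name.
        frames.append((hex(base + offset), hex(offset), frame.get("s", "?")))
    return frames

# Two frames copied from the backtrace above (Jira's "\{" escapes removed).
sample = '''{"backtrace":[
  {"b":"56100FF73000","o":"2444E11","s":"_ZN5mongo15printStackTraceERSo"},
  {"b":"56100FF73000","o":"C4D31D","s":"__wt_rec_row_int"}]}'''

for addr, off, sym in resolve_frames(sample):
    print(addr, off, sym)
```

For the two sample frames, this prints 0x5610123b7e11 and 0x561010bc031d. These match the first and sixth raw addresses in the crash output.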



 Comments   
Comment by krzysztof osmulski [ 10/Nov/19 ]

Yes

Comment by Danny Hatcher (Inactive) [ 08/Nov/19 ]

Were you ever able to successfully run the --repair?

Comment by Danny Hatcher (Inactive) [ 20/Sep/19 ]

I recommend passing the same --logpath to the --repair run that you use when you normally launch the mongod process. That way the logs will be consistent and you can provide them easily.

Comment by krzysztof osmulski [ 16/Sep/19 ]

Hello. I simply cannot find them, I mean the 'verify' logs coming from --repair. So I see two options:

  1. I'm mistaken and I did not run --repair at all.
  2. When I ran
    sudo -u mongodb mongod --repair --dbpath /mnt/bigdata/mongodb/
    the logs were written to stdout, not to a log file?

Comment by Danny Hatcher (Inactive) [ 16/Sep/19 ]

Can you please provide the full mongod log covering the last --repair attempt up to the crash?

Comment by krzysztof osmulski [ 14/Sep/19 ]

Yes, I know I have corrupted data, but I could not find the reason for it. I have now run mongod --repair successfully, and today I got this back:

2019-09-14T06:00:10.822+0200 I NETWORK [conn4265] received client metadata from 127.0.0.1:33518 conn4265: { driver: { name: "mongo-java-driver", version: "3.9.1" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "4.4.0-148-generic" }, platform: "Java/Oracle Corporation/1.8.0_201-b09" }
2019-09-14T06:00:11.193+0200 E STORAGE [conn4265] WiredTiger error (0) [1568433611:186552][25398:0x7f2f8c42b700], file:collection-0--1401964055748237432.wt, WT_CURSOR.search: __wt_block_read_off, 279: collection-0-1401964055748237432.wt: read checksum error for 8192B block at offset 77187362816: calculated block checksum doesn't match expected checksum Raw: [1568433611:186552][25398:0x7f2f8c42b700], file:collection-0--1401964055748237432.wt, WT_CURSOR.search: __wt_block_read_off, 279: collection-0-1401964055748237432.wt: read checksum error for 8192B block at offset 77187362816: calculated block checksum doesn't match expected checksum
2019-09-14T06:00:11.193+0200 E STORAGE [conn4265] WiredTiger error (0) [1568433611:193683][25398:0x7f2f8c42b700], file:collection-0--1401964055748237432.wt, WT_CURSOR.search: __wt_bm_corrupt_dump, 145: {77187362816, 8192, 355146770}:

 

The server did not crash or restart between the repair and now.

The only other thing I can blame is the SSD, but S.M.A.R.T. shows good health.

Besides the SIGSEGV reported here, are there any checks I could do to isolate the issue?

To me it seems that, for some reason, the data got corrupted by multiple connections searching and updating at the same time.

A slave is not an option for now. My setup is not at that level; I use mongo for small analytics purposes. But it seems that it simply broke itself under heavier load.
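The checksum error in the 14/Sep comment reports an 8192B block at offset 77187362816. That is enough to locate the damaged block and to run a simple hardware check. Below is a minimal, hypothetical Python sketch. It is not a MongoDB or WiredTiger tool, and it assumes WiredTiger's default 4KB allocation size. Run it against the .wt file while mongod is stopped, ideally on a copy. If repeated raw reads of the same range return different bytes, that points at the disk or its driver rather than at mongod. Identical reads do not prove the hardware is healthy.

```python
import hashlib
import os

# Assumption: WiredTiger's default allocation_size of 4KB; blocks start on
# multiples of this unit.
ALLOC_UNIT = 4096

def block_position(offset: int, size: int) -> dict:
    """Describe where a block reported in a checksum error sits in the file."""
    return {
        "offset_gib": round(offset / 2**30, 3),
        "aligned_to_alloc_unit": offset % ALLOC_UNIT == 0,
        "alloc_unit_index": offset // ALLOC_UNIT,
        "end_offset": offset + size,
    }

def read_block_digests(path: str, offset: int, size: int, rounds: int = 3) -> set:
    """Read the same raw byte range several times and return the SHA-256 digests.

    More than one distinct digest means the device returned different bytes
    for the same range. The page-cache eviction below is only advisory, so a
    single digest is weak evidence that the hardware is fine.
    """
    digests = set()
    for _ in range(rounds):
        fd = os.open(path, os.O_RDONLY)
        try:
            if hasattr(os, "posix_fadvise"):
                # Ask the kernel to drop cached pages so the read goes to the device.
                os.posix_fadvise(fd, offset, size, os.POSIX_FADV_DONTNEED)
            data = os.pread(fd, size, offset)
        finally:
            os.close(fd)
        digests.add(hashlib.sha256(data).hexdigest())
    return digests

if __name__ == "__main__":
    # Numbers taken from the "read checksum error for 8192B block" log line.
    print(block_position(77187362816, 8192))
```

Here the offset is a multiple of 4096 and sits roughly 71.9 GiB into the file. For the actual check, the path would be the --dbpath from this ticket joined with the file name from the error, for example read_block_digests("/mnt/bigdata/mongodb/collection-0--1401964055748237432.wt", 77187362816, 8192).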

Comment by Danny Hatcher (Inactive) [ 12/Sep/19 ]

It appears that you experienced corruption in the data files; this is likely caused by a failure of the infrastructure underneath the server. We recommend using replication to spread multiple mongod processes across servers so that you can easily recover from issues like this.

I see that you ran the repairDatabase command but it did not succeed. This command was actually deprecated in 4.2 as it does not cover as many cases as the --repair configuration option. Please try starting the server with the --repair option and let it run. When the repair process is finished, please try restarting the node without that option and see if you experience any problems.

Comment by krzysztof osmulski [ 10/Sep/19 ]

The server is a standalone instance running on ext4, Ubuntu 14.04.
It contains a few databases; one is mostly GridFS (big storage, mostly no load at this point), ~2M entries.
The other database is under bulk load every 5 minutes, with around 6 connections doing upserts (maybe double that when work overlaps).
In the meantime there may be a query by _id gathering data into a different collection. _id is an ascending long in string representation.
The query on that _id is 'gt'-based.
The collection under load has around 10M entries of ~14KB each. Data rotates daily, around 1M in and out, so it may be fragmented.
The server was following its normal routine, closing finished connections.
No restart, no system crash. Between the last connection closing and the crash there is a 40s gap.
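Because _id stores the long as a string, the 'gt' filter in the logged query ({ _id: { $gt: "7821478828" } }) compares strings, not numbers. With no collation specified, MongoDB compares strings by simple binary comparison, so ids of different lengths do not sort numerically. Below is a minimal Python sketch under that assumption; Python's str ordering matches byte order for ASCII digits. `pad_id` is a hypothetical helper, not something the reporter's application is known to use.

```python
def matches_gt_string(doc_id: str, bound: str) -> bool:
    """Mimic {_id: {$gt: bound}} when both values are strings.

    Assumption: simple (binary) collation, MongoDB's default; for ASCII
    digit strings Python's str comparison gives the same order.
    """
    return doc_id > bound

def pad_id(n: int, width: int = 20) -> str:
    """Hypothetical helper: zero-pad so string order equals numeric order.

    20 digits cover any non-negative 64-bit long (at most 19 digits).
    """
    return str(n).zfill(width)

bound = "7821478828"  # the $gt value from the logged query
print(matches_gt_string("7821478829", bound))    # True: same length, numeric order holds
print(matches_gt_string("10000000000", bound))   # False: larger number, but "1" < "7"
print(pad_id(10000000000) > pad_id(7821478828))  # True once both are zero-padded
```

With fixed-width ids, string order and numeric order agree, so a scan of the ascending _id index still visits ids in numeric order. This only affects which documents the query returns.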

 2019-09-10T04:00:10.157+0200 I COMMAND [conn2326] command allegro.offers command: find { find: "offers", filter: { _id: { $gt: "7821478828" }, data.product.reviews.opinions: { $exists: true }, data.sellingMode: { $exists: true } }, sort: { _id: 1 }, projection: { _id: 1, data.product: 1, data.category: 1, data.name: 1, data.sellingMode: 1 }, limit: 50000, $db: "allegro", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } cursorid:27763511447 keysExamined:4443 docsExamined:4443 numYields:34 nreturned:101 reslen:188309 locks:{ Global: { acquireCount: { r: 35 } }, Database: { acquireCount: { r: 35 } }, Collection: { acquireCount: { r: 35 } } } storage:{ data: { bytesRead: 65850886, timeReadingMicros: 150096 } } protocol:op_msg 226ms
 2019-09-10T04:00:13.290+0200 I COMMAND [conn2326] command allegro.offers command: getMore { getMore: 27763511447, collection: "offers", batchSize: 49899, $db: "allegro", $readPreference: { mode: "primaryPreferred" } } originatingCommand: { find: "offers", filter: { _id: { $gt: "7821478828" }, data.product.reviews.opinions: { $exists: true }, data.sellingMode: { $exists: true } }, sort: { _id: 1 }, projection: { _id: 1, data.product: 1, data.category: 1, data.name: 1, data.sellingMode: 1 }, limit: 50000, $db: "allegro", $readPreference: { mode: "primaryPreferred" } } planSummary: IXSCAN { _id: 1 } cursorid:27763511447 keysExamined:20397 docsExamined:20397 cursorExhausted:1 numYields:159 nreturned:512 reslen:1023759 locks:{ Global: { acquireCount: { r: 160 } }, Database: { acquireCount: { r: 160 } }, Collection: { acquireCount: { r: 160 } } } storage:{ data: { bytesRead: 310829904, timeReadingMicros: 773045 } } protocol:op_msg 1144ms
 2019-09-10T04:00:17.953+0200 I NETWORK [conn2326] end connection 127.0.0.1:52482 (2 connections now open)
 2019-09-10T04:00:17.955+0200 I NETWORK [conn2325] end connection 127.0.0.1:52480 (1 connection now open)
 2019-09-10T04:00:58.144+0200 F - [WTCheckpointThread] Invalid access at address: 0x56e1
 2019-09-10T04:00:58.180+0200 F - [WTCheckpointThread] Got signal: 11 (Segmentation fault).

The process was around 3GB, including the 2GB mongo cache I specified.
The whole mongo data storage is around 103GB.

To make it more complicated, in the attached mongod.log you will find more crashes (this bug is the last one).
I registered my account here to help you understand the struggles of a standalone user like me.
I was working with the loaded collection when it had more than 30M entries, but I ran into mongo inconsistency issues: db.repairDatabase() was not successful and rebuilding the index was not possible.
In this log file you will also see that rebuilding the index is difficult.
After the last crash (the one I reported), db.col.validate(true) was successful.

I hope you get something meaningful from that output that will make mongo reliable as a standalone for big collections.

Comment by Danny Hatcher (Inactive) [ 10/Sep/19 ]

In order for us to diagnose this problem, can you please describe the situation when you encountered the stacktrace? Were you running a series of specific queries? Did you just shut down and then bring the server back up again? Did the hardware underneath the process experience a failure?

Can you please also provide the full mongod log covering a significant time period before the crash up to and including the crash itself?

Generated at Thu Feb 08 05:02:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.