[SERVER-17569] Got signal: 11 (Segmentation fault). Created: 12/Mar/15  Updated: 16/Nov/21  Resolved: 17/Mar/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Marcos Fernándex Assignee: Bruce Lucas (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by WT-1901 Issues resolved in WiredTiger 2.5.3 Closed
Duplicate
duplicates SERVER-17613 Unable to start mongod after unclean ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

cat mongodb.log

2015-03-12T18:46:59.621+0100 I CONTROL  [initandlisten] MongoDB starting : pid=28081 port=27017 dbpath=/home/mongodb 64-bit host=ns320462.ip-176-31-114.eu
2015-03-12T18:46:59.621+0100 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2015-03-12T18:46:59.621+0100 I CONTROL  [initandlisten] 
2015-03-12T18:46:59.622+0100 I CONTROL  [initandlisten] db version v3.0.0
2015-03-12T18:46:59.622+0100 I CONTROL  [initandlisten] git version: a841fd6394365954886924a35076691b4d149168
2015-03-12T18:46:59.622+0100 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
2015-03-12T18:46:59.622+0100 I CONTROL  [initandlisten] build info: Linux ip-10-179-177-12 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:11:08 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-03-12T18:46:59.622+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2015-03-12T18:46:59.622+0100 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "127.0.0.1", port: 27017 }, storage: { dbPath: "/home/mongodb", engine: "wiredTiger", journal: { enabled: true }, wiredTiger: { collectionConfig: { blockCompressor: "zlib" } } }, systemLog: { destination: "file", path: "/var/log/mongodb/mongodb.log" } }
2015-03-12T18:46:59.662+0100 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-03-12T18:46:59.734+0100 F -        [initandlisten] Invalid access at address: 0
2015-03-12T18:46:59.753+0100 F -        [initandlisten] Got signal: 11 (Segmentation fault).
 
 0xf3f529 0xf3edf2 0xf3f14e 0x6889cc807c90 0x133ef78 0x131321d 0x131382b 0x132279a 0x1374495 0x13038e1 0x12fe48c 0xd57bcd 0xd55998 0xa7bd6d 0x807627 0x7d4259 0x6889cb1dbec5 0x805377
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B3F529"},{"b":"400000","o":"B3EDF2"},{"b":"400000","o":"B3F14E"},{"b":"6889CC7F8000","o":"FC90"},{"b":"400000","o":"F3EF78"},{"b":"400000","o":"F1321D"},{"b":"400000","o":"F1382B"},{"b":"400000","o":"F2279A"},{"b":"400000","o":"F74495"},{"b":"400000","o":"F038E1"},{"b":"400000","o":"EFE48C"},{"b":"400000","o":"957BCD"},{"b":"400000","o":"955998"},{"b":"400000","o":"67BD6D"},{"b":"400000","o":"407627"},{"b":"400000","o":"3D4259"},{"b":"6889CB1BA000","o":"21EC5"},{"b":"400000","o":"405377"}],"processInfo":{ "mongodbVersion" : "3.0.0", "gitVersion" : "a841fd6394365954886924a35076691b4d149168", "uname" : { "sysname" : "Linux", "release" : "3.14.32-xxxx-grs-ipv6-64", "version" : "#1 SMP Sat Feb 7 11:35:27 CET 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "5F365548FE8312E027817D98FEECA3A18F0F60FB" }, { "b" : "6889CD537000", "elfType" : 3, "buildId" : "FAF400EE48C6DC7D3D021FC95AA21E92ED9541BC" }, { "b" : "6889CC7F8000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "921196598AF41AFF8DE42EEFB8561243610F34C3" }, { "b" : "6889CC599000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "B408BD42C304C9370D97ED641544082414C4D59A" }, { "b" : "6889CC1B6000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "B0BB841B6CFD35E8D3D2AC285C220A4683A134EF" }, { "b" : "6889CBFAE000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "54EF3A97A3E71418DD088B40AF51A00457834A17" }, { "b" : "6889CBDAA000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "794CD87201C9778112E22BF5E2C0FBFB3390D29F" }, { "b" : "6889CBA9B000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "ADEF33B83967BBB41525AE439354F030694250C4" }, { "b" : "6889CB795000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "4E96203F4FE17D3446F48226AAEA8DA6DEA8FFD0" }, { 
"b" : "6889CB57E000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "7C6E98219378EBD1AA0D4CD671E8FF1589C04C4A" }, { "b" : "6889CB1BA000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "95287BE8ACCCC7B5723F4306E6A5ECA6DFE7BFFD" }, { "b" : "6889CCA16000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9240DBBD1DB14E756141EEE1FDDB67D3B77864E7" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf3f529]
 mongod(+0xB3EDF2) [0xf3edf2]
 mongod(+0xB3F14E) [0xf3f14e]
 libpthread.so.0(+0xFC90) [0x6889cc807c90]
 mongod(__wt_struct_unpack+0x368) [0x133ef78]
 mongod(+0xF1321D) [0x131321d]
 mongod(+0xF1382B) [0x131382b]
 mongod(__wt_log_needs_recovery+0xAA) [0x132279a]
 mongod(__wt_txn_recover+0x425) [0x1374495]
 mongod(__wt_connection_workers+0x61) [0x13038e1]
 mongod(wiredtiger_open+0x118C) [0x12fe48c]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKSsS2_bb+0x2FD) [0xd57bcd]
 mongod(+0x955998) [0xd55998]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa7bd6d]
 mongod(_ZN5mongo13initAndListenEi+0x6F7) [0x807627]
 mongod(main+0x139) [0x7d4259]
 libc.so.6(__libc_start_main+0xF5) [0x6889cb1dbec5]
 mongod(+0x405377) [0x805377]
-----  END BACKTRACE  -----
root@ns320462:/var/log/mongodb#
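
For readers decoding the backtrace above: each entry in the "backtrace" JSON array carries a load base ("b") and an offset ("o"), both in hex, and their sum appears to match the raw addresses printed on the line just before BEGIN BACKTRACE. A minimal sketch of that arithmetic follows, assuming a JavaScript runtime with BigInt support such as a recent Node.js. The helper name resolveFrames is hypothetical and is not part of any MongoDB tooling.

```javascript
// Hypothetical helper: resolve each frame's absolute address as
// base ("b") + offset ("o"). Both fields are hex strings without a 0x prefix.
function resolveFrames(frames) {
  return frames.map(function (f) {
    // BigInt keeps the 48-bit shared-library addresses exact.
    return (BigInt("0x" + f.b) + BigInt("0x" + f.o)).toString(16);
  });
}

// First four frames copied from the backtrace JSON in this ticket.
const sampleFrames = [
  { b: "400000", o: "B3F529" },
  { b: "400000", o: "B3EDF2" },
  { b: "400000", o: "B3F14E" },
  { b: "6889CC7F8000", o: "FC90" }
];

console.log(resolveFrames(sampleFrames).map(function (a) { return "0x" + a; }).join(" "));
// prints: 0xf3f529 0xf3edf2 0xf3f14e 0x6889cc807c90
```

The printed values line up with the first four raw addresses in the log, which is how the unnamed mongod(+0x...) frames can be matched back to offsets inside the binary.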



 Comments   
Comment by Bruce Lucas (Inactive) [ 17/Mar/15 ]

Hi Marcos,

We also recently reproduced this problem in an internal test, reported in SERVER-17613, and work is proceeding on that ticket towards a fix. It turns out that the db itself was not corrupted; rather, there is an error in the recovery code that leads to the segfault. I'll mark this ticket as a duplicate; please follow SERVER-17613 for information about progress on the fix. Thanks again for reporting this, and thanks for uploading the db for our investigation.

Bruce

Comment by Marcos Fernándex [ 16/Mar/15 ]

@Asya Kamsky

Oh thanks, I wasn't aware of it. Very useful, I'm going to try it now.

Comment by Asya Kamsky [ 16/Mar/15 ]

sombra2eternity

Note that you can specify different WiredTiger options to the collection when you create it:

http://docs.mongodb.org/manual/reference/method/db.createCollection/#specify-storage-engine-options
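
As a rough illustration of the per-collection option described above, the sketch below builds the options document that db.createCollection() accepts under storageEngine.wiredTiger.configString. The helper name wiredTigerCollectionOptions, the collection name "fs.chunks", and the choice of "none" as the compressor value are assumptions made for illustration; check the linked documentation for the values your build accepts.

```javascript
// Hypothetical helper: builds the options document for db.createCollection().
// blockCompressor is passed straight through as a WiredTiger config string.
function wiredTigerCollectionOptions(blockCompressor) {
  return {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=" + blockCompressor }
    }
  };
}

// Assumed usage in the mongo shell, against a mongod running WiredTiger.
// The collection must not exist yet, since the options apply at creation:
//   db.createCollection("fs.chunks", wiredTigerCollectionOptions("none"));
```

The setting is applied per collection rather than per database, so a GridFS chunks collection could in principle be created without block compression while other collections keep the server-wide blockCompressor.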

Comment by Bruce Lucas (Inactive) [ 16/Mar/15 ]

Thanks for the detailed information, Marcos, and my apologies for missing your comment about the OOM killer. I agree that neither the segmentation fault nor the failure of --repair is the desired outcome, and furthermore I would expect that we should be able to survive the OOM killer without this kind of corruption. We're looking into it.
Thanks,
Bruce

Comment by Marcos Fernándex [ 16/Mar/15 ]

As I said before (second comment), this broken state was produced by the kernel OOM killer killing mongodb; in fact it took me several minutes to figure out why mongodb was disappearing, until I looked at kern.log. WiredTiger is an expensive improvement if you use mongodb GridFS extensively, but unfortunately I'm not aware of how to configure wiredTiger/MMAPv1 per database.

I have no logs from the earlier instances. Long story short:

  • Bought a new server to host a growing project
  • Installed brand new mongodb 3.0
  • Configured wiredTiger as storage engine
  • Copied database with db.copy()
  • Moved DNS to new server
  • After a few minutes under real usage the mongodb instances were disappearing (killed by OOM); I started the daemon again several times trying to understand what was going on
  • At the third restart mongod was unable to start (segmentation fault)
  • Moved back DNS to old server

My conclusions:

  • Even with a better CPU and more RAM, wiredTiger is not reliable for GridFS; MMAPv1 worked much better. I wanted the compression benefits for the other data (not files), but I can't disable compression per db.
  • Mongodb segfaults. I know that killing mongodb may have left the database in an unrecoverable state, but I still think mongodb must never hit a segmentation fault; a proper message like "I'm unable to recover anything" or "journal corrupt, ignoring" would be more appropriate.
  • Even mongod with --repair option produced a segmentation fault, which seems very critical to me.

Thanks

Comment by Bruce Lucas (Inactive) [ 16/Mar/15 ]

Hi Marcos,

It looks like there's a problem in one of the journal files. We're investigating to see what we can determine about the cause. It would be helpful to have some information about the history of the instance before you started observing this problem:

  • were there any unusual events such as mongod or node crashes?
  • can you please compress and attach log files prior to the occurrence of this problem, especially the last log file with a successful startup of mongod prior to the occurrence of this problem.

Thanks,
Bruce

Comment by Bruce Lucas (Inactive) [ 12/Mar/15 ]

Thanks Marcos. I will take a look and get back to you with my findings.

Comment by Marcos Fernándex [ 12/Mar/15 ]

Upload complete. Thanks

Comment by Ramon Fernandez Marina [ 12/Mar/15 ]

sombra2eternity, just hit enter when prompted for a password.

Comment by Bruce Lucas (Inactive) [ 12/Mar/15 ]

Thanks Marcos. You can just hit enter when it asks for password.

Comment by Marcos Fernándex [ 12/Mar/15 ]

scp -P 722 -r <filename> SERVER-17569@www.mongodb.com:
It asks me for a password :/

Comment by Marcos Fernándex [ 12/Mar/15 ]

OK, fair enough, I'm zipping it right now. Next I'll do:

scp -P 722 -r <filename> SERVER-17569@www.mongodb.com:

right?

Comment by Bruce Lucas (Inactive) [ 12/Mar/15 ]

Hi Marcos,

We as a general policy don't access users' systems directly. In our experience, we have found that a 9 GB upload may take some time, but it is possible. Would you be willing to give it a try? We would very much like to look at the data to understand this issue.

Thanks,
Bruce

Comment by J Rassi [ 12/Mar/15 ]

Assigning to bruce.lucas@10gen.com.

Comment by Marcos Fernándex [ 12/Mar/15 ]

1. With a clean path it works; in fact it had been working correctly for a few days. Then the daemon got killed by the kernel OOM killer (I think wiredTiger has a problem with memory too) and everything went wrong.
2. I can grant access to this server if it will help; it's mostly clean, with one lone project.
3. 9GB of images; I don't see that as doable.

After I wrote this issue I changed all binaries to the 3.0.1 release candidate to see if it could start: same error. Just give me an email address and I will send you ssh access to this server. Mongo is currently running with a clean /home/mongodb; the old data has been moved to /home/mongodb1 so that I can check your questions above.

Comment by J Rassi [ 12/Mar/15 ]

Hi,

Thanks for the report, and sorry to hear that you're encountering this problem. We'll need to gather additional information in order to diagnose the issue:

  • Does this issue occur if you start mongod with a clean dbpath, or does it only occur when starting with your database files in /home/mongodb?
  • Could you please compress the mongod log file corresponding to your last successful mongod restart on these data files, and upload it as an attachment to this ticket?
  • Could you please compress/archive your /home/mongodb data directory and upload it as an attachment to this ticket?

If you are not comfortable posting your data or your logs publicly (or if either exceeds 100MB in total size), please upload them to our private drop box. Only MongoDB staff has access to data uploaded this way. You can do so as follows from the command line (when you are prompted for a password, please just press "enter"):

scp -P 722 -r <filename> SERVER-17569@www.mongodb.com:

~ Jason Rassi

Comment by Marcos Fernándex [ 12/Mar/15 ]

root@ns320462:/var/log/mongodb# mongod --version
db version v3.0.0
git version: a841fd6394365954886924a35076691b4d149168
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014

root@ns320462:/var/log/mongodb# uname -a
Linux ns320462.ip-176-31-114.eu 3.14.32-xxxx-grs-ipv6-64 #1 SMP Sat Feb 7 11:35:27 CET 2015 x86_64 x86_64 x86_64 GNU/Linux

ubuntu 14.10
