[SERVER-20204] Segmentation fault during index build on 3.0 secondary Created: 30/Aug/15  Updated: 13/Oct/15  Resolved: 02/Sep/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.5
Fix Version/s: 3.0.7, 3.1.8

Type: Bug Priority: Critical - P2
Reporter: Maksim Naumov Assignee: Geert Bosch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-20159 Out of memory on index build during i... Closed
is related to SERVER-20244 Should eliminate usage of malloc/strd... Closed
Backwards Compatibility: Fully Compatible
Operating System: Linux
Backport Completed:
Sprint: Quint 9 09/18/15
Participants:

 Description   

Hello,
I got a segmentation fault while building an index.
Mongo 3.0.6

2015-08-30T03:31:59.472+0200 I -        [rsSync]   Index Build: 5457300/16365374 33%
2015-08-30T03:32:00.870+0200 F -        [rsSync] Invalid access at address: 0
2015-08-30T03:32:01.079+0200 F -        [rsSync] Got signal: 11 (Segmentation fault).
 
 0xf5ba59 0xf5b322 0xf5b67e 0x7ff0015e7340 0x7ff000043ff9 0xd734ef 0xd73572 0x9fb456 0xbce204 0x92bc1b 0x9388be 0xc8db36 0xc8ebf1 0xc8ffb2 0xc99231 0xfa9724 0x7ff0015df182 0x7ff0000a647d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5BA59"},{"b":"400000","o":"B5B322"},{"b":"400000","o":"B5B67E"},{"b":"7FF0015D7000","o":"10340"},{"b":"7FEFFFFAC000","o":"97FF9"},{"b":"400000","o":"9734EF"},{"b":"400000","o":"973572"},{"b":"400000","o":"5FB456"},{"b":"400000","o":"7CE204"},{"b":"400000","o":"52BC1B"},{"b":"400000","o":"5388BE"},{"b":"400000","o":"88DB36"},{"b":"400000","o":"88EBF1"},{"b":"400000","o":"88FFB2"},{"b":"400000","o":"899231"},{"b":"400000","o":"BA9724"},{"b":"7FF0015D7000","o":"8182"},{"b":"7FEFFFFAC000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3", "uname" : { "sysname" : "Linux", "release" : "3.13.0-45-generic", "version" : "#74-Ubuntu SMP Tue Jan 13 19:36:28 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "695FC6828398A9DB1F99718671147885B5ED116D" }, { "b" : "7FFF9CEE4000", "elfType" : 3, "buildId" : "9D77366C6409A9EA266179080FA7C779EEA8A958" }, { "b" : "7FF0015D7000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FF001378000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "A20EFFEC993A8441FA17F2079F923CBD04079E19" }, { "b" : "7FF000F9D000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "F000D29917E9B6E94A35A8F02E5C62846E5916BC" }, { "b" : "7FF000D95000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FF000B91000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FF00088D000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" : "7FF000587000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : 
"1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FF000371000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7FEFFFFAC000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7FF0017F5000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5ba59]
 mongod(+0xB5B322) [0xf5b322]
 mongod(+0xB5B67E) [0xf5b67e]
 libpthread.so.0(+0x10340) [0x7ff0015e7340]
 libc.so.6(+0x97FF9) [0x7ff000043ff9]
 mongod(_ZNK5mongo21WiredTigerRecordStore8_getDataERKNS_16WiredTigerCursorE+0x7F) [0xd734ef]
 mongod(_ZNK5mongo21WiredTigerRecordStore8Iterator7dataForERKNS_8RecordIdE+0x42) [0xd73572]
 mongod(_ZN5mongo14CollectionScan4workEPm+0x226) [0x9fb456]
 mongod(_ZN5mongo12PlanExecutor18getNextSnapshottedEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0xA4) [0xbce204]
 mongod(_ZN5mongo15MultiIndexBlock30insertAllDocumentsInCollectionEPSt3setINS_8RecordIdESt4lessIS2_ESaIS2_EE+0x17B) [0x92bc1b]
 mongod(_ZN5mongo6Cloner2goEPNS_16OperationContextERKSsS4_RKNS_12CloneOptionsEPSt3setISsSt4lessISsESaISsEERSsPi+0x17AE) [0x9388be]
 mongod(+0x88DB36) [0xc8db36]
 mongod(+0x88EBF1) [0xc8ebf1]
 mongod(_ZN5mongo4repl17syncDoInitialSyncEv+0x42) [0xc8ffb2]
 mongod(_ZN5mongo4repl13runSyncThreadEv+0x181) [0xc99231]
 mongod(+0xBA9724) [0xfa9724]
 libpthread.so.0(+0x8182) [0x7ff0015df182]
 libc.so.6(clone+0x6D) [0x7ff0000a647d]
-----  END BACKTRACE  -----
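Two quick sanity checks can be run on a backtrace like the one above: demangling the mangled C++ symbol names with c++filt (from binutils), and verifying that the raw addresses in the first line equal the base-plus-offset pairs in the JSON (`absolute = b + o`). A sketch using frames taken from this trace:

```shell
# Demangle the crashing WiredTiger frame from the trace above;
# c++filt prints the human-readable C++ signature.
echo '_ZNK5mongo21WiredTigerRecordStore8_getDataERKNS_16WiredTigerCursorE' | c++filt

# Cross-check one raw address against its JSON entry:
# {"b":"400000","o":"B5B322"} should match 0xf5b322 from the address list.
printf '0x%x\n' $(( 0x400000 + 0xB5B322 ))   # → 0xf5b322
```

With a debug-symbol build of the exact mongod binary, the module-plus-offset frames (e.g. `mongod(+0xB5B322)`) could additionally be resolved to source lines with addr2line.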



 Comments   
Comment by Ramon Fernandez Marina [ 02/Sep/15 ]

Thank you for reporting the problem fromyukki, and sorry you ran into it. Feel free to post updates on this ticket after using 3.0.7-pre-, and to open new tickets if you run into further issues.

Comment by Maksim Naumov [ 02/Sep/15 ]

Hello ramon.fernandez, thank you for your support. I will try the pre-release build and then upgrade to 3.0.7. Thank you one more time.

Comment by Ramon Fernandez Marina [ 02/Sep/15 ]

fromyukki, we've identified the source of the problem and committed a fix. The fix will be in upcoming releases 3.0.7 and 3.1.8, to be released in the coming weeks.

Unfortunately there's currently no workaround for this issue. You may try reducing the WiredTiger cache size, but depending on your data distribution you may run into SERVER-20159.
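For reference, lowering the cache from the 25 GB in this ticket's config would look like this in the mongod configuration file (10 is an arbitrary illustrative value, not a recommendation):

storage:
    wiredTiger:
        engineConfig:
            cacheSizeGB: 10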

If this issue is currently blocking you, one thing you can try is using a 3.0.7-pre- build including the fix for this bug; the build will appear here soon.

Regards,
Ramón.

Comment by Githook User [ 02/Sep/15 ]

Author:

{u'username': u'GeertBosch', u'name': u'Geert Bosch', u'email': u'geert@mongodb.com'}

Message: SERVER-20204: Use mongoMalloc to allocate memory

(cherry picked from commit 0a6c20bdc7128d1f13e967a7cc6219b1dfc38b6b)
Branch: v3.0
https://github.com/mongodb/mongo/commit/6fef26b65a64096fcfd0a287e5420fc58ced9865

Comment by Githook User [ 01/Sep/15 ]

Author:

{u'username': u'GeertBosch', u'name': u'Geert Bosch', u'email': u'geert@mongodb.com'}

Message: SERVER-20204: Use mongoMalloc to allocate memory
Branch: master
https://github.com/mongodb/mongo/commit/0a6c20bdc7128d1f13e967a7cc6219b1dfc38b6b

Comment by Maksim Naumov [ 01/Sep/15 ]

Hello ramon.fernandez, I am sorry, but dmesg has nothing for August 30 (it seems somebody rebooted this server). syslog has nothing as well, just a batch of cron logs.

Comment by Ramon Fernandez Marina [ 01/Sep/15 ]

fromyukki, the stack trace seems to point to a failure to allocate memory. Can you please send us the output of

dmesg -T


from this node? If there was a malloc(3) failure it may show up there. Depending on your syslog configuration, this information may have also been written to /var/log/kern.log or /var/log/syslog. If the output of dmesg -T doesn't show anything I'll ask you to send some of the log files I mentioned above (I'll create an upload portal so you can send this privately and securely).
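If mongod failed an allocation or was killed because the box ran out of memory, the kernel log usually contains an OOM-killer record. A minimal sketch of what to look for — the sample line below is fabricated for illustration; on the real node you would pipe `dmesg -T` itself into the grep:

```shell
# Pattern matching typical OOM-killer log lines (case-insensitive).
pattern='out of memory|oom-killer|killed process'

# Fabricated sample line, standing in for real `dmesg -T` output.
sample='[Sun Aug 30 03:32:00 2015] Out of memory: Kill process 12345 (mongod) score 901'

# On the affected node: dmesg -T | grep -iE "$pattern"
echo "$sample" | grep -iE "$pattern"
```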

Thanks,
Ramón.

Comment by Ramon Fernandez Marina [ 31/Aug/15 ]

fromyukki, the reason you're seeing SERVER-19673 could be because you have set up a very high value for the WiredTiger cache: setting this parameter to 25GB doesn't mean that mongod will only use 25GB, it means that the WiredTiger cache may grow up to 25GB – mongod will need additional memory to run.

Note also that setting a high value for the WiredTiger cache may starve the OS buffer cache, which may have a negative impact on performance.

That being said, this does not explain the segfault of course – we're still investigating the behavior you reported in this ticket.

Comment by Maksim Naumov [ 31/Aug/15 ]

Yes sure.
Currently we have only one node (I am trying to add one more). That node is running mongo v2.6.10. In preparation for upgrading to the newest version, we decided to run the second node on mongo 3.0.x. I tried every release from 3.0.3 to 3.0.5, and every time I got an error like the one in SERVER-19673. This time we got a different one.

Mongo is running on a virtual machine: 30 GB of memory, 5 TB of disk, and just 4 CPUs. The segfault happened on the secondary node while it was syncing with the primary (a very new node). After the restart all the data was dropped and the sync process started again from the very beginning.

Here is the config file

systemLog:
    destination: file
    path: "/var/log/mongodb/mongod.log"
    logAppend: true
    verbosity: 0
    quiet: true
processManagement:
    fork: true
net:
    bindIp: 127.0.0.1,192.168.XX.XX
replication:
    replSetName: "XXXX"
storage:
    dbPath: /data/mongodb
    directoryPerDB: true
    journal:
        enabled: true
    engine: "wiredTiger"
    wiredTiger:
        engineConfig:
            cacheSizeGB: 25
            journalCompressor: snappy
        collectionConfig:
            blockCompressor: snappy
        indexConfig:
            prefixCompression: true

There is nothing more in the log file (some megabytes of replication logs); everything I have, I posted in this issue's description.

Comment by Ramon Fernandez Marina [ 31/Aug/15 ]

Thanks fromyukki – yes, I wanted to be sure there were no packaging-related issues as well, and the dpkg output clears that up.

And yes, on to the segfault, for which we need some more information:

  1. Can you tell us some more details about your setup? Number of nodes, hardware configuration, MongoDB deployment...
  2. Did the segfault happen on a primary or a secondary? Or is this a stand-alone node?
  3. Can you please send us the full logs of the affected node since the last restart until the segfault happened?
  4. If you have restarted this node, does the problem appear again or does the index build complete?

Thanks,
Ramón.

Comment by Maksim Naumov [ 31/Aug/15 ]

Oh, yes, my fault ... anyway, here is the segfault:

$ dpkg -l | grep mongodb
ii  mongodb-org                              3.0.6                            amd64        MongoDB open source document-oriented database system (metapackage)
ii  mongodb-org-mongos                       3.0.5                            amd64        MongoDB sharded cluster query router
ii  mongodb-org-server                       3.0.5                            amd64        MongoDB database server
ii  mongodb-org-shell                        3.0.5                            amd64        MongoDB shell client
ii  mongodb-org-tools                        3.0.5                            amd64        MongoDB tools

Comment by Maksim Naumov [ 31/Aug/15 ]

Hi ramon.fernandez, I also saw this in the backtrace, but the fact is that I installed the latest version from your repository. Maybe it was compiled with the wrong version, or the repository has the wrong build.

Comment by Ramon Fernandez Marina [ 31/Aug/15 ]

Hi fromyukki, I adjusted the "Affects Version/s" field as per the version in the stack trace:

"processInfo":{ "mongodbVersion" : "3.0.5", "gitVersion" : "8bc4ae20708dbb493cb09338d9e7be6698e4a3a3" ...

Can you send the output of

dpkg -l | grep mongodb

?

Thanks,
Ramón

Comment by Maksim Naumov [ 30/Aug/15 ]

It says "Affects Version/s: 3.0.5", but I got this segfault on 3.0.6.

It says that the latest version of mongo is installed:
$ dpkg -s mongodb-org | grep Version
Version: 3.0.6

But
$ mongod --version
db version v3.0.5
git version: 8bc4ae20708dbb493cb09338d9e7be6698e4a3a3

So strange

Generated at Thu Feb 08 03:53:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.