[SERVER-16107] 2.6 mongod crashes with segfault when added to a 2.8 replica set with >= 12 nodes.
Created: 12/Nov/14  Updated: 02/Feb/15  Resolved: 25/Nov/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.6.6

Type: Bug
Priority: Major - P3
Reporter: Charlie Swanson
Assignee: Andy Schwerin
Resolution: Done
Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File SERVER-16107.log    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Start a replica set with 12 nodes, all running 2.8. Start a node running 2.6 with the same --replSet parameter. Reconfigure the set to add the 2.6 node as the 13th member. The 2.6 node crashes and logs the stack trace shown in the Description.


 Description   

No helpful error messages logged, just this:

2014-11-11T16:06:02.983-0500 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
... (more of those)
2014-11-11T16:06:03.984-0500 [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
2014-11-11T16:06:04.736-0500 [initandlisten] connection accepted from 127.0.0.1:50138 #1 (1 connection now open)
2014-11-11T16:06:04.739-0500 [initandlisten] connection accepted from 127.0.0.1:50139 #2 (2 connections now open)
2014-11-11T16:06:04.739-0500 [conn1] end connection 127.0.0.1:50138 (1 connection now open)
2014-11-11T16:06:04.990-0500 [rsStart] trying to contact charlie-macbook-pro:41000
2014-11-11T16:06:04.994-0500 [rsStart] SEVERE: Invalid access at address: 0x0
2014-11-11T16:06:04.998-0500 [rsStart] SEVERE: Got signal: 11 (Segmentation fault: 11).
Backtrace:0x109b0e58a 0x109b0e026 0x109b0e134 0x7fff8f8f1f1a 0x0 0x1098ad564 0x1098b205a 0x109b422c1 0x7fff8981f2fc 0x7fff8981f279 0x7fff8981d4b1 
 0   mongod                              0x0000000109b0e58a _ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE + 58
 1   mongod                              0x0000000109b0e026 _ZN5mongo12_GLOBAL__N_110abruptQuitEi + 198
 2   mongod                              0x0000000109b0e134 _ZN5mongo12_GLOBAL__N_124abruptQuitWithAddrSignalEiP9__siginfoPv + 212
 3   libsystem_platform.dylib            0x00007fff8f8f1f1a _sigtramp + 26
 4   ???                                 0x0000000000000000 0x0 + 0
 5   mongod                              0x00000001098ad564 _ZN5mongo11ReplSetImpl4initERNS_14ReplSetCmdlineE + 308
 6   mongod                              0x00000001098b205a _ZN5mongo13startReplSetsEPNS_14ReplSetCmdlineE + 170
 7   mongod                              0x0000000109b422c1 _ZN5boost12_GLOBAL__N_112thread_proxyEPv + 177
 8   libsystem_pthread.dylib             0x00007fff8981f2fc _pthread_body + 131
 9   libsystem_pthread.dylib             0x00007fff8981f279 _pthread_body + 0
 10  libsystem_pthread.dylib             0x00007fff8981d4b1 thread_start + 13



 Comments   
Comment by Githook User [ 25/Nov/14 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-16107 Do not increment erased iterators when processing bad replica set configs.
Branch: v2.6
https://github.com/mongodb/mongo/commit/fa4f0277a788e78db3639eaa82d46bac1dfa9e34

Comment by Andy Schwerin [ 12/Nov/14 ]

The code path in 2.6 that initializes replica set configs from remote nodes at startup invalidates an iterator and then advances it when an unsupported configuration arrives. This causes the crash. Notably, it also fails to log an informative message, so if it hadn't crashed, the node would have just sat there, waiting forever.

Probably, the best solution is to log the exception, and then safely advance and erase the iterator.
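
For illustration, a minimal sketch of the "log, then safely advance and erase" pattern described above. This is not the actual 2.6 code or the committed fix; CandidateConfig, validate, pruneInvalid, and the kMaxMembers limit of 12 are hypothetical names and values chosen for the example. The point is that the iterator returned by erase() replaces the erased one, so the erased iterator is never incremented.

```cpp
#include <cstddef>
#include <iostream>
#include <list>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a replica set config fetched from a remote node.
struct CandidateConfig {
    std::string source;  // host the config came from
    int memberCount;     // number of members listed in the config
};

// Hypothetical validation step: throws when the config is unsupported.
// The limit of 12 is an assumption made for this example.
void validate(const CandidateConfig& cfg) {
    const int kMaxMembers = 12;
    if (cfg.memberCount > kMaxMembers) {
        throw std::runtime_error("unsupported config from " + cfg.source);
    }
}

// Drops every config that fails validation and returns how many were dropped.
std::size_t pruneInvalid(std::list<CandidateConfig>& configs) {
    std::size_t removed = 0;
    std::list<CandidateConfig>::iterator it = configs.begin();
    while (it != configs.end()) {
        try {
            validate(*it);
            ++it;  // valid entry: plain advance
        } catch (const std::exception& e) {
            // Log the reason instead of failing silently.
            std::cerr << "skipping bad config: " << e.what() << std::endl;
            // erase() returns the iterator following the removed element,
            // so the invalidated iterator is never touched again.
            it = configs.erase(it);
            ++removed;
        }
    }
    return removed;
}
```

Called on a list holding configs with 3, 13, and 12 members, pruneInvalid would log one message, remove the 13-member entry, and leave the other two in place.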

Comment by Charlie Swanson [ 12/Nov/14 ]

Sorry, that's just the log from the 2.6 node. Do you need logs from any of the other nodes?

Comment by Charlie Swanson [ 12/Nov/14 ]

Log attached.

Comment by Andy Schwerin [ 12/Nov/14 ]

Please attach the entire log, charlie.swanson.

Generated at Thu Feb 08 03:39:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.