[SERVER-17439] Race condition in replset reconfig Created: 02/Mar/15  Updated: 06/Dec/22  Resolved: 30/Sep/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.6
Fix Version/s: 3.0.0

Type: Bug Priority: Major - P3
Reporter: Mathias Stearn Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   

Following a reconfig, a 2.6.6 mongod crashed with the following message:

2015-02-27T19:55:54.403+0000 [rsSync] SEVERE: Invalid access at address: 0
2015-02-27T19:55:54.411+0000 [conn36630502] SEVERE: Invalid access at address: 0
2015-02-27T19:55:54.454+0000 [rsSync] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x11fca91 0x11fbe6e 0x11fbf5f 0x7fb4a25d0130 0xec01c1 0xec0b7a 0xec0cb0 0xec0faa 0x1241429 0x7fb4a25c8df3 0x7fb4a18cf01d 
 /usr/local/mongodb/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11fca91]
 /usr/local/mongodb/bin/mongod() [0x11fbe6e]
 /usr/local/mongodb/bin/mongod() [0x11fbf5f]
 /lib64/libpthread.so.0(+0xf130) [0x7fb4a25d0130]
 /usr/local/mongodb/bin/mongod(_ZN5mongo7replset8SyncTail16oplogApplicationEv+0x7b1) [0xec01c1]
 /usr/local/mongodb/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x14a) [0xec0b7a]
 /usr/local/mongodb/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x30) [0xec0cb0]
 /usr/local/mongodb/bin/mongod(_ZN5mongo15startSyncThreadEv+0xaa) [0xec0faa]
 /usr/local/mongodb/bin/mongod() [0x1241429]
 /lib64/libpthread.so.0(+0x7df3) [0x7fb4a25c8df3]
 /lib64/libc.so.6(clone+0x6d) [0x7fb4a18cf01d]

Symbolized:

mongo::printStackTrace(std::ostream&)
/srv/10gen/mci-exec/mci/shell/src/src/mongo/util/stacktrace.cpp:304
abruptQuit
/srv/10gen/mci-exec/mci/shell/src/src/mongo/util/signal_handlers.cpp:107
abruptQuitWithAddrSignal
/srv/10gen/mci-exec/mci/shell/src/src/mongo/util/signal_handlers.cpp:201
??
??:0
std::vector<mongo::ReplSetConfig::MemberCfg, std::allocator<mongo::ReplSetConfig::MemberCfg> >::end() const
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:360
std::vector<mongo::ReplSetConfig::MemberCfg, std::allocator<mongo::ReplSetConfig::MemberCfg> >::size() const
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:402
mongo::replset::SyncTail::oplogApplication()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/repl/rs_sync.cpp:431
mongo::ReplSetImpl::_syncThread()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/repl/rs_sync.cpp:773
mongo::ReplSetImpl::syncThread()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/repl/rs_sync.cpp:818
mongo::TSP<mongo::Client>::get() const
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/repl/rs_sync.cpp:832
cc
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/client.h:252
mongo::startSyncThread()
/srv/10gen/mci-exec/mci/shell/src/src/mongo/db/repl/rs_sync.cpp:833
thread_proxy
/srv/10gen/mci-exec/mci/shell/src/src/third_party/boost/libs/thread/src/pthread/thread.cpp:133

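It appears that rs_sync.cpp:431, in SyncTail::oplogApplication(), is evaluating theReplSet->config().members.size() while the config pointer is NULL, which is consistent with the "Invalid access at address: 0" messages above.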



 Comments   
Comment by Eric Milkie [ 30/Sep/16 ]

This was fixed in version 3.0.

Comment by Eric Milkie [ 03/Mar/15 ]

Accessing _cfg could be protected by a mutex. It's only null while it is being replaced with a new ReplSetConfig().
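
For illustration only, here is a minimal sketch of the mutex idea above. It is not the actual 2.6 or 3.0 code: Config, MemberCfg, ConfigHolder, and memberCount are hypothetical stand-ins for ReplSetConfig, its member list, and the code that owns _cfg. The reconfig path publishes a fully built config under a lock, and readers such as the sync thread copy the pointer under the same lock before using it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ReplSetConfig and its MemberCfg entries.
struct MemberCfg {
    std::string host;
};

struct Config {
    int version;
    std::vector<MemberCfg> members;
};

// Hypothetical owner of the current config. Readers and the reconfig path
// take the same mutex, so a reader never observes the pointer mid-swap.
class ConfigHolder {
public:
    // Reconfig path: build the new config first, then publish it under the lock.
    void replace(Config next) {
        std::shared_ptr<const Config> fresh =
            std::make_shared<Config>(std::move(next));
        std::lock_guard<std::mutex> lk(_mutex);
        _cfg = std::move(fresh);
    }

    // Reader path: copy the pointer under the lock. The snapshot stays valid
    // even if replace() runs immediately afterwards.
    std::shared_ptr<const Config> snapshot() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _cfg;
    }

private:
    mutable std::mutex _mutex;
    std::shared_ptr<const Config> _cfg;  // null until the first replace()
};

// What a sync loop might call instead of dereferencing the config directly.
std::size_t memberCount(const ConfigHolder& holder) {
    std::shared_ptr<const Config> cfg = holder.snapshot();
    return cfg ? cfg->members.size() : 0;  // tolerate "no config yet"
}
```

Because the new config is constructed before the lock is taken, the pointer is never left null during a swap, and a reader holding an older snapshot keeps a valid object until it drops its reference.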
