[SERVER-15530] Segmentation fault on secondary after rs.reconfig
Created: 03/Oct/14  Updated: 10/Dec/14  Resolved: 03/Oct/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.7.7
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: John Morales
Assignee: Unassigned
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

AWS Ubuntu 14.04 x64


Operating System: ALL
Steps To Reproduce:

1.) Start three empty 2.7.7 nodes on distinct machines (it is unclear whether distinct machines are required).
2.) Create the set with the rs.add/rs.addArb shell helpers, using internal hostnames (primary, secondary, arbiter configuration).
3.) Reconfigure the set with rs.reconfig(config), using external hostnames. Example document:

regular:PRIMARY> config
{
	"_id" : "regular",
	"members" : [
		{
			"_id" : 0,
			"host" : "ec2-XXX.compute-1.amazonaws.com:3000"
		},
		{
			"_id" : 1,
			"host" : "ec2-YYY.compute-1.amazonaws.com:3000",
			"arbiterOnly" : true
		},
		{
			"_id" : 2,
			"host" : "ec2-ZZZ.compute-1.amazonaws.com:3000"
		}
	]
}
regular:PRIMARY> rs.reconfig(config)

Participants:

 Description   

Backtrace:

...
2014-10-03T18:58:33.867+0000 I REPLSETS [rsMgr] replset msgReceivedNewConfig version: version: 4
2014-10-03T18:58:33.867+0000 I REPLSETS [rsMgr] replSet info saving a newer config version to local.system.replset: { _id: "regular", version: 4, members: [ { _id: 0, host: "ec2-XXX.compute-1.amazonaws.com:3000" }, { _id: 1, host: "ec2-YYY.compute-1.amazonaws.com:3000", arbiterOnly: true }, { _id: 2, host: "ec2-ZZZ.compute-1.amazonaws.com:3000" } ], settings: { getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
2014-10-03T18:58:33.867+0000 I REPLSETS [rsMgr] replSet saveConfigLocally done
2014-10-03T18:58:33.878+0000 I REPLSETS [rsMgr] replSet replSetReconfig new config saved locally
2014-10-03T18:58:33.879+0000 I REPLSETS [rsHealthPoll] replSet member ec2-XXX.compute-1.amazonaws.com:3000 is up
2014-10-03T18:58:33.879+0000 I REPLSETS [rsHealthPoll] replSet member ec2-XXX.compute-1.amazonaws.com:3000 is now in state PRIMARY
2014-10-03T18:58:33.880+0000 I REPLSETS [rsHealthPoll] replSet member ec2-YYY.compute-1.amazonaws.com:3000 is up
2014-10-03T18:58:33.880+0000 I REPLSETS [rsHealthPoll] replSet member ec2-YYY.compute-1.amazonaws.com:3000 is now in state ARBITER
2014-10-03T18:58:34.182+0000 I NETWORK  [initandlisten] connection accepted from 10.0.0.161:38736 #205 (4 connections now open)
2014-10-03T18:58:34.183+0000 I NETWORK  [conn205] end connection 10.0.0.161:38736 (3 connections now open)
2014-10-03T18:58:34.187+0000 I NETWORK  [initandlisten] connection accepted from 10.0.0.161:38739 #206 (4 connections now open)
2014-10-03T18:58:34.187+0000 I NETWORK  [conn206] end connection 10.0.0.161:38739 (3 connections now open)
2014-10-03T18:58:34.188+0000 I NETWORK  [initandlisten] connection accepted from 10.0.0.161:38740 #207 (4 connections now open)
2014-10-03T18:58:34.188+0000 I NETWORK  [conn207] end connection 10.0.0.161:38740 (3 connections now open)
2014-10-03T18:58:34.189+0000 I NETWORK  [initandlisten] connection accepted from 10.0.0.161:38742 #208 (4 connections now open)
2014-10-03T18:58:38.341+0000 F -        [rsBackgroundSync] Invalid access at address: 0xfc
2014-10-03T18:58:38.346+0000 F -        [rsBackgroundSync] Got signal: 11 (Segmentation fault).
 
 0xeed169 0xeecd22 0xeed04e 0x7fbad97ad340 0xc892f9 0xbd957a 0xbdb8d5 0xbdc882 0xbdc978 0xf398a4 0x7fbad97a5182 0x7fbad88a5fbd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"AED169"},{"b":"400000","o":"AECD22"},{"b":"400000","o":"AED04E"},{"b":"7FBAD979D000","o":"10340"},{"b":"400000","o":"8892F9"},{"b":"400000","o":"7D957A"},{"b":"400000","o":"7DB8D5"},{"b":"400000","o":"7DC882"},{"b":"400000","o":"7DC978"},{"b":"400000","o":"B398A4"},{"b":"7FBAD979D000","o":"8182"},{"b":"7FBAD87AB000","o":"FAFBD"}],"processInfo":{ "mongodbVersion" : "2.7.7", "gitVersion" : "afae7c082b1b4eff8401f660124e161137fe1d2b", "uname" : { "sysname" : "Linux", "release" : "3.13.0-36-generic", "version" : "#63-Ubuntu SMP Wed Sep 3 21:30:07 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF4C8FE000", "elfType" : 3 }, { "b" : "7FBAD979D000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7FBAD9595000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7FBAD9391000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7FBAD908D000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7FBAD8D87000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7FBAD8B71000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7FBAD87AB000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7FBAD99BB000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xeed169]
 mongod(+0xAECD22) [0xeecd22]
 mongod(+0xAED04E) [0xeed04e]
 libpthread.so.0(+0x10340) [0x7fbad97ad340]
 mongod(_ZNK5mongo4repl11ReplSetImpl22shouldChangeSyncTargetERKNS_11HostAndPortE+0x29) [0xc892f9]
 mongod(_ZN5mongo4repl14BackgroundSync22shouldChangeSyncSourceEv+0x8A) [0xbd957a]
 mongod(_ZN5mongo4repl14BackgroundSync7produceEPNS_16OperationContextE+0x3F5) [0xbdb8d5]
 mongod(_ZN5mongo4repl14BackgroundSync15_producerThreadEv+0x152) [0xbdc882]
 mongod(_ZN5mongo4repl14BackgroundSync14producerThreadEv+0x48) [0xbdc978]
 mongod(+0xB398A4) [0xf398a4]
 libpthread.so.0(+0x8182) [0x7fbad97a5182]
 libc.so.6(clone+0x6D) [0x7fbad88a5fbd]
-----  END BACKTRACE  -----
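The JSON "backtrace" array records each frame as a module base address ("b") plus an offset ("o"), both in hex. Adding the two reproduces the raw address line printed just before the JSON, and the "somap" list says which module each base belongs to (base 400000 is the mongod executable itself). The helper below is a hypothetical sketch, not part of any MongoDB tooling; it works on a trimmed copy of this report's frames.

```python
import json

# Trimmed copy of the "backtrace" array from this report; the processInfo
# and somap fields are omitted because only "b" and "o" are needed here.
BACKTRACE_JSON = '''{"backtrace":[
  {"b":"400000","o":"AED169"},{"b":"400000","o":"AECD22"},
  {"b":"400000","o":"AED04E"},{"b":"7FBAD979D000","o":"10340"},
  {"b":"400000","o":"8892F9"},{"b":"400000","o":"7D957A"},
  {"b":"400000","o":"7DB8D5"},{"b":"400000","o":"7DC882"},
  {"b":"400000","o":"7DC978"},{"b":"400000","o":"B398A4"},
  {"b":"7FBAD979D000","o":"8182"},{"b":"7FBAD87AB000","o":"FAFBD"}]}'''


def absolute_addresses(doc: str) -> list:
    """Return each frame's absolute address (base + offset) as lowercase hex."""
    frames = json.loads(doc)["backtrace"]
    return [hex(int(f["b"], 16) + int(f["o"], 16)) for f in frames]


if __name__ == "__main__":
    # Reproduces the space-separated address line printed before the JSON.
    print(" ".join(absolute_addresses(BACKTRACE_JSON)))
```

Frame 4 (zero-based), 0xc892f9, is the one symbolized as _ZNK5mongo4repl11ReplSetImpl22shouldChangeSyncTargetERKNS_11HostAndPortE, which demangles to mongo::repl::ReplSetImpl::shouldChangeSyncTarget(mongo::HostAndPort const&) const. Assuming an unstripped mongod binary built from the same gitVersion, a tool such as `addr2line -e mongod 0xc892f9` could map that address to an approximate source line.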



 Comments   
Comment by Eric Milkie [ 03/Oct/14 ]

This was found through a unit test failure and fixed immediately after the 2.7.7 release. Can you try a build from the master branch?

Generated at Thu Feb 08 03:38:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.