[SERVER-8737] 2 shards, one RS with node with revoked cert. Arbiter frequently shows two transient primaries Created: 26/Feb/13  Updated: 06/Dec/22  Resolved: 23/Aug/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Gregor Macadam Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: elections
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-8738 Cannot balance while target shard has... Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

This is shard2, doing insert to shard 1.

ubuntu@ip-10-226-183-211:~/mongodb-linux-x86_64-subscription-ubuntu1104-2.4.0-rc0$ ./bin/mongod --dbpath ./data/ --replSet rs2 --sslOnNormalPorts --sslPEMKeyFile=../gregorFreeBSD.pem --sslCAFile=../cacert.pem --sslCRLFile=../crl.pem 

ubuntu@ip-10-48-130-227:~/mongodb-linux-x86_64-subscription-ubuntu1104-2.4.0-rc0$ ./bin/mongod --dbpath ./data/ --replSet rs2 --sslOnNormalPorts --sslPEMKeyFile=../revoked_gregor.pem --sslCAFile=../cacert.pem 

ue Feb 26 15:24:33.260 [initandlisten] connection accepted from 10.226.183.211:40629 #104 (2 connections now open)
Tue Feb 26 15:24:35.986 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:24:40.833 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:24:45.999 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:24:50.835 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:24:56.005 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:24:57.631 [conn103] end connection 10.48.130.227:54742 (1 connection now open)
Tue Feb 26 15:24:57.632 [initandlisten] connection accepted from 10.48.130.227:54761 #105 (2 connections now open)
Tue Feb 26 15:25:00.008 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:03.294 [conn104] end connection 10.226.183.211:40629 (1 connection now open)
Tue Feb 26 15:25:03.296 [initandlisten] connection accepted from 10.226.183.211:40645 #106 (2 connections now open)
Tue Feb 26 15:25:06.012 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:12.016 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:18.026 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:24.030 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:27.640 [conn105] end connection 10.48.130.227:54761 (1 connection now open)
Tue Feb 26 15:25:27.641 [initandlisten] connection accepted from 10.48.130.227:54780 #107 (2 connections now open)
Tue Feb 26 15:25:30.034 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:33.319 [conn106] end connection 10.226.183.211:40645 (1 connection now open)
Tue Feb 26 15:25:33.320 [initandlisten] connection accepted from 10.226.183.211:40661 #108 (2 connections now open)
Tue Feb 26 15:25:36.037 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:42.040 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:48.052 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:54.056 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:25:57.650 [conn107] end connection 10.48.130.227:54780 (1 connection now open)
Tue Feb 26 15:25:57.651 [initandlisten] connection accepted from 10.48.130.227:54798 #109 (2 connections now open)
Tue Feb 26 15:26:00.059 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:03.340 [conn108] end connection 10.226.183.211:40661 (1 connection now open)
Tue Feb 26 15:26:03.341 [initandlisten] connection accepted from 10.226.183.211:40677 #110 (2 connections now open)
Tue Feb 26 15:26:06.063 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:12.066 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:18.077 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:24.081 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:27.659 [conn109] end connection 10.48.130.227:54798 (1 connection now open)
Tue Feb 26 15:26:27.661 [initandlisten] connection accepted from 10.48.130.227:54817 #111 (2 connections now open)
Tue Feb 26 15:26:30.084 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:33.364 [conn110] end connection 10.226.183.211:40677 (1 connection now open)
Tue Feb 26 15:26:33.365 [initandlisten] connection accepted from 10.226.183.211:40692 #112 (2 connections now open)
Tue Feb 26 15:26:36.087 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:42.089 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:48.099 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:54.101 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:26:57.669 [conn111] end connection 10.48.130.227:54817 (1 connection now open)
Tue Feb 26 15:26:57.670 [initandlisten] connection accepted from 10.48.130.227:54836 #113 (2 connections now open)
Tue Feb 26 15:27:00.104 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:03.386 [conn112] end connection 10.226.183.211:40692 (1 connection now open)
Tue Feb 26 15:27:03.389 [initandlisten] connection accepted from 10.226.183.211:40708 #114 (2 connections now open)
Tue Feb 26 15:27:06.111 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:12.114 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:18.124 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:24.128 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:27.678 [conn113] end connection 10.48.130.227:54836 (1 connection now open)
Tue Feb 26 15:27:27.679 [initandlisten] connection accepted from 10.48.130.227:54855 #115 (2 connections now open)
Tue Feb 26 15:27:30.134 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:33.415 [conn114] end connection 10.226.183.211:40708 (1 connection now open)
Tue Feb 26 15:27:33.416 [initandlisten] connection accepted from 10.226.183.211:40724 #116 (2 connections now open)
Tue Feb 26 15:27:36.138 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:42.142 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:48.169 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:54.173 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:27:57.688 [conn115] end connection 10.48.130.227:54855 (1 connection now open)
Tue Feb 26 15:27:57.689 [initandlisten] connection accepted from 10.48.130.227:54874 #117 (2 connections now open)
Tue Feb 26 15:28:00.176 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:28:03.445 [conn116] end connection 10.226.183.211:40724 (1 connection now open)
Tue Feb 26 15:28:03.448 [initandlisten] connection accepted from 10.226.183.211:40740 #118 (2 connections now open)
Tue Feb 26 15:28:06.181 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:28:12.186 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:28:18.195 [rsMgr] replSet info two primaries (transiently)
Tue Feb 26 15:28:24.199 [rsMgr] replSet info two primaries (transiently)

rs2:PRIMARY> rs.status()
{
	"set" : "rs2",
	"date" : ISODate("2013-02-26T15:29:58Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "ip-10-226-183-211:27017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 31,
			"optime" : {
				"t" : 1361890367000,
				"i" : 1
			},
			"optimeDate" : ISODate("2013-02-26T14:52:47Z"),
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "ip-10-48-130-227:27017",
			"health" : 0,
			"state" : 8,
			"stateStr" : "(not reachable/healthy)",
			"uptime" : 0,
			"optime" : {
				"t" : 0,
				"i" : 0
			},
			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
			"lastHeartbeat" : ISODate("2013-02-26T15:29:32Z"),
			"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
			"pingMs" : 0,
			"lastHeartbeatMessage" : "socket exception [CONNECT_ERROR] for "
		},
		{
			"_id" : 2,
			"name" : "ip-10-48-130-227:30000",
			"health" : 1,
			"state" : 7,
			"stateStr" : "ARBITER",
			"uptime" : 30,
			"lastHeartbeat" : ISODate("2013-02-26T15:29:58Z"),
			"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
			"pingMs" : 17
		}
	],
	"ok" : 1
}
rs2:PRIMARY> 

mongos> sh.status()
--- Sharding Status --- 
  sharding version: {
	"_id" : 1,
	"version" : 3,
	"minCompatibleVersion" : 3,
	"currentVersion" : 4,
	"clusterId" : ObjectId("512cb9cece86eb598e679d38")
}
  shards:
	{  "_id" : "rs1",  "host" : "rs1/ip-10-227-118-139:27017,ip-10-227-197-90:27017" }
	{  "_id" : "rs2",  "host" : "rs2/ip-10-226-183-211:27017" }
  databases:
	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
	{  "_id" : "test",  "partitioned" : true,  "primary" : "rs1" }
		test.test
			shard key: { "_id" : 1 }
			chunks:
				rs1	3
			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId("512ca3ed2b5f8e8ebde5780f") } on : rs1 { "t" : 1000, "i" : 1 } 
			{ "_id" : ObjectId("512ca3ed2b5f8e8ebde5780f") } -->> { "_id" : ObjectId("512cd392ba4c8d63c8f56343") } on : rs1 { "t" : 1000, "i" : 3 } 
			{ "_id" : ObjectId("512cd392ba4c8d63c8f56343") } -->> { "_id" : { "$maxKey" : 1 } } on : rs1 { "t" : 1000, "i" : 4 } 

mongos> for ( var i = 0 ; i< 10000000; i++) { db.test.insert({x: new Date(), y : i})}



 Comments   
Comment by Spencer Brody (Inactive) [ 23/Aug/18 ]

These behaviors have been changed with replication protocolVersion:1

Comment by Eric Milkie [ 11/Mar/13 ]

This is basically simulating an asymmetric network. We could do better here.

Generated at Thu Feb 08 03:18:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.