Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-8737

2 shards, one RS with node with revoked cert. Arbiter frequently shows two transient primaries

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Critical - P2 Critical - P2
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Labels:
    • Replication
    • ALL

      This is shard2, doing insert to shard 1.

      ubuntu@ip-10-226-183-211:~/mongodb-linux-x86_64-subscription-ubuntu1104-2.4.0-rc0$ ./bin/mongod --dbpath ./data/ --replSet rs2 --sslOnNormalPorts --sslPEMKeyFile=../gregorFreeBSD.pem --sslCAFile=../cacert.pem --sslCRLFile=../crl.pem 
      
      ubuntu@ip-10-48-130-227:~/mongodb-linux-x86_64-subscription-ubuntu1104-2.4.0-rc0$ ./bin/mongod --dbpath ./data/ --replSet rs2 --sslOnNormalPorts --sslPEMKeyFile=../revoked_gregor.pem --sslCAFile=../cacert.pem 
      
      ue Feb 26 15:24:33.260 [initandlisten] connection accepted from 10.226.183.211:40629 #104 (2 connections now open)
      Tue Feb 26 15:24:35.986 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:24:40.833 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:24:45.999 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:24:50.835 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:24:56.005 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:24:57.631 [conn103] end connection 10.48.130.227:54742 (1 connection now open)
      Tue Feb 26 15:24:57.632 [initandlisten] connection accepted from 10.48.130.227:54761 #105 (2 connections now open)
      Tue Feb 26 15:25:00.008 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:03.294 [conn104] end connection 10.226.183.211:40629 (1 connection now open)
      Tue Feb 26 15:25:03.296 [initandlisten] connection accepted from 10.226.183.211:40645 #106 (2 connections now open)
      Tue Feb 26 15:25:06.012 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:12.016 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:18.026 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:24.030 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:27.640 [conn105] end connection 10.48.130.227:54761 (1 connection now open)
      Tue Feb 26 15:25:27.641 [initandlisten] connection accepted from 10.48.130.227:54780 #107 (2 connections now open)
      Tue Feb 26 15:25:30.034 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:33.319 [conn106] end connection 10.226.183.211:40645 (1 connection now open)
      Tue Feb 26 15:25:33.320 [initandlisten] connection accepted from 10.226.183.211:40661 #108 (2 connections now open)
      Tue Feb 26 15:25:36.037 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:42.040 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:48.052 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:54.056 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:25:57.650 [conn107] end connection 10.48.130.227:54780 (1 connection now open)
      Tue Feb 26 15:25:57.651 [initandlisten] connection accepted from 10.48.130.227:54798 #109 (2 connections now open)
      Tue Feb 26 15:26:00.059 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:03.340 [conn108] end connection 10.226.183.211:40661 (1 connection now open)
      Tue Feb 26 15:26:03.341 [initandlisten] connection accepted from 10.226.183.211:40677 #110 (2 connections now open)
      Tue Feb 26 15:26:06.063 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:12.066 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:18.077 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:24.081 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:27.659 [conn109] end connection 10.48.130.227:54798 (1 connection now open)
      Tue Feb 26 15:26:27.661 [initandlisten] connection accepted from 10.48.130.227:54817 #111 (2 connections now open)
      Tue Feb 26 15:26:30.084 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:33.364 [conn110] end connection 10.226.183.211:40677 (1 connection now open)
      Tue Feb 26 15:26:33.365 [initandlisten] connection accepted from 10.226.183.211:40692 #112 (2 connections now open)
      Tue Feb 26 15:26:36.087 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:42.089 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:48.099 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:54.101 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:26:57.669 [conn111] end connection 10.48.130.227:54817 (1 connection now open)
      Tue Feb 26 15:26:57.670 [initandlisten] connection accepted from 10.48.130.227:54836 #113 (2 connections now open)
      Tue Feb 26 15:27:00.104 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:03.386 [conn112] end connection 10.226.183.211:40692 (1 connection now open)
      Tue Feb 26 15:27:03.389 [initandlisten] connection accepted from 10.226.183.211:40708 #114 (2 connections now open)
      Tue Feb 26 15:27:06.111 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:12.114 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:18.124 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:24.128 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:27.678 [conn113] end connection 10.48.130.227:54836 (1 connection now open)
      Tue Feb 26 15:27:27.679 [initandlisten] connection accepted from 10.48.130.227:54855 #115 (2 connections now open)
      Tue Feb 26 15:27:30.134 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:33.415 [conn114] end connection 10.226.183.211:40708 (1 connection now open)
      Tue Feb 26 15:27:33.416 [initandlisten] connection accepted from 10.226.183.211:40724 #116 (2 connections now open)
      Tue Feb 26 15:27:36.138 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:42.142 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:48.169 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:54.173 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:27:57.688 [conn115] end connection 10.48.130.227:54855 (1 connection now open)
      Tue Feb 26 15:27:57.689 [initandlisten] connection accepted from 10.48.130.227:54874 #117 (2 connections now open)
      Tue Feb 26 15:28:00.176 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:28:03.445 [conn116] end connection 10.226.183.211:40724 (1 connection now open)
      Tue Feb 26 15:28:03.448 [initandlisten] connection accepted from 10.226.183.211:40740 #118 (2 connections now open)
      Tue Feb 26 15:28:06.181 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:28:12.186 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:28:18.195 [rsMgr] replSet info two primaries (transiently)
      Tue Feb 26 15:28:24.199 [rsMgr] replSet info two primaries (transiently)
      
      rs2:PRIMARY> rs.status()
      {
      	"set" : "rs2",
      	"date" : ISODate("2013-02-26T15:29:58Z"),
      	"myState" : 1,
      	"members" : [
      		{
      			"_id" : 0,
      			"name" : "ip-10-226-183-211:27017",
      			"health" : 1,
      			"state" : 1,
      			"stateStr" : "PRIMARY",
      			"uptime" : 31,
      			"optime" : {
      				"t" : 1361890367000,
      				"i" : 1
      			},
      			"optimeDate" : ISODate("2013-02-26T14:52:47Z"),
      			"self" : true
      		},
      		{
      			"_id" : 1,
      			"name" : "ip-10-48-130-227:27017",
      			"health" : 0,
      			"state" : 8,
      			"stateStr" : "(not reachable/healthy)",
      			"uptime" : 0,
      			"optime" : {
      				"t" : 0,
      				"i" : 0
      			},
      			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
      			"lastHeartbeat" : ISODate("2013-02-26T15:29:32Z"),
      			"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
      			"pingMs" : 0,
      			"lastHeartbeatMessage" : "socket exception [CONNECT_ERROR] for "
      		},
      		{
      			"_id" : 2,
      			"name" : "ip-10-48-130-227:30000",
      			"health" : 1,
      			"state" : 7,
      			"stateStr" : "ARBITER",
      			"uptime" : 30,
      			"lastHeartbeat" : ISODate("2013-02-26T15:29:58Z"),
      			"lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
      			"pingMs" : 17
      		}
      	],
      	"ok" : 1
      }
      rs2:PRIMARY> 
      
      mongos> sh.status()
      --- Sharding Status --- 
        sharding version: {
      	"_id" : 1,
      	"version" : 3,
      	"minCompatibleVersion" : 3,
      	"currentVersion" : 4,
      	"clusterId" : ObjectId("512cb9cece86eb598e679d38")
      }
        shards:
      	{  "_id" : "rs1",  "host" : "rs1/ip-10-227-118-139:27017,ip-10-227-197-90:27017" }
      	{  "_id" : "rs2",  "host" : "rs2/ip-10-226-183-211:27017" }
        databases:
      	{  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
      	{  "_id" : "test",  "partitioned" : true,  "primary" : "rs1" }
      		test.test
      			shard key: { "_id" : 1 }
      			chunks:
      				rs1	3
      			{ "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId("512ca3ed2b5f8e8ebde5780f") } on : rs1 { "t" : 1000, "i" : 1 } 
      			{ "_id" : ObjectId("512ca3ed2b5f8e8ebde5780f") } -->> { "_id" : ObjectId("512cd392ba4c8d63c8f56343") } on : rs1 { "t" : 1000, "i" : 3 } 
      			{ "_id" : ObjectId("512cd392ba4c8d63c8f56343") } -->> { "_id" : { "$maxKey" : 1 } } on : rs1 { "t" : 1000, "i" : 4 } 
      
      mongos> for ( var i = 0 ; i< 10000000; i++) { db.test.insert({x: new Date(), y : i})}
      

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            gregor Gregor Macadam
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: