[SERVER-6136] Initial sync on new Replicaset member is connecting to Primary node when there are secondaries available Created: 20/Jun/12  Updated: 15/Feb/13  Resolved: 09/Nov/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.0.5, 2.0.6
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Luciano Issoe
Assignee: Kristina Chodorow (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04 LTS, MongoDB 2.0.5 and 2.0.6


Operating System: Linux
Participants:

 Description   

A replica set with 3 nodes runs MongoDB 2.0.5 on Ubuntu 12.04 LTS. When a fourth node running MongoDB 2.0.6 is added, its initial sync connects to the primary rather than to one of the available secondaries, which degrades performance for the whole replica set.



 Comments   
Comment by Kristina Chodorow (Inactive) [ 09/Nov/12 ]

This "works as designed": a new node chooses which member to sync from based on the pingMs field. Since all of your pingMs values are essentially the same, it effectively chooses a random member. You can force it to change its sync source using the replSetSyncFrom command (http://docs.mongodb.org/manual/reference/command/replSetSyncFrom/), but it is only willing to change sync targets at certain points in the initial sync process.
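
When pingMs cannot distinguish the members, an operator may prefer to pick the sync target by hand. The following is a minimal sketch, not the server's own selection logic: `pickSyncTarget` and `buildSyncFromCommand` are hypothetical helpers, and the input is assumed to be shaped like the rs.status() output posted in this ticket. Only the replSetSyncFrom command name comes from the linked documentation; check that your server version supports it before relying on it.

```javascript
// Hypothetical helper (not part of MongoDB): choose a healthy secondary,
// preferring the most recent optime and then the lowest pingMs.
function pickSyncTarget(status) {
  const candidates = status.members.filter(
    (m) => !m.self && m.health === 1 && m.stateStr === "SECONDARY"
  );
  if (candidates.length === 0) {
    return null; // no secondary available to sync from
  }
  const sorted = candidates.slice().sort(
    (a, b) =>
      b.optime.t - a.optime.t ||           // newer optime first
      b.optime.i - a.optime.i ||           // then higher increment
      (a.pingMs || 0) - (b.pingMs || 0)    // then lower ping time
  );
  return sorted[0];
}

// Hypothetical helper: build the command document for the chosen host.
function buildSyncFromCommand(host) {
  return { replSetSyncFrom: host };
}

// In the mongo shell, connected to the member that is syncing
// (assumption: the command is run against the admin database):
//   var target = pickSyncTarget(rs.status());
//   if (target) { db.adminCommand(buildSyncFromCommand(target.name)); }
```

Sorting on optime before pingMs is a design choice for this sketch: it avoids picking a secondary that is reachable but far behind.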

Comment by Luciano Issoe [ 21/Jun/12 ]

By the way, I solved the performance problem by running rs.stepDown() on the Primary, which also restarted the initial sync on the new node. Not an elegant solution, and one that was very hard to figure out.

Comment by Luciano Issoe [ 21/Jun/12 ]

Sorry Eliot, but I don't remember which one it was.

The configuration has changed a lot since yesterday. There are now 6 nodes in our RS: 3 are primary-eligible, 2 are read-only with specialized indexes for aggregation, and 1 is a backup without any indexes and journaling enabled, with its data on a single EBS volume for snapshots.

But when the problem happened, a fresh node had just been added and its initial sync was pulling data from the Primary instead of from either of the other 2 Secondaries. This put pressure on the Primary and seriously degraded performance.

I hope that's useful for you.

Comment by Luciano Issoe [ 21/Jun/12 ]

"set" : "nick",
"date" : ISODate("2012-06-21T02:08:25Z"),
"myState" : 2,
"syncingTo" : "ec2-23-22-220-87.compute-1.amazonaws.com:27017",
"members" : [
{
"_id" : 1,
"name" : "ec2-50-19-136-144.compute-1.amazonaws.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 322,
"optime" : { "t" : 1340244503000, "i" : 296 },
"optimeDate" : ISODate("2012-06-21T02:08:23Z"),
"lastHeartbeat" : ISODate("2012-06-21T02:08:23Z"),
"pingMs" : 0
},
{
"_id" : 2,
"name" : "ec2-107-22-22-142.compute-1.amazonaws.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 322,
"optime" : { "t" : 1340220007000, "i" : 43 },
"optimeDate" : ISODate("2012-06-20T19:20:07Z"),
"lastHeartbeat" : ISODate("2012-06-21T02:08:23Z"),
"pingMs" : 0
},
{
"_id" : 3,
"name" : "ec2-23-22-220-87.compute-1.amazonaws.com:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 322,
"optime" : { "t" : 1340244503000, "i" : 297 },
"optimeDate" : ISODate("2012-06-21T02:08:23Z"),
"lastHeartbeat" : ISODate("2012-06-21T02:08:23Z"),
"pingMs" : 1
},
{
"_id" : 5,
"name" : "ec2-23-22-197-179.compute-1.amazonaws.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"optime" : { "t" : 1340211993000, "i" : 314 },
"optimeDate" : ISODate("2012-06-20T17:06:33Z"),
"self" : true
},
{
"_id" : 20,
"name" : "ec2-184-73-27-46.compute-1.amazonaws.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 322,
"optime" : { "t" : 1340244504000, "i" : 225 },
"optimeDate" : ISODate("2012-06-21T02:08:24Z"),
"lastHeartbeat" : ISODate("2012-06-21T02:08:24Z"),
"pingMs" : 0
},
{
"_id" : 22,
"name" : "ec2-50-19-28-92.compute-1.amazonaws.com:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 322,
"optime" : { "t" : 1340244503000, "i" : 293 },
"optimeDate" : ISODate("2012-06-21T02:08:23Z"),
"lastHeartbeat" : ISODate("2012-06-21T02:08:23Z"),
"pingMs" : 0
}
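
To check the status output above mechanically, the sketch below looks up the member named in "syncingTo" and reports its state. `syncSourceInfo` is a hypothetical helper, and the sample document is abridged from the output above (only the fields the helper reads are kept).

```javascript
// Hypothetical helper: report which member this node syncs from and
// whether that member is currently the primary.
function syncSourceInfo(status) {
  const source = status.members.find((m) => m.name === status.syncingTo);
  return {
    syncingTo: status.syncingTo,
    sourceState: source ? source.stateStr : "UNKNOWN",
    fromPrimary: Boolean(source && source.stateStr === "PRIMARY"),
  };
}

// Abridged from the rs.status() output in this comment.
const sampleStatus = {
  syncingTo: "ec2-23-22-220-87.compute-1.amazonaws.com:27017",
  members: [
    { name: "ec2-50-19-136-144.compute-1.amazonaws.com:27017", stateStr: "SECONDARY" },
    { name: "ec2-23-22-220-87.compute-1.amazonaws.com:27017", stateStr: "PRIMARY" },
    { name: "ec2-23-22-197-179.compute-1.amazonaws.com:27017", stateStr: "SECONDARY", self: true },
  ],
};

console.log(syncSourceInfo(sampleStatus).fromPrimary); // prints true
```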

Comment by Eliot Horowitz (Inactive) [ 21/Jun/12 ]

Can you send the output of rs.status() and tell us which nodes you're referring to?

Generated at Thu Feb 08 03:10:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.