Core Server / SERVER-2899

Replica set nodes don't reconnect after being down, while rs.status() on the last started node shows all servers as up

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 1.8.1
    • Component/s: Replication
    • Labels: None
    • Environment: FreeBSD 8.2 jail

      I'm testing a replica set of four mongodb 1.8.1-rc1 instances, each running in its own jail on FreeBSD 8.2.

      If I shut down (a clean kill) the primary (a.k.a. mongo1) and one secondary (a.k.a. mongo2), the other two secondaries (a.k.a. mongo3 & mongo4) stay running and notice that the other two went away, as they should.
      After restarting the mongo2 server, it gets voted primary. All seems to be well (we know mongo1 is still down) when you check rs.status() from mongo2:

      DuegoWeb:PRIMARY> rs.status()
      {
          "set" : "DuegoWeb",
          "date" : ISODate("2011-04-05T08:31:11Z"),
          "myState" : 1,
          "members" : [
              {
                  "_id" : 0,
                  "name" : "mongo1.lan",
                  "health" : 0,
                  "state" : 6,
                  "stateStr" : "(not reachable/healthy)",
                  "uptime" : 0,
                  "optime" : { "t" : 0, "i" : 0 },
                  "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                  "lastHeartbeat" : ISODate("2011-04-05T08:31:11Z"),
                  "errmsg" : "not running with --replSet"
              },
              {
                  "_id" : 1,
                  "name" : "mongo2.lan",
                  "health" : 1,
                  "state" : 1,
                  "stateStr" : "PRIMARY",
                  "optime" : { "t" : 1301991284000, "i" : 1 },
                  "optimeDate" : ISODate("2011-04-05T08:14:44Z"),
                  "self" : true
              },
              {
                  "_id" : 2,
                  "name" : "mongo3.lan",
                  "health" : 1,
                  "state" : 2,
                  "stateStr" : "SECONDARY",
                  "uptime" : 1485,
                  "optime" : { "t" : 1301930804000, "i" : 1 },
                  "optimeDate" : ISODate("2011-04-04T15:26:44Z"),
                  "lastHeartbeat" : ISODate("2011-04-05T08:31:11Z")
              },
              {
                  "_id" : 3,
                  "name" : "mongo4.lan",
                  "health" : 1,
                  "state" : 2,
                  "stateStr" : "SECONDARY",
                  "uptime" : 1485,
                  "optime" : { "t" : 1301930804000, "i" : 1 },
                  "optimeDate" : ISODate("2011-04-04T15:26:44Z"),
                  "lastHeartbeat" : ISODate("2011-04-05T08:31:11Z")
              }
          ],
          "ok" : 1
      }
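      For reference when reading the numeric "state" fields in the output above, the three states that appear in this report correspond to the standard MongoDB replica-set state codes; sketched here as a small lookup table:

```javascript
// Lookup table for the replica-set member "state" codes that appear in
// the rs.status() output above (standard MongoDB state numbering).
const MEMBER_STATES = {
  1: "PRIMARY",    // e.g. mongo2.lan in mongo2's own view
  2: "SECONDARY",  // e.g. mongo3.lan and mongo4.lan
  6: "UNKNOWN",    // rendered as "(not reachable/healthy)", e.g. mongo1.lan
};

console.log(MEMBER_STATES[6]); // → UNKNOWN
```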

      However, if we move to mongo3 and run rs.status() there, it says mongo2 isn't available:
      {
          "_id" : 1,
          "name" : "mongo2.lan",
          "health" : 0,
          "state" : 2,
          "stateStr" : "(not reachable/healthy)",
          "uptime" : 0,
          "optime" : { "t" : 1301930804000, "i" : 1 },
          "optimeDate" : ISODate("2011-04-04T15:26:44Z"),
          "lastHeartbeat" : ISODate("2011-04-05T08:30:33Z"),
          "errmsg" : "not running with --replSet"
      },

      I find it confusing that rs.status() on mongo2 can say mongo3 is OK, but not vice versa.

      If we then also start mongo1, rs.status() on that server says all servers are OK, while mongo2 still doesn't show mongo1 as being up:
      DuegoWeb:SECONDARY> rs.status()
      {
          "set" : "DuegoWeb",
          "date" : ISODate("2011-04-05T08:31:30Z"),
          "myState" : 2,
          "members" : [
              {
                  "_id" : 0,
                  "name" : "mongo1.lan",
                  "health" : 1,
                  "state" : 2,
                  "stateStr" : "SECONDARY",
                  "optime" : { "t" : 1301991284000, "i" : 1 },
                  "optimeDate" : ISODate("2011-04-05T08:14:44Z"),
                  "self" : true
              },
              {
                  "_id" : 1,
                  "name" : "mongo2.lan",
                  "health" : 1,
                  "state" : 1,
                  "stateStr" : "PRIMARY",
                  "uptime" : 64,
                  "optime" : { "t" : 1301991284000, "i" : 1 },
                  "optimeDate" : ISODate("2011-04-05T08:14:44Z"),
                  "lastHeartbeat" : ISODate("2011-04-05T08:31:28Z")
              },
              {
                  "_id" : 2,
                  "name" : "mongo3.lan",
                  "health" : 1,
                  "state" : 2,
                  "stateStr" : "SECONDARY",
                  "uptime" : 64,
                  "optime" : { "t" : 1301930804000, "i" : 1 },
                  "optimeDate" : ISODate("2011-04-04T15:26:44Z"),
                  "lastHeartbeat" : ISODate("2011-04-05T08:31:28Z")
              },
              {
                  "_id" : 3,
                  "name" : "mongo4.lan",
                  "health" : 1,
                  "state" : 2,
                  "stateStr" : "SECONDARY",
                  "uptime" : 64,
                  "optime" : { "t" : 1301930804000, "i" : 1 },
                  "optimeDate" : ISODate("2011-04-04T15:26:44Z"),
                  "lastHeartbeat" : ISODate("2011-04-05T08:31:28Z")
              }
          ],
          "ok" : 1
      }

      mongo2 and mongo3 still show the same rs.status() output as before mongo1 was started again.
      Data inserted on mongo2 doesn't get replicated to mongo3, even though mongo2 reports mongo3 as OK, and even though mongo3 seems to have participated in voting mongo2 in as the new primary.

      Sorry if my example is badly explained.

      I'll attach logs and all rs statuses; the order is:

      • start mongo1, mongo2, mongo3, mongo4
      • set up the replica set, check that it replicates and everything works fine; mongo1 is the primary
      • kill mongo1 and mongo2
      • mongo3 and mongo4 stay secondaries, as there is no majority
      • start mongo2
      • mongo2 gets elected as the new primary
      • the rs.status() outputs on mongo2 and mongo3 don't match; data inserted on mongo2 doesn't show up on mongo3
      • start mongo1
      • mongo1 says all servers are OK; mongo2 and mongo3 still show the same status as before
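      The disagreement between the two nodes' views can be made concrete with a small diff over the member documents. A minimal Node.js sketch (the diffMemberViews helper is hypothetical; the member documents are abridged to name and health, and since mongo3's full member list is not shown in the report, its values beyond the mongo2.lan entry are illustrative):

```javascript
// Compare how two nodes' rs.status() outputs report member health.
// Field names ("name", "health") match the rs.status() output above.
function diffMemberViews(viewA, viewB) {
  // Index each view's members by name, then collect the names whose
  // "health" value differs between the two views.
  const byName = (view) => Object.fromEntries(view.map((m) => [m.name, m]));
  const a = byName(viewA);
  const b = byName(viewB);
  const disagreements = [];
  for (const name of Object.keys(a)) {
    if (b[name] && a[name].health !== b[name].health) {
      disagreements.push(name);
    }
  }
  return disagreements;
}

// mongo2's view, abridged from the report: it considers mongo3 healthy...
const mongo2View = [
  { name: "mongo1.lan", health: 0 },
  { name: "mongo2.lan", health: 1 },
  { name: "mongo3.lan", health: 1 },
  { name: "mongo4.lan", health: 1 },
];
// ...while mongo3's view marks mongo2 as unhealthy (only the mongo2.lan
// entry is taken from the report; the rest are illustrative).
const mongo3View = [
  { name: "mongo1.lan", health: 0 },
  { name: "mongo2.lan", health: 0 },
  { name: "mongo3.lan", health: 1 },
  { name: "mongo4.lan", health: 1 },
];

console.log(diffMemberViews(mongo2View, mongo3View)); // → [ 'mongo2.lan' ]
```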

      The timestamps in mongo1's logs are +2 hours off; I corrected the time on this machine later, with the same results.
      I should also add that the replica set is specified in a config file on each node, like this:

      mongo1:
      replSet=DuegoWeb
      journal=true

      mongo2:
      replSet=DuegoWeb/mongo1.lan,mongo2.lan

      mongo3:
      replSet=DuegoWeb/mongo3.lan,mongo1.lan

      mongo4:
      replSet=DuegoWeb/mongo4.lan,mongo1.lan
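      Note that mongo1's replSet line above names only the set, with no seed hosts after the slash, and each node lists a different seed pair. For comparison only (this is a sketch using the hostnames above, not claimed to be the cause of the problem), a uniform seed list on every node would look like:

```
replSet=DuegoWeb/mongo1.lan,mongo2.lan,mongo3.lan,mongo4.lan
```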

      Everything works and gets back in sync as long as I restart the mongodb servers manually, but they never reconnect automatically.

        1. replicaset.tgz (24 kB)
        2. replicaset.tgz (3 kB)

            Assignee: Kristina Chodorow (kristina) (Inactive)
            Reporter: Johnny Boy (balboah)