-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
Affects Version/s: 1.7.1
-
Component/s: Replication
-
None
-
Environment:Ubuntu lucid x86-64 mongodb compiled from src
-
ALL
In the case of a 2 member + 1 arbiter replica set: e.g:
> rs.isMaster()
{
"setName" : "rep0",
"ismaster" : true,
"secondary" : false,
"hosts" : [
"192.168.2.63:27017",
"192.168.2.63:27018"
],
"arbiters" : [
"192.168.2.63:27019"
],
"ok" : 1
}
Consider the following:
1/ kill the member on port 27017, so 27018 becomes the master:
2/ connect to the member on port 27018 and do some updates to a collection
3/ fsync to be sure the changes are saved to disk
4/ startup the member on port 27017 again
5/ connect to the member on port 27017 and observe that the previous changes have been rolled back
I (possibly wrongly) was under the impression that optime was a factor in deciding who was to be primary, but clearly in this case the arbiter is voting for the most behind member to be primary when it comes up.
Log information from member on port 27018 - notes other member down and becomes primary, also see mention that changes are fsync'ed:
Fri Oct 22 15:05:53 [rs_sync] replSet syncThread: 10278 dbclient error communicating with server
Fri Oct 22 15:05:53 [ReplSetHealthPollTask] replSet info 192.168.2.63:27017 is now down (or slow to respond)
Fri Oct 22 15:05:53 [rs Manager] replSet info electSelf 1
Fri Oct 22 15:05:53 [rs Manager] replSet PRIMARY
Fri Oct 22 15:06:18 [initandlisten] connection accepted from 127.0.0.1:33896 #4
Fri Oct 22 15:07:01 [conn4] end connection 127.0.0.1:33896
Fri Oct 22 15:08:28 [initandlisten] connection accepted from 127.0.0.1:34029 #5
Fri Oct 22 15:09:18 [conn5] command denied:
Fri Oct 22 15:09:23 [conn5] CMD fsync: sync:1 lock:0
Fri Oct 22 15:09:26 [conn5] end connection 127.0.0.1:34029
... now detects member on port 27017 is up, which is then chosen as primary, so we roll back our previous work:
Fri Oct 22 15:11:51 [ReplSetHealthPollTask] replSet info 192.168.2.63:27017 is now up
Fri Oct 22 15:11:52 [initandlisten] connection accepted from 192.168.2.63:47618 #7
Fri Oct 22 15:11:54 [rs_sync] replSet we are ahead of the primary, will try to roll back
Fri Oct 22 15:11:54 [rs_sync] replSet rollback 0
Fri Oct 22 15:11:54 [rs_sync] replSet rollback 1
Fri Oct 22 15:11:54 [rs_sync] replSet rollback 2 FindCommonPoint
Fri Oct 22 15:11:54 [rs_sync] replSet info rollback our last optime: Oct 22 15:08:34:3e7
Fri Oct 22 15:11:54 [rs_sync] replSet info rollback their last optime: Oct 22 14:18:35:218e
Fri Oct 22 15:11:54 [rs_sync] replSet info rollback diff in end of log times: 2999 seconds