[SERVER-9292] optime lag, even with WriteConcern verified Created: 09/Apr/13  Updated: 10/Dec/14  Resolved: 07/Mar/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.1, 2.5.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Hannes Magnusson Assignee: Unassigned
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

db version v2.5.0-pre-
git version: b5aa82c6783029da7fd21480d0271731c0c770f3 (Mon Apr 8 19:32:36 2013 -0400)


Operating System: ALL
Steps To Reproduce:

db.foo.insert({doc: "ument100"});
printjson(db.runCommand({"getLastError": 1, "w": "majority"}))
printjson(rs.stepDown(10));
printjson(rs.status())

Participants:

 Description   

My primary is refusing to rs.stepDown(), saying that it cannot find a secondary within 10 seconds of its optime, which is confirmed by rs.status().

However, the previous command was a getLastError with w: "majority", which in fact did get replicated to all servers (according to the return value) - but the secondaries' optimes seemingly weren't updated.

bjori@anden:~/Sources/mongo/mongo (master) $ ./mongo localhost:30200 script.js
MongoDB shell version: 2.5.0-pre-
connecting to: localhost:30200/test
{
	"n" : 0,
	"lastOp" : Timestamp(1365468913, 1),
	"connectionId" : 175,
	"writtenTo" : [
		{
			"_id" : 0,
			"host" : "anden.local:30200",
			"priority" : 42,
			"tags" : {
				"dc" : "ny",
				"server" : "0"
			}
		},
		{
			"_id" : 1,
			"host" : "anden.local:30201",
			"tags" : {
				"dc" : "ny",
				"server" : "1"
			}
		},
		{
			"_id" : 3,
			"host" : "anden.local:30203",
			"tags" : {
				"dc" : "sf",
				"server" : "3"
			}
		},
		{
			"_id" : 2,
			"host" : "anden.local:30202",
			"tags" : {
				"dc" : "sf",
				"server" : "2"
			}
		}
	],
	"wtime" : 1,
	"err" : null,
	"ok" : 1
}
{
	"closest" : NumberLong(1365468885),
	"difference" : NumberLong(28),
	"ok" : 0,
	"errmsg" : "no secondaries within 10 seconds of my optime"
}
{
	"set" : "REPLICASET",
	"date" : ISODate("2013-04-09T00:55:13Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "anden.local:30200",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 1035,
			"optime" : Timestamp(1365468913, 1),
			"optimeDate" : ISODate("2013-04-09T00:55:13Z"),
			"self" : true
		},
		{
			"_id" : 1,
			"name" : "anden.local:30201",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 1032,
			"optime" : Timestamp(1365468885, 1),
			"optimeDate" : ISODate("2013-04-09T00:54:45Z"),
			"lastHeartbeat" : ISODate("2013-04-09T00:55:12Z"),
			"lastHeartbeatRecv" : ISODate("2013-04-09T00:55:12Z"),
			"pingMs" : 0,
			"lastHeartbeatMessage" : "syncing to: anden.local:30200",
			"syncingTo" : "anden.local:30200"
		},
		{
			"_id" : 2,
			"name" : "anden.local:30202",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 1032,
			"optime" : Timestamp(1365468885, 1),
			"optimeDate" : ISODate("2013-04-09T00:54:45Z"),
			"lastHeartbeat" : ISODate("2013-04-09T00:55:12Z"),
			"lastHeartbeatRecv" : ISODate("2013-04-09T00:55:12Z"),
			"pingMs" : 0,
			"lastHeartbeatMessage" : "syncing to: anden.local:30200",
			"syncingTo" : "anden.local:30200"
		},
		{
			"_id" : 3,
			"name" : "anden.local:30203",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 1032,
			"optime" : Timestamp(1365468885, 1),
			"optimeDate" : ISODate("2013-04-09T00:54:45Z"),
			"lastHeartbeat" : ISODate("2013-04-09T00:55:12Z"),
			"lastHeartbeatRecv" : ISODate("2013-04-09T00:55:13Z"),
			"pingMs" : 0,
			"lastHeartbeatMessage" : "syncing to: anden.local:30200",
			"syncingTo" : "anden.local:30200"
		},
		{
			"_id" : 4,
			"name" : "anden.local:30204",
			"health" : 1,
			"state" : 7,
			"stateStr" : "ARBITER",
			"uptime" : 1032,
			"lastHeartbeat" : ISODate("2013-04-09T00:55:12Z"),
			"lastHeartbeatRecv" : ISODate("2013-04-09T00:55:13Z"),
			"pingMs" : 0
		}
	],
	"ok" : 1
}



 Comments   
Comment by Eric Milkie [ 31/Jul/13 ]

stepDown() was not intended to be synchronous with the ephemeral state of the replica set, since its use case is interactive. If you want to use it for automated tests, we'll have to add further state synchronization that may end up slowing down replication.

As an alternative, you can write a test wrapper that retries the stepDown command in a loop until either it succeeds or it times out after a reasonably large amount of time.
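The retry wrapper suggested above could be sketched as follows. This is a hypothetical helper, not part of the server or shell: `runStepDown` stands in for a call such as `rs.stepDown(10)` and is assumed to return a document shaped like the shell output in this ticket (`{ok: 0|1, errmsg: ...}`); the mongo shell would use its built-in `sleep()` instead of the busy-wait shown here.

```javascript
// Hypothetical retry loop: keep re-issuing stepDown until it reports
// ok, or give up once a deadline passes. `runStepDown` is any function
// returning a result document like {ok: 1} or {ok: 0, errmsg: "..."}.
function retryUntilOk(runStepDown, timeoutMs, intervalMs) {
  var deadline = Date.now() + timeoutMs;
  var result = runStepDown();
  while (result.ok !== 1 && Date.now() < deadline) {
    // pause between attempts (the mongo shell would call sleep(intervalMs))
    var wakeAt = Date.now() + intervalMs;
    while (Date.now() < wakeAt) {}
    result = runStepDown();
  }
  return result;
}
```

A test harness would call this in place of a bare `rs.stepDown(10)`, tolerating the transient "no secondaries within 10 seconds of my optime" error until the heartbeat-driven state catches up.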

Comment by Matt Dannenberg [ 01/Jul/13 ]

If you wait a brief moment between the getLastError and the stepDown, it will work fine. If you reverse the stepDown and rs.status() in the repro, you will see that the timestamps for the secondaries are behind the timestamp of the primary. The issue here is that the structure behind rs.status() (which is also what stepDown checks) is updated by replSetHeartbeat, while getLastError uses a separate mechanism.
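The wait-then-step-down workaround described above could be sketched like this. Everything here is hypothetical glue code: `getStatus` stands in for `rs.status()` (with optimes simplified to plain numbers rather than the Timestamp values shown in the ticket), `stepDown` stands in for `rs.stepDown(10)`, and the state codes 1/2 mirror the PRIMARY/SECONDARY values visible in the rs.status() output above.

```javascript
// Returns true once every SECONDARY's optime has caught up to the
// PRIMARY's, using the member/state layout seen in rs.status() output.
function secondariesCaughtUp(status) {
  var primary = status.members.filter(function (m) { return m.state === 1; })[0];
  return status.members
    .filter(function (m) { return m.state === 2; }) // SECONDARY
    .every(function (m) { return m.optime >= primary.optime; });
}

// Poll status until the secondaries catch up (as the heartbeat updates
// the structure stepDown checks), then attempt the step-down once.
function waitThenStepDown(getStatus, stepDown, timeoutMs) {
  var deadline = Date.now() + timeoutMs;
  while (!secondariesCaughtUp(getStatus())) {
    if (Date.now() > deadline) {
      return { ok: 0, errmsg: "timed out waiting for secondaries" };
    }
  }
  return stepDown();
}
```

This avoids the race by waiting on the same heartbeat-fed view of the set that stepDown consults, instead of trusting the getLastError acknowledgement.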

Comment by Hannes Magnusson [ 09/Apr/13 ]

Verified on 2.4.1 too, using the Mac OS X binaries from mongodb.org.

Generated at Thu Feb 08 03:19:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.