Details
- Type: Bug
- Status: Closed
- Priority: Blocker - P1
- Resolution: Duplicate
- Affects Version/s: 3.0.9, 3.2.9
- Fix Version/s: None
- Component/s: None
- Operating System: ALL
Description
Hi,
We previously indicated in SERVER-25255 that we are having trouble upgrading.
Today we tried the upgrade again, from 3.0.9 to 3.2.9, and left the node alone so it could start up. It took about 4 hours to start:
2016-08-24T09:55:07.425+0000 I NETWORK [websvr] admin web console waiting for connections on port 28017
2016-08-24T09:55:07.431+0000 I REPL [initandlisten] Did not find local voted for document at startup; NoMatchingDocument: Did not find replica set lastVote document in local.replset.election
2016-08-24T13:58:39.180+0000 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
Unfortunately for us, it looks like mongod decided to disregard all the data in the oplog, and the node cannot sync because it is too stale:
2016-08-24T14:48:43.085+0000 I REPL [ReplicationExecutor] syncing from: db-node3.dlg.ie.aws.thefloow.net:27017
2016-08-24T14:48:43.098+0000 W REPL [rsBackgroundSync] we are too stale to use db-node3.dlg.ie.aws.thefloow.net:27017 as a sync source
2016-08-24T14:48:43.098+0000 I REPL [ReplicationExecutor] syncing from: db-node2.dlg.ie.aws.thefloow.net:27017
2016-08-24T14:48:43.100+0000 I REPL [ReplicationExecutor] could not find member to sync from
2016-08-24T14:48:43.100+0000 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2016-08-24T14:48:43.100+0000 I REPL [rsBackgroundSync] our last optime : (term: -1, timestamp: Aug 14 16:54:26:63)
2016-08-24T14:48:43.100+0000 I REPL [rsBackgroundSync] oldest available is (term: -1, timestamp: Aug 14 20:54:47:1bf)
2016-08-24T14:48:43.100+0000 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2016-08-24T14:48:43.100+0000 I REPL [ReplicationExecutor] going into maintenance mode with 0 other maintenance mode tasks in progress
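For reference, the "too stale" condition in these log lines boils down to comparing this member's last applied optime against the oldest entry in each candidate sync source's oplog. A minimal mongo shell sketch of that comparison, run against a healthy member (the Timestamp value is transcribed from the log line above and its epoch conversion is an assumption for illustration, not the server's actual code path):

    var oplog = db.getSiblingDB("local").oplog.rs;
    // Oldest and newest oplog entries bound the sync source's window.
    var oldest = oplog.find().sort({$natural: 1}).limit(1).next().ts;
    var newest = oplog.find().sort({$natural: -1}).limit(1).next().ts;
    // Last applied optime of the stale member, from "Aug 14 16:54:26:63"
    // above (seconds since epoch, hex increment) -- assumed values.
    var memberLast = Timestamp(1471193666, 0x63);
    print("source oplog window: " + oldest + " .. " + newest);
    // The member can only sync if its last optime is inside the window.
    print("too stale: " + (memberLast.t < oldest.t));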
To be perfectly clear, this node was warmed up and in production without issue before we attempted this upgrade. The oplog was very large as well:
db.getReplicationInfo()
{
    "logSizeMB": 614400,
    "usedMB": 117.9,
    "timeDiff": 0,
    "timeDiffHours": 0,
    "tFirst": "Sun Aug 14 2016 16:54:26 GMT+0000 (UTC)",
    "tLast": "Sun Aug 14 2016 16:54:26 GMT+0000 (UTC)",
    "now": "Wed Aug 24 2016 14:49:15 GMT+0000 (UTC)"
}
As you can see, MongoDB somehow decided that the oplog had to be cleared. For comparison, see the same info from another replica set member below:
floow:PRIMARY> db.getReplicationInfo()
{
    "logSizeMB" : 614400,
    "usedMB" : 614401.13,
    "timeDiff" : 838673,
    "timeDiffHours" : 232.96,
    "tFirst" : "Sun Aug 14 2016 21:54:51 GMT+0000 (UTC)",
    "tLast" : "Wed Aug 24 2016 14:52:44 GMT+0000 (UTC)",
    "now" : "Wed Aug 24 2016 14:52:44 GMT+0000 (UTC)"
}
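To gather this comparison from every member in one pass, a hedged shell sketch (it assumes direct connections to each member are permitted and that no authentication is required; adjust for your environment):

    // Print each member's oplog usage and window, mirroring the two
    // getReplicationInfo() outputs shown above.
    rs.status().members.forEach(function (m) {
        try {
            var info = new Mongo(m.name).getDB("admin").getReplicationInfo();
            print(m.name + ": usedMB=" + info.usedMB +
                  ", timeDiffHours=" + info.timeDiffHours +
                  ", tFirst=" + info.tFirst + ", tLast=" + info.tLast);
        } catch (e) {
            print(m.name + ": unreachable (" + e + ")");
        }
    });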
I can provide the log from that period, but there is nothing indicating any unusual behaviour, and no errors:
2016-08-24T13:58:48.827+0000 I REPL [ReplicationExecutor] syncing from: db-node2.dlg.ie.aws.thefloow.net:27017
2016-08-24T13:58:48.829+0000 W REPL [rsBackgroundSync] we are too stale to use db-node2.dlg.ie.aws.thefloow.net:27017 as a sync source
2016-08-24T13:58:48.829+0000 I REPL [ReplicationExecutor] syncing from: db-node3.dlg.ie.aws.thefloow.net:27017
2016-08-24T13:58:48.852+0000 W REPL [rsBackgroundSync] we are too stale to use db-node3.dlg.ie.aws.thefloow.net:27017 as a sync source
2016-08-24T13:58:48.852+0000 I REPL [ReplicationExecutor] could not find member to sync from
2016-08-24T13:58:48.852+0000 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2016-08-24T13:58:48.852+0000 I REPL [rsBackgroundSync] our last optime : (term: -1, timestamp: Aug 14 16:54:26:63)
2016-08-24T13:58:48.852+0000 I REPL [rsBackgroundSync] oldest available is (term: -1, timestamp: Aug 14 20:10:58:f)
In its current state we cannot reliably upgrade our database to 3.2.x, as doing so results in a de-sync.
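For what it's worth, a pre-upgrade sanity check along these lines (a sketch only; the 24-hour margin is an assumption, not a MongoDB recommendation) would at least confirm that the remaining members' oplog windows cover the expected restart time. It would not have caught the behaviour reported here, where the upgraded node's own oplog window collapsed, but it documents the assumption the upgrade relies on:

    var REQUIRED_HOURS = 24;  // assumed safety margin for the upgrade window
    var ok = true;
    rs.status().members.forEach(function (m) {
        if (m.self) return;  // skip the member about to be restarted
        var info = new Mongo(m.name).getDB("admin").getReplicationInfo();
        print(m.name + " oplog window: " + info.timeDiffHours + " hours");
        if (info.timeDiffHours < REQUIRED_HOURS) ok = false;
    });
    print(ok ? "oplog windows look sufficient"
             : "WARNING: an oplog window is below the margin");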
Issue Links
- duplicates
  - SERVER-25353 Clean shutdown should leave secondaries without the batch-in-progress flag (Closed)
- is related to
  - SERVER-25255 After upgrading to 3.2.8 (from 3.0.9) startup appears to hang with lots of disk reads to the local collection. (Closed)