-
Type: Bug
-
Resolution: Duplicate
-
Priority: Blocker - P1
-
None
-
Affects Version/s: 3.0.9, 3.2.9
-
Component/s: Admin, Replication, WiredTiger
-
None
-
ALL
Hi,
We had indicate previously in SERVER-25255 that we re having trouble upgrading.
Today we have tried upgrade again from 3.0.9 to 3.2.9 and left it so it can start up. It took about 4 hours to start :
2016-08-24T09:55:07.425+0000 I NETWORK [websvr] admin web console waiting for connections on port 28017 2016-08-24T09:55:07.431+0000 I REPL [initandlisten] Did not find local voted for document at startup; NoMatchingDocument: Did not find replica set lastVote document in local.replset.election 2016-08-24T13:58:39.180+0000 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
Unfortunately for us looks like mongo decided to disregard all the date in the oplog and cannot sync as it's to stale.
2016-08-24T14:48:43.085+0000 I REPL [ReplicationExecutor] syncing from: db-node3.dlg.ie.aws.thefloow.net:27017 2016-08-24T14:48:43.098+0000 W REPL [rsBackgroundSync] we are too stale to use db-node3.dlg.ie.aws.thefloow.net:27017 as a sync source 2016-08-24T14:48:43.098+0000 I REPL [ReplicationExecutor] syncing from: db-node2.dlg.ie.aws.thefloow.net:27017 2016-08-24T14:48:43.100+0000 I REPL [ReplicationExecutor] could not find member to sync from 2016-08-24T14:48:43.100+0000 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode 2016-08-24T14:48:43.100+0000 I REPL [rsBackgroundSync] our last optime : (term: -1, timestamp: Aug 14 16:54:26:63) 2016-08-24T14:48:43.100+0000 I REPL [rsBackgroundSync] oldest available is (term: -1, timestamp: Aug 14 20:54:47:1bf) 2016-08-24T14:48:43.100+0000 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember 2016-08-24T14:48:43.100+0000 I REPL [ReplicationExecutor] going into maintenance mode with 0 other maintenance mode tasks in progress
To be perfectly clear this node was warmed up and in production without issue before we attempted this upgrade. Oplog db size was very big as well :
db.getReplicationInfo() { "logSizeMB": 614400, "usedMB": 117.9, "timeDiff": 0, "timeDiffHours": 0, "tFirst": "Sun Aug 14 2016 16:54:26 GMT+0000 (UTC)", "tLast": "Sun Aug 14 2016 16:54:26 GMT+0000 (UTC)", "now": "Wed Aug 24 2016 14:49:15 GMT+0000 (UTC)" }
As you can see Mongo decided somehow that oplog have to be cleared. In comparison see below same info from other replica member :
floow:PRIMARY> db.getReplicationInfo() { "logSizeMB" : 614400, "usedMB" : 614401.13, "timeDiff" : 838673, "timeDiffHours" : 232.96, "tFirst" : "Sun Aug 14 2016 21:54:51 GMT+0000 (UTC)", "tLast" : "Wed Aug 24 2016 14:52:44 GMT+0000 (UTC)", "now" : "Wed Aug 24 2016 14:52:44 GMT+0000 (UTC)" }
I can provide log from that period but there is nothing indicating any unusual behaviour, no errors.
2016-08-24T13:58:48.827+0000 I REPL [ReplicationExecutor] syncing from: db-node2.dlg.ie.aws.thefloow.net:27017 2016-08-24T13:58:48.829+0000 W REPL [rsBackgroundSync] we are too stale to use db-node2.dlg.ie.aws.thefloow.net:27017 as a sync source 2016-08-24T13:58:48.829+0000 I REPL [ReplicationExecutor] syncing from: db-node3.dlg.ie.aws.thefloow.net:27017 2016-08-24T13:58:48.852+0000 W REPL [rsBackgroundSync] we are too stale to use db-node3.dlg.ie.aws.thefloow.net:27017 as a sync source 2016-08-24T13:58:48.852+0000 I REPL [ReplicationExecutor] could not find member to sync from 2016-08-24T13:58:48.852+0000 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode 2016-08-24T13:58:48.852+0000 I REPL [rsBackgroundSync] our last optime : (term: -1, timestamp: Aug 14 16:54:26:63) 2016-08-24T13:58:48.852+0000 I REPL [rsBackgroundSync] oldest available is (term: -1, timestamp: Aug 14 20:10:58:f)
In current state we cannot reliably upgrade our database to 3.2.x as this results in de-sync.
- duplicates
-
SERVER-25353 Clean shutdown should leave secondaries without the batch-in-progress flag
- Closed
- is related to
-
SERVER-25255 After upgrading to 3.2.8 (from 3.0.9) startup appears to hang with lots of disk reads to the local collection.
- Closed