[SERVER-50137] MongoDB crashes with Invariant failure due to oplog entries generated in 3.4 Created: 06/Aug/20 Updated: 29/Oct/23 Resolved: 13/Aug/20 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage, Upgrade/Downgrade |
| Affects Version/s: | 4.4.0 |
| Fix Version/s: | 4.4.1, 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kin Wai Cheung | Assignee: | Gregory Noma |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | KP44 |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | dvlp |
| Attachments: | config_mongod_log.gz, config_mongod_log_20200810.gz |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Sprint: | Execution Team 2020-08-24 |
| Participants: | |
| Description |
In summary: MongoDB 4.4 crashes if any documents in the oplog were not generated by version 3.6 or later. If a rapid upgrade is truly necessary, this issue can be avoided by fully resyncing all nodes after upgrading from 3.4 but before upgrading to 4.4.

Note: Because sharded config server replica sets receive relatively few write operations, they can often hold older oplog entries than other replica sets.

To determine if a deployment is vulnerable, run the following to inspect the oldest document in the oplog:
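The command below is a sketch of such a check (the original snippet is not preserved in this copy of the ticket; it relies on the wall field, which is only written on oplog entries by 3.6 and later):

use local
db.oplog.rs.find({}, { wall: 1 }).sort({ $natural: 1 }).limit(1)   // oldest oplog entry, projected to its wall date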
If this command returns a Date (any date), the deployment is not vulnerable. If this command returns no data, then upgrading to 4.4 will fail.

Original description:

Hello, we have a sharded cluster with a 1-member config replica set and 2 x 1-member shard replica sets, with 1 mongos, running in our dvlp env. We successfully upgraded the cluster from 4.0.9 to 4.2.8 following the document below:
https://docs.mongodb.com/manual/release-notes/4.2-upgrade-sharded-cluster
But when we started the upgrade process from 4.2.8 to 4.4, we came across the following issue:
https://docs.mongodb.com/manual/release-notes/4.4-upgrade-sharded-cluster/#upgrade-process
While starting the config servers on the 4.4.0 binaries, startup failed. I looked in the mongod log for the config server and it errored out with an invariant failure. Is there anything else we can look for to fix this?
| Comments |
| Comment by Githook User [ 13/Aug/20 ] |
Author: Gregory Noma <gregory.noma@gmail.com> (gregorynoma)
Message: (cherry picked from commit 5be60b26dd703f8360157d074de889d0495cab6a)
| Comment by Githook User [ 13/Aug/20 ] |
Author: Gregory Noma <gregory.noma@gmail.com> (gregorynoma)
Message:
| Comment by Kin Wai Cheung [ 12/Aug/20 ] |
Hi Eric, during the upgrade of our acceptance environment we had already resized the oplog. We checked the wall field and it returned a date, and this date is the same date we see when we execute rs.printReplicationInfo().
CfgReplSet001:PRIMARY> use local
But we still faced the invariant failure, so resizing the oplog alone didn't help us; we had to restore and add a new config node for a resync so that it would contain only new 4.2 oplog entries.
| Comment by Eric Sedor [ 10/Aug/20 ] |
Hi kinwai.cheung@clarivate.com, we're glad to hear the deployments have made it to 4.4 and will be working on a fix for this in future versions. Thanks for writing in. Can you please file an additional ticket for your PeriodicShardedIndexConsistencyChecker question?
| Comment by Kin Wai Cheung [ 10/Aug/20 ] |
Additionally, it seems that since we upgraded to 4.2.8 or 4.4.0, every 10 minutes our config server log records messages like the below when the PeriodicShardedIndexConsistencyChecker runs. These databases were dropped anywhere from a couple of years ago to more recently by dvlp, afaik. I'm finding entries in config.collections (config server) or config.cache.collections (data-bearing nodes) that reference already dropped dbs. Is this a known issue or should I create a new Jira for this?
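For reference, a rough way to list such entries might look like this (just a sketch that compares config.collections namespaces against config.databases on the config server; not an official check):

use config
var knownDbs = db.databases.find().map(function (d) { return d._id; });   // databases the cluster still knows about
db.collections.find().forEach(function (c) {
    var dbName = c._id.split(".")[0];                                     // config.collections _id is "<db>.<collection>"
    if (knownDbs.indexOf(dbName) === -1) {
        print("references a dropped database: " + c._id);
    }
});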
| Comment by Kin Wai Cheung [ 10/Aug/20 ] |
[^config_mongod_log_20200810.gz]
Hi all, even though the oplog first event time did show me a recent date on our dvlp env (on DVLP running 4.2.8: ConfigReplSet:PRIMARY> rs.printReplicationInfo()), I did face the invariant issue again. Based on the above, I'm not sure whether it was because the node had already been upgraded to 4.2.8 and reset the timestamp while still having the 3.2/3.4 entries somewhere, or whether I messed something up. Log with verbosity 5 attached.
I restored the config back to a previous state and used your workaround: adding a new temp node and letting it sync, so it contains 4.2 oplog entries only; making it primary; then removing the old node and adding it back again with an empty dbpath for another resync; and finally stepping down and removing the temp node again. This allowed me to run the config server on the 4.4 binaries, and also the data-bearing shards and the router.
| Comment by Eric Sedor [ 07/Aug/20 ] |
Resizing the oplog is another way to remove older entries, yes.
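For reference, a minimal sketch of an oplog resize on a WiredTiger node (the 990 MB value is only an example; it happens to be the minimum the command accepts):

db.adminCommand({ replSetResizeOplog: 1, size: 990 })   // new maximum oplog size, in megabytes
rs.printReplicationInfo()                               // re-check the oplog first event time afterwards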
| Comment by Kin Wai Cheung [ 07/Aug/20 ] |
Just checked the config servers on our higher environments. We upgraded from 3.4 to 4.0.9 around May/June 2019, and on our higher environments the config servers still contain oplog events from before that date. I also confirmed that our data-bearing nodes have a much more recent oplog first event time. Does resizing the config server's oplog so that the first event time is after the 3.6 upgrade also solve the issue?
ACC (1 member replica set) CfgReplSet001:PRIMARY> rs.printReplicationInfo()
PROD (3 member replica set) configReplSet:PRIMARY> rs.printReplicationInfo()
-----------------------
on DVLP running 4.2.8 ConfigReplSet:PRIMARY> rs.printReplicationInfo()
| Comment by Eric Sedor [ 07/Aug/20 ] |
kinwai.cheung@clarivate.com, this would apply to any node. It is less likely that data-bearing nodes in the replica sets themselves will have entries that far back, but it is entirely possible. It's a question of how quickly the node is written to, and how large those updates are. A quick check, if you know the date of the upgrade from 3.4, is to use printReplicationInfo: as long as the first event time is after the date of the upgrade to 3.6, the node should be safe to upgrade to 4.4. It is also possible to inspect the oplog directly. Let us know if the above method won't work for you or if the results leave you in doubt!
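For example, roughly (the wall-field projection is one way to inspect the oplog directly, since the wall field is only present on entries written by 3.6 and later):

rs.printReplicationInfo()                                          // the oplog first event time should be after the 3.6 upgrade
use local
db.oplog.rs.find({}, { wall: 1 }).sort({ $natural: 1 }).limit(1)   // oldest entry; no wall date here indicates a pre-3.6 entry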
| Comment by Kin Wai Cheung [ 07/Aug/20 ] |
Thanks Eric, I'll try this on Monday. Does this also apply to our 2 shards, which are both primary-only replica sets? Also, how can I check myself whether my replica sets still have oplog entries generated by 3.4? Since we want to upgrade our higher environments too.
| Comment by Eric Sedor [ 07/Aug/20 ] |
Hi kinwai.cheung@clarivate.com,
We believe that MongoDB 4.4 is crashing due to an issue handling oplog entries generated by MongoDB 3.4. This could happen if the deployment was running on version 3.4 relatively recently, and/or wasn't receiving many writes due to low sharding-related activity. In this case, we can see from the diagnostic data that the oplog has entries from as early as October 2018. We are going to look into this further.
In the meantime, as a workaround, we recommend adding a new node to the config server replica set, waiting for it to sync, and then promoting it to primary. Then you can remove the old node and try to upgrade the new node, which will only have 4.2 oplog entries.
If you still see trouble after that, can you please let us know and provide the same set of information (logs at high verbosity -vvvvv, and diagnostic data)?
Does this make sense? Let me know if I can answer any questions.
Sincerely,
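A rough sketch of that sequence against the config server replica set (hostnames below are placeholders):

rs.add("newcfgnode.example.net:27019")    // add the new node and let it complete its initial sync
rs.status()                               // wait until the new node reports SECONDARY
rs.stepDown()                             // step down so the new node can be elected primary
// then, connected to the new primary:
rs.remove("oldcfgnode.example.net:27019") // remove the old node; the remaining node has only 4.2 oplog entries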
| Comment by Kin Wai Cheung [ 07/Aug/20 ] |
I can explain this. When starting the config server (1 node only) with the 4.4 binaries I always hit the invariant failure. I was not able to 'downgrade' or start it back on the previous 4.2.8 binaries due to an unsupported WT version, so I then tried things like repair with the 4.4 or 4.2.8 binaries, with and without the replica set option. But it still didn't let me start up the config server successfully. So you should have seen messages like:
or
2020-08-06T10:20:32.891+0200 F REPL [initandlisten] This instance has been repaired and may contain modified replicated data that would not match other replica set members. To see your repaired data, start mongod without the --replSet option. When you are finished recovering your data and would like to perform a complete re-sync, please refer to the documentation here: https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/
After removing the document in local.system.replset on the 4.2.8 binaries without the replica set option, I then started up the config server successfully on the 4.2.8 binaries, with the replSet option enabled.
---------------------------------------
Afterwards I tried several things without success, so I restored the config server and started everything back on the 4.2.8 binaries around 2020-08-06T14:41, as shown in the logs.
| Comment by Bruce Lucas (Inactive) [ 07/Aug/20 ] |
daniel.gottlieb here's a possible clue. In the log file we see multiple restarts immediately after upgrading to 4.4 that hit the invariant. Then there is a restart with repair:true. After validating multiple collections without finding errors, it then hits the following fatal error and exits:
On the subsequent restart without repair it says
from which I think we can deduce that the previous fatal NotMasterNoSlaveOk error was not a normal part of repair. Could this be the errant exception that is causing the invariant when we're running without repair, and do the logged details about the exception provide enough of a clue to understand the cause?
| Comment by Kin Wai Cheung [ 06/Aug/20 ] |
config_mongod_log.gz
| Comment by Daniel Gottlieb (Inactive) [ 06/Aug/20 ] |
Maybe there's a connection, but this error is early enough in startup that the replication and sharding systems haven't been initialized yet. Any interaction there would have to be with how data files were written out when shutting down the 4.2.8 binary. At the risk of speculating, I would think that a topology of 2 nodes + arbiter, or 3 nodes, is more likely to create problems for restarting a binary than a single-node replica set/shard (where the majority point could be arbitrarily lagged).
| Comment by Kin Wai Cheung [ 06/Aug/20 ] |
@Bruce you mean the mongod log of the config server? Will attach it later today. I'll also have another upgrade attempt on Monday from 4.2.8 to 4.4.0. Previous upgrades from 4.0.9 to 4.2.8 and from 4.2.8 to 4.4.0 were done while the mongos and shards were up and running (with applications connecting). Could having a 1-member replica set for the config server and 1 member for each of the 2 shards, instead of 3 for each, be causing this issue in a sharded cluster? As I'm actually upgrading binaries on a primary only.
| Comment by Daniel Gottlieb (Inactive) [ 06/Aug/20 ] |
The invariant being hit: when MDB starts up, it spins off a separate thread for updating visibility. That thread comes into existence when the oplog collection is loaded and likewise goes out of existence when it is destructed. This invariant fires when we're trying to destruct the thread but it hasn't been started. The oplogManagerCount (unsurprisingly) gets incremented to 1 when startOplogManager is called. That method is called from "postConstructorInit" after the constructor succeeds. However, if an exception is thrown prior to "starting the oplog manager" (perhaps within initializing oplog stones, which does WT reads), the code would get caught in the state we see – trying to destruct the WTRecordStore for the oplog collection before starting the oplog manager.
| Comment by Daniel Gottlieb (Inactive) [ 06/Aug/20 ] |
I imagine this is reproducing easily every time a 4.4.0 binary tries coming up? I'm about to follow up with a short description of what's happening, but in the meantime kinwai.cheung@clarivate.com can you run with higher verbosity logs (e.g.: -vvvv)? I believe an exception is being thrown while constructing the WTRecordStore for the oplog, but that's hitting some code that's not exception-safe, resulting in the unrelated invariant we see. I'm hoping that with the increased verbosity we'll be able to see what the original exception is.
| Comment by Bruce Lucas (Inactive) [ 06/Aug/20 ] |
Thanks kinwai.cheung@clarivate.com. Can you please also attach the complete log file?
| Comment by Kin Wai Cheung [ 06/Aug/20 ] |
Hi Eric, I'm attaching the diagnostic logs from this month. Please note that the upgrade only started today (9 am CEST). I had already done some steps like repairing with the 4.2.8/4.4.0 binaries to get the config server running. Since around 2:45 pm we have restored the data files on the config server to have it running on 4.2.8, as starting it on the 4.4.0 binaries produces the error in this ticket, and subsequently starting it back on the 4.2.8 binaries generated this error:
unsupported WiredTiger file version: this build only supports versions up to 4, and the file is version 5: WT_ERROR: non-specific WiredTiger error
| Comment by Eric Sedor [ 06/Aug/20 ] |
Thanks for writing in kinwai.cheung@clarivate.com,
Would you please archive (tar or zip) the $dbpath/diagnostic.data directory for the config server? The contents are described here. Then, attach it to this ticket.
Gratefully,