[SERVER-33714] Downgrading FCV from 3.6 to 3.4 leaves an admin.system.keys collection on shards that on upgrade is orphaned and renamed without a UUID Created: 06/Mar/18  Updated: 29/Oct/23  Resolved: 19/Apr/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.3
Fix Version/s: 3.6.5

Type: Bug Priority: Major - P3
Reporter: Xiangyu Yao (Inactive) Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-29653 Drop admin.system.keys on CSRS downgr... Closed
is related to SERVER-33719 createCollectionForApplyOps should in... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

To reproduce this issue, add the two lines marked with + to the test updates_in_heterogeneous_repl_set.js:

replTest.awaitSecondaryNodes();
 
+ replTest.stepUp(replTest.nodes[2]);
+ replTest.awaitSecondaryNodes();
 
// Set the replica set feature compatibility version to 3.6.
primary = replTest.getPrimary();

This elects as primary the node that was originally added to the replica set as a v3.4 binary and therefore never initial-synced admin.system.keys. That node then runs setFCV(3.6) and creates the admin.system.keys collection during the upgrade.

Sprint: Sharding 2018-04-23
Participants:
Linked BF Score: 45

 Description   

This is for v3.6 only!

The "admin.system.keys" collection was introduced in v3.6, and a v3.4 node cannot clone it from the primary during initial sync because system collections must be whitelisted for cloning. As a result, a mixed v3.6/v3.4 binary replica set at FCV 3.4 can end up with a collection on the v3.6 binaries that the v3.4 binaries lack. This can happen on shards but not on config servers, because config servers drop the collection on downgrade, whereas shards do not.

If the v3.4 binary is then upgraded to v3.6, is elected primary, and runs setFCV 3.6, it will create admin.system.keys, which the secondaries already have. This causes each secondary to rename its original admin.system.keys collection to a tmp collection and then create a new admin.system.keys. The 3.6 nodes are left with an orphan collection "admin.tmpxxxxx.create" without a UUID.

This was caught by the UUID validation code. The downgrade to FCV 3.4 in the test strips the UUIDs. The subsequent upgrade to FCV 3.6 via the originally v3.4 node then writes a createCollection of admin.system.keys with a UUID to the oplog. The secondaries, which already have the collection, rename their original collection (which has no UUID) to admin.tmpxxxxx.create, which is left orphaned.
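The sequence described above can be sketched with a toy in-memory model. This is only an illustration of the reported behavior, not the server's implementation: the function applyCreateCollection, the Map-based catalog, and the UUID string are hypothetical, and the temporary name is the placeholder used in this ticket.

```javascript
// Toy model of a secondary's catalog: namespace -> { uuid } (uuid is null once stripped).
// All names are hypothetical; this does not mirror the server's actual code.
function applyCreateCollection(catalog, ns, uuid, makeTmpName) {
    const existing = catalog.get(ns);
    if (existing && existing.uuid !== uuid) {
        // The namespace is occupied by a collection with a different (or no) UUID:
        // move it aside under a temporary name. It keeps whatever UUID it had (none here).
        catalog.set(makeTmpName(), existing);
        catalog.delete(ns);
    }
    if (!catalog.has(ns)) {
        catalog.set(ns, {uuid: uuid});
    }
    return catalog;
}

// A secondary that stayed on the v3.6 binary: it kept admin.system.keys,
// but the downgrade to FCV 3.4 stripped the collection's UUID.
const secondary = new Map([["admin.system.keys", {uuid: null}]]);

// The new primary (originally a v3.4 binary) runs setFCV 3.6 and logs a create with a UUID.
applyCreateCollection(secondary, "admin.system.keys", "uuid-from-primary",
                      () => "admin.tmpxxxxx.create");

// Collections that still lack a UUID after the upgrade are the orphans.
const orphans = [...secondary].filter(([, coll]) => coll.uuid === null).map(([ns]) => ns);
console.log(orphans);
```

In this model the only UUID-less namespace left on the secondary is the renamed temporary collection, which is the state the UUID validation flagged.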



 Comments   
Comment by Githook User [ 19/Apr/18 ]

Author: Jack Mulrow (jsmulrow) <jack.mulrow@mongodb.com>

Message: SERVER-33714 Always drop admin.system.keys collection on downgrade
Branch: v3.6
https://github.com/mongodb/mongo/commit/5f1a21787d4b7b878259b6e6e894c95e94eb57b7

Comment by Dianna Hohensee (Inactive) [ 07/Mar/18 ]

A solution would be to drop the admin.system.keys collection on FCV downgrade on shards as well, rather than only on config servers, as is currently the case. I'm not certain why the original ticket to drop the collection on downgrade (SERVER-29653) covered only config servers.
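A minimal sketch of the proposed change, assuming a simplified downgrade hook. The function downgradeTo34, the dropOnShards flag, and the node objects are hypothetical names used only for illustration; the actual fix is in the commit linked above.

```javascript
// Hypothetical model of the FCV 3.6 -> 3.4 downgrade step on a single node.
function downgradeTo34(node, {dropOnShards}) {
    // FCV 3.4 has no collection UUIDs, so the downgrade strips them.
    for (const coll of node.catalog.values()) {
        coll.uuid = null;
    }
    // Previously only config servers dropped the keys collection;
    // the proposal is to drop it on shards too.
    if (node.role === "configsvr" || dropOnShards) {
        node.catalog.delete("admin.system.keys");
    }
    return node;
}

function makeShardNode() {
    return {role: "shardsvr", catalog: new Map([["admin.system.keys", {uuid: "u1"}]])};
}

const before = downgradeTo34(makeShardNode(), {dropOnShards: false});
const after = downgradeTo34(makeShardNode(), {dropOnShards: true});

// Without the drop, a stale collection with no UUID survives on the shard node.
console.log(before.catalog.has("admin.system.keys"));
// With the drop, nothing is left to conflict with the create on the next upgrade.
console.log(after.catalog.has("admin.system.keys"));
```

With the collection dropped on every node type, a later setFCV 3.6 creates admin.system.keys fresh on all nodes, so no secondary has an existing copy to rename.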

Generated at Thu Feb 08 04:34:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.