[SERVER-33714] Downgrading FCV from 3.6 to 3.4 leaves an admin.system.keys collection on shards that on upgrade is orphaned and renamed without a UUID Created: 06/Mar/18 Updated: 29/Oct/23 Resolved: 19/Apr/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.3 |
| Fix Version/s: | 3.6.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Xiangyu Yao (Inactive) | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | To reproduce this issue, add two lines to the test (updates_in_heterogeneous_repl_set.js):
These elect as primary the node that was originally added to the replica set as a v3.4 binary, so it never initial synced admin.system.keys, and then have it run setFCV(3.6) and create the admin.system.keys collection during upgrade. |
| Sprint: | Sharding 2018-04-23 |
| Participants: |
| Linked BF Score: | 45 |
| Description |
|
This applies to v3.6 only. The "admin.system.keys" collection was introduced in v3.6, and a v3.4 node cannot clone it from the primary during initial sync because system collections must be whitelisted for cloning. A mixed v3.6 and v3.4 binary replica set at FCV 3.4 can therefore end up with the v3.4 binaries missing a collection that the v3.6 binaries have. This can happen on shards but not on config servers, because config servers drop the collection on downgrade whereas shards do not.
If the v3.4 binary is then upgraded to v3.6, is elected primary, and runs setFCV 3.6, it creates admin.system.keys, which the secondaries already have. Each secondary then renames its original admin.system.keys collection to a tmp collection and creates a new admin.system.keys. The 3.6 nodes are left with an orphan collection "admin.tmpxxxxx.create" without a UUID.
This was caught by the UUID validation code: the downgrade to FCV 3.4 in the test strips the UUIDs, and the upgrade to FCV 3.6 via the originally v3.4 node then sends a createCollection for admin.system.keys with a UUID through the oplog to the secondaries. The secondaries already have the collection, so they rename their original collection, which has no UUID, to admin.tmpxxxxx.create, and it is left orphaned. |
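The sequence described above can be sketched as a small toy model. This is not server code: the `Node` class, the helper names, the `test.foo` namespace, and the random tmp suffix are all hypothetical, and the rename-on-clash rule is a simplification of what the description says the secondaries do. It only illustrates why a shard ends up with a UUID-less orphan while a config server does not.

```python
import uuid
from typing import Dict, List, Optional

KEYS_NS = "admin.system.keys"


class Node:
    """Toy replica set member: maps namespace -> collection UUID (None = no UUID)."""

    def __init__(self, collections: Optional[Dict[str, Optional[str]]] = None) -> None:
        self.collections: Dict[str, Optional[str]] = dict(collections or {})

    def apply_create(self, ns: str, coll_uuid: str) -> None:
        """Apply a replicated create that carries a UUID (simplified model)."""
        if ns in self.collections and self.collections[ns] != coll_uuid:
            # Name clash: move the existing collection aside, as described above.
            # The random suffix is a stand-in for the real tmp collection name.
            tmp_ns = "admin.tmp" + uuid.uuid4().hex[:5] + ".create"
            self.collections[tmp_ns] = self.collections.pop(ns)
        self.collections[ns] = coll_uuid


def downgrade_to_fcv34(nodes: List[Node], is_config_server: bool) -> None:
    """FCV 3.4 strips UUIDs; only config servers drop the keys collection."""
    for node in nodes:
        if is_config_server:
            node.collections.pop(KEYS_NS, None)
        for ns in node.collections:
            node.collections[ns] = None


def initial_sync_as_v34(source: Node) -> Node:
    """A v3.4 binary clones everything except the non-whitelisted keys collection."""
    return Node({ns: u for ns, u in source.collections.items() if ns != KEYS_NS})


def upgrade_to_fcv36(primary: Node, secondaries: List[Node]) -> None:
    """The new primary creates the keys collection and replicates the create."""
    if KEYS_NS not in primary.collections:
        new_uuid = str(uuid.uuid4())
        primary.collections[KEYS_NS] = new_uuid
        for secondary in secondaries:
            secondary.apply_create(KEYS_NS, new_uuid)


def orphans(node: Node) -> List[str]:
    """Namespaces that look like leftover tmp collections."""
    return sorted(ns for ns in node.collections if ns.startswith("admin.tmp"))


def run_scenario(is_config_server: bool) -> Node:
    """Run downgrade, v3.4 initial sync, then upgrade via the late joiner."""
    original = Node({KEYS_NS: str(uuid.uuid4()), "test.foo": str(uuid.uuid4())})
    downgrade_to_fcv34([original], is_config_server)
    late_joiner = initial_sync_as_v34(original)
    upgrade_to_fcv36(primary=late_joiner, secondaries=[original])
    return original


if __name__ == "__main__":
    for label, is_config in (("shard", False), ("config server", True)):
        node = run_scenario(is_config_server=is_config)
        print(label, "orphans:", orphans(node))
```

In this model the only difference between the two runs is whether the downgrade step drops admin.system.keys, which is the shard versus config server distinction the description draws.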
| Comments |
| Comment by Githook User [ 19/Apr/18 ] |
|
Author: {'email': 'jack.mulrow@mongodb.com', 'username': 'jsmulrow', 'name': 'Jack Mulrow'} Message: |
| Comment by Dianna Hohensee (Inactive) [ 07/Mar/18 ] |
|
A solution would be to drop the admin.system.keys collection on FCV downgrade on shards as well, not just on config servers as is currently the case. I'm not certain why the original ticket to drop the collection on downgrade was only for config servers: |