[SERVER-34683] Downgrade replicaset from 3.6.4 to 3.4.14 fails due to the presence of `config.system.sessions` Created: 26/Apr/18 Updated: 29/Oct/23 Resolved: 10/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.4 |
| Fix Version/s: | 3.6.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Wojciech Sielski | Assignee: | Misha Tyulenev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Upgrade from 3.4 to 3.6.4, then try to downgrade while config.system.sessions exists |
| Sprint: | Sharding 2018-05-07, Sharding 2018-05-21 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Hi, I upgraded my replica set from 3.4.1 to 3.6.4. Downgrading was not possible, even after removing the data to force replication from scratch.
I even tried dropping the collection and the config database. |
| Comments |
| Comment by Ramon Fernandez Marina [ 25/Jun/18 ] |
|
victorgp, robomon1, since this ticket corresponds to a specific bug and it's been closed already I'd request that you either post on mongodb-user group if you have a support-related question, or open a new SERVER ticket if you believe you've found a bug. Thanks, |
| Comment by Robert Ford [ 21/Jun/18 ] |
|
I'm not sure this is totally fixed in 3.6.5. I just did a rolling upgrade of a 3-node replica set that was on 3.4 with FCV=3.4. The nodes were AWS instances, so it was easy enough to wipe them out and recreate them with 3.6.5. Nodes 1 and 2 went fine. When I ran rs.stepDown() on the primary, it took more than a few seconds to elect a new primary. Then the mongod service on Node 3, which was still on 3.4, aborted with this error. After that I couldn't restart the service on Node 3. I finally just upgraded Node 3 to 3.6 and everything started fine. |
| Comment by VictorGP [ 19/Jun/18 ] |
|
I'm affected by this issue in a slightly different way. We downgraded from 3.6 to 3.4, and the config.system.sessions collection remained in one of the shards of the cluster but not in the others. It is not in the config servers either, so I'm trying to drop the collection in that shard, but I cannot find a role or set of roles that allows me to do that. I tried the admin, restore and root roles with no luck; I always get: "not authorized on config to execute command". Do you know what permissions I need to perform this operation in that shard? |
| Comment by Githook User [ 10/May/18 ] |
|
Author: {'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'}Message: |
| Comment by Kevin Pulo [ 03/May/18 ] |
|
For users currently affected by this issue, there is another potential workaround. It applies if you are running 3.6.4 with FCV 3.4 (either never set to 3.6, or set back to 3.4), are attempting to downgrade to 3.4, and are encountering these errors on the 3.4 nodes. Perform a rolling restart on 3.6.4 with --setParameter disableLogicalSessionCacheRefresh=true, then perform the rolling downgrade to 3.4, removing this parameter when starting each 3.4 mongod, since it will prevent a 3.4 mongod from starting. Note that this is an undocumented internal-only parameter that should not be used otherwise. While set to true it inhibits the writes to config.system.sessions, which are normal and necessary in homogeneous 3.6 replica sets but are the source of this issue in mixed 3.4/3.6 replica sets. |
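The key detail in the workaround above is that the parameter must be present on every 3.6.4 start and absent on every 3.4 start. A minimal sketch of that rule, assuming the startup options are kept as a plain argument list: the helper name `mongodArgs` and the example values `rs0` and `/data/db` are hypothetical, and only the `--setParameter disableLogicalSessionCacheRefresh=true` option comes from this ticket.

```javascript
// Sketch only: builds the argument list for one mongod start.
// The parameter value is the one named in this ticket.
var SESSION_REFRESH_OFF = "disableLogicalSessionCacheRefresh=true";

// Hypothetical helper, not part of any MongoDB tooling.
function mongodArgs(binaryVersion, baseArgs) {
  var args = [];
  for (var i = 0; i < baseArgs.length; i++) {
    // Strip any existing copy of the parameter so it is never duplicated
    // and never carried over to a 3.4 binary.
    if (baseArgs[i] === "--setParameter" && baseArgs[i + 1] === SESSION_REFRESH_OFF) {
      i++; // skip the value as well as the flag
      continue;
    }
    args.push(baseArgs[i]);
  }
  // Only 3.6 binaries get the parameter; per the ticket, a 3.4 mongod
  // will not start with it.
  if (binaryVersion.indexOf("3.6.") === 0) {
    args.push("--setParameter", SESSION_REFRESH_OFF);
  }
  return args;
}

// Example values (hypothetical replica set name and data path).
var base = ["--replSet", "rs0", "--dbpath", "/data/db"];
var restartOn36 = mongodArgs("3.6.4", base);          // rolling restart phase
var startOn34 = mongodArgs("3.4.14", restartOn36);    // rolling downgrade phase
```

Deriving the 3.4 arguments from the 3.6.4 ones, rather than maintaining two lists by hand, is just one way to avoid leaving the parameter in a service definition by accident.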
| Comment by Andy Schwerin [ 01/May/18 ] |
|
This plan sounds reasonable. We should also see about improving the downgrade test coverage in the multiversion suite. |
| Comment by Randolph Tan [ 30/Apr/18 ] |
|
The approach sounds reasonable to me |
| Comment by Kaloian Manassiev [ 30/Apr/18 ] |
|
Currently (as of 3.6.4) the config.system.sessions collection is unconditionally created by the config server and all other nodes (shards and mongos) just check for its presence before attempting to write to it. In order to fix this downgrade problem, I propose that we make the following changes:
The last step still has a race condition where a stray write to the config.system.sessions collection may accidentally recreate it on the config server, so I propose to also disallow the creation of config.system.sessions at all if FCV is not 3.6. This is the general direction and some race conditions may still have to be fleshed out, but I wanted to verify that the direction sounds correct before we put more time into designing it. schwerin, renctan? With these fixes, customers who are on 3.6.4 and are unable to downgrade to 3.4 will have two options:
|
| Comment by Kaloian Manassiev [ 26/Apr/18 ] |
|
Hi sielaq, Thank you for the detailed report and for confirming that the feature compatibility version has been downgraded to 3.4. From a cursory look, it appears that the code which downgrades the FCV either omits dropping the internal config.system.sessions collection (which is new in 3.6 and supports logical sessions) or, more likely, the collection somehow gets recreated after the FCV downgrade. While we investigate this issue further, if you are blocked because of the failing downgrade, you can work around the problem by manually dropping the config.system.sessions collection. Best regards, |
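The manual drop described above could be sketched in mongo shell style as follows. This is an illustration under assumptions, not an official procedure: the helper names `readFcv` and `dropSessionsIfDowngrading` are hypothetical, `db` is assumed to be a shell database handle with sufficient privileges, and the FCV reply is handled both as a plain string and as a document with a `version` field, since the reply shape is assumed to differ between 3.4 and 3.6.

```javascript
// Sketch only: `db` is a mongo-shell-style database handle.
// Returns the FCV as a plain string for either reply shape.
function readFcv(db) {
  var reply = db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 });
  var fcv = reply.featureCompatibilityVersion;
  // Assumption: older servers report a string, newer ones a document
  // with a `version` field.
  return typeof fcv === "string" ? fcv : fcv.version;
}

// Hypothetical helper: drops config.system.sessions only when the FCV
// is already 3.4, and reports what it did.
function dropSessionsIfDowngrading(db) {
  var fcv = readFcv(db);
  if (fcv !== "3.4") {
    return { fcv: fcv, dropped: false, reason: "FCV is not 3.4" };
  }
  var configDb = db.getSiblingDB("config");
  var present = configDb.getCollectionNames().indexOf("system.sessions") !== -1;
  if (!present) {
    return { fcv: fcv, dropped: false, reason: "collection not present" };
  }
  var dropped = configDb.getCollection("system.sessions").drop();
  return { fcv: fcv, dropped: dropped, reason: dropped ? "dropped" : "drop returned false" };
}
```

In a shell session this would be called as `dropSessionsIfDowngrading(db)`. Because the collection may be recreated after the FCV downgrade, as noted above, it is probably worth re-running the check immediately before each node is stopped for the binary downgrade.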
| Comment by Wojciech Sielski [ 26/Apr/18 ] |
|
To pre-empt the obvious question: yes, the feature compatibility version has been downgraded to 3.4, all according to the procedure.