[SERVER-36104] LogicalSessions should destroy cache on setting FCV from 3.6 to 3.4 Created: 12/Jul/18  Updated: 02/Nov/18  Resolved: 01/Nov/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Misha Tyulenev Assignee: Misha Tyulenev
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-36904 Fuzzer drops config.system.sessions a... Closed
is depended on by SERVER-37631 Disable logical sessions if FCV is 3.4 Closed
Related
is related to SERVER-33763 3.6 drivers fail to communicate with ... Closed
is related to SERVER-35795 3.4 secondaries crashing after some t... Closed
Sprint: Sharding 2018-09-24, Sharding 2018-10-22, Sharding 2018-11-05
Participants:

 Description   

Currently, sessions do not fully support upgrade/downgrade. In particular, the LogicalSessionsCache does not handle a downgrade from FCV 3.6 to FCV 3.4.

Suggested Fix

  1. When a mongod running the 3.6 binary detects that FCV is set to 3.4, it will delete the sessions collection.
  2. mongod will close all sessions except those that have active operations.
  3. mongos will be able to find out that the sessions collection does not exist and will also close all cached sessions that are not part of an active operation.
  4. mongod and mongos will keep accepting new sessions, to mitigate the potential problem of drivers not discovering, via checking logicalSessionsTimeoutMinutes, that the server no longer supports sessions.
  5. Stop setting logicalSessionsTimeoutMinutes in the isMaster response.
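
The five steps above can be sketched as a small in-memory model. This is only an illustration of the proposed behavior, not the server's implementation: the SessionCache class, its attributes, and the 30-minute timeout value are assumptions made for the example, and the field name is copied as spelled in this ticket.

```python
from dataclasses import dataclass, field

# Field name copied from the ticket text (assumption for this sketch).
TIMEOUT_FIELD = "logicalSessionsTimeoutMinutes"


@dataclass
class SessionCache:
    """Hypothetical model of a node's session state; not a server class."""
    fcv: str = "3.6"
    # Maps session id -> True if the session has an active operation.
    sessions: dict = field(default_factory=dict)
    sessions_collection_exists: bool = True

    def set_fcv(self, version: str) -> None:
        self.fcv = version
        if version == "3.4":
            # Step 1: the sessions collection is deleted.
            self.sessions_collection_exists = False
            # Steps 2-3: close every cached session without an active operation.
            self.sessions = {
                sid: active for sid, active in self.sessions.items() if active
            }

    def start_session(self, sid: str) -> None:
        # Step 4: new sessions are still accepted after the downgrade.
        self.sessions.setdefault(sid, False)

    def is_master(self) -> dict:
        response = {"ismaster": True}
        if self.fcv == "3.6":
            # Step 5: the timeout field is only set while FCV is 3.6.
            # The value 30 is an assumed placeholder.
            response[TIMEOUT_FIELD] = 30
        return response
```

In this model, a driver that stops seeing the timeout field in the isMaster response has the signal it needs to stop creating sessions, while a driver that misses the change is still served under step 4.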

Workaround

When FCV is changed from 3.6 to 3.4, the sessions collection will still exist, and on a sharded cluster mongos will not be able to detect that it should stop sending the logicalSessionsTimeoutMinutes field in the isMaster response. Hence drivers, per the spec, will keep creating sessions implicitly, which may lead to exceeding the maximum limit of 1,000,000 sessions.
To avoid this, a user can consider manually deleting the config.system.sessions collection in this scenario. mongos with the fix for SERVER-37631 will then be able to detect the FCV change and will stop sending the logicalSessionsTimeoutMinutes field.

It is critically important to be sure that the refresh thread that updates the logical sessions cache is not running on the config server when dropping the config.system.sessions collection, as it may cause a config server crash per SERVER-36904.
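
The manual step in the workaround could look roughly like the helper below. It is a sketch under stated assumptions: `client` is expected to behave like a pymongo MongoClient (database and collection lookup by name, then `drop()`), and `refresh_thread_stopped` is a hypothetical flag that the operator sets only after confirming, by whatever means applies to the deployment, that the refresh thread is not running on the config server.

```python
def drop_sessions_collection(client, refresh_thread_stopped: bool) -> None:
    """Drop config.system.sessions after an FCV 3.6 -> 3.4 change.

    `client` is assumed to be a pymongo-style client. The flag is a
    hypothetical guard for this sketch; the helper cannot verify the
    state of the refresh thread itself.
    """
    if not refresh_thread_stopped:
        # Dropping while the refresh thread runs may crash the config
        # server (see SERVER-36904), so refuse to proceed.
        raise RuntimeError(
            "confirm the logical sessions cache refresh thread is stopped first"
        )
    client["config"]["system.sessions"].drop()
```

With pymongo installed, a call would pass a connected MongoClient and `refresh_thread_stopped=True`. Refusing by default keeps the SERVER-36904 precondition visible at the call site.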



 Comments   
Comment by Misha Tyulenev [ 01/Nov/18 ]

The workaround has been added to the description. The issue will affect only sharded clusters with an FCV change 3.6->3.4.

Comment by Misha Tyulenev [ 01/Nov/18 ]

Not fixing this issue in 3.6, as it can be too risky to drop the collection and kill the sessions.
The fix for SERVER-37631 will enable the workaround outlined in the description.

Comment by Bernie Hackett [ 24/Oct/18 ]

This sounds reasonable to me. Not returning logicalSessionsTimeoutMinutes in ismaster is the key thing for drivers.

Comment by Misha Tyulenev [ 24/Oct/18 ]

jeff.yemin behackett Please confirm that this behavior will be correct.
Please let me know what tests there are to make sure that drivers are able to handle setFCV 3.4 correctly.

Comment by Misha Tyulenev [ 01/Oct/18 ]

max.hirschhorn I suggest making the change in the 3.6 branch only

Comment by Max Hirschhorn [ 30/Sep/18 ]

Greg McKeon updated SERVER-36104:
---------------------------------
    Fix Version/s: 4.1 Required
                       (was: Needs Triage)

greg.mckeon, misha.tyulenev, given the "LogicalSessions should destroy cache on setting FCV from 3.6 to 3.4 " title of this ticket and the "TODO SERVER-36104" comment that exists only on the 3.6 branch, could you please clarify whether this ticket represents (a) a change to the master branch for how we'll handle the session catalog and featureCompatibilityVersion downgrades going forward - subsequently backporting to the 3.6 branch, or (b) a change only to the 3.6 branch?

Comment by Esha Maharishi (Inactive) [ 20/Jul/18 ]

Ok, putting back into Needs Triage.

Comment by Misha Tyulenev [ 20/Jul/18 ]

esha.maharishi it's a nice-to-have feature. I have recently committed code that ignores sessions if FCV is not 3.6 - https://github.com/mongodb/mongo/commit/646d68003cadcd60fed5abaf1e92368390a4a1cb#diff-4564c6051c66e89d319ed96f26eaa7e1R276
but the correct fix would be removing sessions from the cache on downgrade; otherwise an FCV 3.6 -> 3.4 -> 3.6 change retains active sessions, which is not desirable.
