[DOCS-12650] Manual restoration procedure for sharded clusters should mention disableLogicalSessionCacheRefresh Created: 24/Apr/19  Updated: 30/Oct/23  Resolved: 09/May/19

Status: Closed
Project: Documentation
Component/s: Server
Affects Version/s: 3.6.11
Fix Version/s: Server_Docs_20231030

Type: Bug Priority: Major - P3
Reporter: Dmitry Ryabtsev Assignee: Ravind Kumar (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 4 years, 39 weeks, 6 days ago
Story Points: 0.5

 Description   

Description

Starting with MongoDB v3.6, each collection is assigned an immutable unique identifier (UUID). The collection UUID is the same across all members of a replica set and all shards in a sharded cluster. This functionality takes effect only after featureCompatibilityVersion is set to 3.6. Collection UUIDs are necessary for Logical Sessions and for transactions (the latter introduced in v4.0).

It also means that, starting with v3.6, an oplog operation can fail if the UUID of the actual collection on the server does not match the UUID recorded in the operation. Such a mismatch can be interpreted as an inconsistency and therefore results in abnormal termination of the `mongod` process (the crash you have experienced).

This creates a problem with manual restoration of sharded clusters. Specifically, step 1 of the procedure says that the member of a shard needs to be started as a standalone for certain manipulations of the metadata. The problem is that:

  • If the user is restoring from an OM/CM snapshot, and
  • if the member is kept in standalone mode for more than 5 minutes,
    then the config.system.sessions collection is created automatically on that member with a UUID that will NOT match the rest of the cluster. Later this can cause crashes (see HELP-6628) and possibly other undefined behaviour.
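
One way to spot the mismatch described above is to compare the collection UUIDs reported by two members. The following is only a sketch: it assumes the input documents have the shape of `listCollections` output (which carries `info.uuid` on 3.6+) and have already been fetched from each member. The function name and the sample data are hypothetical.

```python
import uuid


def find_uuid_mismatches(reference, candidate):
    """Return names of collections whose UUID differs between two members.

    Both arguments are lists of listCollections-style documents, e.g.
    {"name": "system.sessions", "info": {"uuid": UUID(...)}}.
    Collections missing from the reference member are ignored.
    """
    ref_uuids = {doc["name"]: doc["info"]["uuid"] for doc in reference}
    mismatched = []
    for doc in candidate:
        expected = ref_uuids.get(doc["name"])
        if expected is not None and expected != doc["info"]["uuid"]:
            mismatched.append(doc["name"])
    return sorted(mismatched)


# Hypothetical sample data for the config database of two members.
shared_uuid = uuid.uuid4()
healthy_member = [
    {"name": "system.sessions", "info": {"uuid": uuid.uuid4()}},
    {"name": "transactions", "info": {"uuid": shared_uuid}},
]
restored_member = [
    # Recreated while in standalone mode, so it received a fresh UUID.
    {"name": "system.sessions", "info": {"uuid": uuid.uuid4()}},
    {"name": "transactions", "info": {"uuid": shared_uuid}},
]

print(find_uuid_mismatches(healthy_member, restored_member))
```

Any name returned by the function marks a collection that was recreated locally and no longer matches the rest of the cluster.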

The solution is to make sure that, when the node is started as a standalone for the restore procedure, it is started with the disableLogicalSessionCacheRefresh parameter (not yet documented) enabled.
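
As a rough illustration of that solution, the sketch below assembles a `mongod` command line for the temporary standalone step. The helper name, data path, and port are hypothetical; the only part taken from this ticket is the `disableLogicalSessionCacheRefresh` parameter, passed here through `--setParameter`. The command is only built and printed, not executed.

```python
import shlex


def standalone_restore_command(dbpath, port=27017):
    """Build a mongod command line for the temporary standalone step.

    No --replSet or --shardsvr option is included, so the node starts
    as a standalone.
    """
    return [
        "mongod",
        "--dbpath", dbpath,
        "--port", str(port),
        # Stops the session cache refresh from creating
        # config.system.sessions with a new UUID.
        "--setParameter", "disableLogicalSessionCacheRefresh=true",
    ]


# Hypothetical path and port, for illustration only.
cmd = standalone_restore_command("/data/restore/shard0", port=27018)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the argument list in one place makes it harder to forget the parameter when the standalone step has to be repeated for every shard member.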

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Ravind Kumar (Inactive) [ 09/May/19 ]

pushed to 4.0->3.6, published in turn.

Comment by Githook User [ 09/May/19 ]

Author: rk-mongo <ravind.kumar@mongodb.com>

Message: DOCS-12650: OM/CM manual restore requires disableLogicalSessionCacheRefresh
Branch: v3.6
https://github.com/mongodb/docs/commit/763f1abebff9bb2fd74308798c580e6987042362

Comment by Githook User [ 09/May/19 ]

Author: rk-mongo <ravind.kumar@mongodb.com>

Message: DOCS-12650: OM/CM manual restore requires disableLogicalSessionCacheRefresh
Branch: v4.0
https://github.com/mongodb/docs/commit/a4c0154dd3f62ac225941ae2af4125a54e52c124

Comment by Githook User [ 09/May/19 ]

Author: rk-mongo <ravind.kumar@mongodb.com>

Message: DOCS-12650: OM/CM manual restore requires disableLogicalSessionCacheRefresh
Branch: master
https://github.com/mongodb/docs/commit/27dcd9e8063d00400de801aba2ce48cce90aed78

Comment by Ravind Kumar (Inactive) [ 26/Apr/19 ]

SERVER-34683 indicates the flag should be available as of 3.6, so we're safe to backport once this is ready on master

Comment by Dmitry Ryabtsev [ 26/Apr/19 ]

ravind.kumar I'm not very concerned about the parameter being undocumented as long as the restoration procedure is updated for 3.6+.

"Am I right in thinking this only applies to CM/OM snapshots, since they drop config.system.sessions and config.transactions?"

Correct. There is a possibility that a user may try to restore a backup taken on 3.4 (which wouldn't have the sessions collection) with the v3.6 binaries. In that case MongoDB would start in FCV 3.4 mode which, to the best of my knowledge, does not run the sessions refresh thread and thus should be safe.

So yeah, CM/OM snapshot restores only.

Generated at Thu Feb 08 08:05:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.