[SERVER-37339] Sharding state is set to initialized on Grid before sharding components are fully initialized
Created: 27/Sep/18  Updated: 29/Oct/23  Resolved: 05/Dec/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.8, 4.0.2, 4.1.3
Fix Version/s: 3.6.11, 4.0.5, 4.1.7

Type: Bug
Priority: Major - P3
Reporter: Blake Oler
Assignee: Esha Maharishi (Inactive)
Resolution: Fixed
Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-37330 Add sharded passthrough suites to det... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0, v3.6
Sprint: Sharding 2018-10-22, Sharding 2018-11-05, Sharding 2018-11-19, Sharding 2018-12-17
Participants:
Linked BF Score: 51

 Description   

Because the Grid reports sharding as initialized before the sharding components have finished initializing, code that accesses those components in the interim can crash or behave incorrectly.

One example: the ShardServerCatalogCacheLoader is created in initializeGlobalShardingStateForMongoD, before Grid::setShardingInitialized is called, but its ReplicaSetRole member variable is not initialized until later in initializeShardingEnvironmentOnShardServer, after Grid::setShardingInitialized. This can lead to a crash because various ShardServerCatalogCacheLoader methods invariant that the ReplicaSetRole is not ReplicaSetRole::None.

An example is shown here (and the corresponding Evergreen task).
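
The ordering hazard described above can be modelled in a few lines. This is only a sketch under stated assumptions: FakeGrid, FakeLoader, Outcome, backgroundAccess, startupFlagFirst and startupFlagLast are hypothetical stand-ins, not the real MongoDB classes or functions, and the sketch reports the failed check instead of aborting the way an invariant would. Publishing the flag last is shown as one plausible way to close the window; it is not taken from the actual commit.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for illustration; not the real MongoDB types.
enum class ReplicaSetRole { None, Primary, Secondary };

struct FakeLoader {
    ReplicaSetRole role = ReplicaSetRole::None;  // assigned late during startup
};

struct FakeGrid {
    std::atomic<bool> shardingInitialized{false};
};

enum class Outcome { SkippedNotInitialized, InvariantWouldFail, Ok };

// Models a background reader that trusts the "initialized" flag.
inline Outcome backgroundAccess(const FakeGrid& grid, const FakeLoader& loader) {
    if (!grid.shardingInitialized.load()) {
        return Outcome::SkippedNotInitialized;  // early return, nothing touched
    }
    // The real loader would invariant here; the sketch reports instead.
    return loader.role == ReplicaSetRole::None ? Outcome::InvariantWouldFail
                                               : Outcome::Ok;
}

// Ordering from the ticket: the flag is published before the role is set.
// The background access is placed in the window between the two steps.
inline Outcome startupFlagFirst(FakeGrid& grid, FakeLoader& loader) {
    grid.shardingInitialized.store(true);
    Outcome seen = backgroundAccess(grid, loader);
    loader.role = ReplicaSetRole::Primary;
    return seen;
}

// Assumed alternative ordering: finish initializing, then publish the flag.
inline Outcome startupFlagLast(FakeGrid& grid, FakeLoader& loader) {
    loader.role = ReplicaSetRole::Primary;
    Outcome seen = backgroundAccess(grid, loader);  // same window, flag still false
    assert(loader.role != ReplicaSetRole::None);    // postcondition before publishing
    grid.shardingInitialized.store(true);
    return seen;
}
```

In this model, startupFlagFirst observes Outcome::InvariantWouldFail, while startupFlagLast observes Outcome::SkippedNotInitialized during the window and Outcome::Ok on any access after it returns.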



 Comments   
Comment by Githook User [ 11/Feb/19 ]

Author: Esha Maharishi <esha.maharishi@mongodb.com> (EshaMaharishi)

Message: SERVER-37339 Sharding state is set to initialized on Grid before sharding components are fully initialized
Branch: v3.6
https://github.com/mongodb/mongo/commit/d08d1e6cc029fb49b0bc137011750f7fe773e841

Comment by Githook User [ 10/Dec/18 ]

Author: Esha Maharishi <esha.maharishi@mongodb.com> (EshaMaharishi)

Message: SERVER-37339 Sharding state is set to initialized on Grid before sharding components are fully initialized

(cherry picked from commit 26bad42994d1536f4d22aad47b0b537c3c5359b2)
Branch: v4.0
https://github.com/mongodb/mongo/commit/a3ed66ac2e610a2eb9dff192484f65860dda2ece

Comment by Githook User [ 05/Dec/18 ]

Author: Esha Maharishi <esha.maharishi@mongodb.com> (EshaMaharishi)

Message: SERVER-37339 Sharding state is set to initialized on Grid before sharding components are fully initialized
Branch: master
https://github.com/mongodb/mongo/commit/26bad42994d1536f4d22aad47b0b537c3c5359b2

Comment by Esha Maharishi (Inactive) [ 03/Dec/18 ]

I think the real issue is that the sharding state is set to "initialized" on the Grid before the sharding components have finished being initialized.

I can repro the crash on base commit b12875f9e31321c247cc70515d669dd71ffbf700 by applying the following diff. It sets the logical session cache refresh interval to 1 second, adds a 10 second sleep after the call to Grid::setShardingInitialized but before CatalogCacheLoader::initializeReplicaSetRole, and stops the shell from disabling the logical session cache refresh:

diff --git a/src/mongo/db/logical_session_cache_impl.h b/src/mongo/db/logical_session_cache_impl.h
index 4d2299a..6730a58 100644
--- a/src/mongo/db/logical_session_cache_impl.h
+++ b/src/mongo/db/logical_session_cache_impl.h
@@ -57,7 +57,7 @@ extern int logicalSessionRefreshMillis;
  */
 class LogicalSessionCacheImpl final : public LogicalSessionCache {
 public:
-    static constexpr Milliseconds kLogicalSessionDefaultRefresh = Milliseconds(5 * 60 * 1000);
+    static constexpr Milliseconds kLogicalSessionDefaultRefresh = Milliseconds(1000);
 
     /**
      * An Options type to support the LogicalSessionCacheImpl.
diff --git a/src/mongo/db/s/sharding_initialization_mongod.cpp b/src/mongo/db/s/sharding_initialization_mongod.cpp
index b24cab0..3426fdd 100644
--- a/src/mongo/db/s/sharding_initialization_mongod.cpp
+++ b/src/mongo/db/s/sharding_initialization_mongod.cpp
@@ -116,6 +116,8 @@ void initializeShardingEnvironmentOnShardServer(OperationContext* opCtx,
     initializeGlobalShardingStateForMongoD(
         opCtx, shardIdentity.getConfigsvrConnectionString(), distLockProcessId);
 
+    sleepmillis(10 * 1000);
+
     ReplicaSetMonitor::setSynchronousConfigChangeHook(
         &ShardRegistry::replicaSetChangeShardRegistryUpdateHook);
     ReplicaSetMonitor::setAsynchronousConfigChangeHook(&updateShardIdentityConfigStringCB);
diff --git a/src/mongo/shell/servers.js b/src/mongo/shell/servers.js
index 61cb853..bf28585 100644
--- a/src/mongo/shell/servers.js
+++ b/src/mongo/shell/servers.js
@@ -1096,7 +1096,7 @@ var MongoRunner, _startMongod, startMongoProgram, runMongoProgram, startMongoPro
                 }
                 // Disable background cache refreshing to avoid races in tests
-                argArray.push(...['--setParameter', "disableLogicalSessionCacheRefresh=true"]);
+                //argArray.push(...['--setParameter', "disableLogicalSessionCacheRefresh=true"]);
             }
 
             // Since options may not be backward compatible, mongos options are not

and running this repro script:

(function() {
 
    let st = new ShardingTest({shards: 1});
 
    jsTest.log("Block in test, allowing time for one of the shard servers to attempt to refresh its logical session cache and crash");
    sleep(10 * 1000);
 
    st.stop();
 
})();

Note that SessionCollectionSharded::_checkCacheForSessionsCollection returns early if Grid::isShardingInitialized is false.
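
Taken together with the diff, this note suggests why the repro is reliable: refreshes that fire before the flag is set are skipped by the early return, while any refresh that fires during the added sleep passes the check and reaches the loader before its role is set. The timing can be sketched as below. This is a simplified model, not the server's scheduling logic: refreshLandsInWindow is a hypothetical helper, refreshes are assumed to fire at exact multiples of the interval after startup, and the 2500 ms flag time used in the usage note is an invented value.

```cpp
#include <chrono>

using std::chrono::milliseconds;

// Hypothetical model: refreshes fire at interval, 2 * interval, ... after startup.
// Returns true if some refresh falls in [flagSetAt, roleSetAt), i.e. it passes
// the isShardingInitialized check but finds the replica set role still unset.
// flagSetAt is assumed to be non-negative.
inline bool refreshLandsInWindow(milliseconds interval,
                                 milliseconds flagSetAt,
                                 milliseconds roleSetAt) {
    if (interval.count() <= 0 || roleSetAt <= flagSetAt) {
        return false;  // no refreshes, or no window at all
    }
    // Index of the first refresh at or after flagSetAt (ceiling division).
    auto k = (flagSetAt.count() + interval.count() - 1) / interval.count();
    if (k < 1) {
        k = 1;  // the first refresh happens one full interval after startup
    }
    const milliseconds firstTick{k * interval.count()};
    return firstTick < roleSetAt;
}
```

With the 1 second interval and 10 second sleep from the diff, and the flag assumed to be set 2500 ms after startup, refreshLandsInWindow(milliseconds(1000), milliseconds(2500), milliseconds(12500)) is true. With the default 5 minute interval and the same window it is false, which fits the crash only being hit occasionally without the diff.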
