  1. Core Server
  2. SERVER-90834

Logical session refresh may recreate sessions collection with mismatching UUID

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0, 8.0.0-rc8
    • Affects Version/s: None
    • Component/s: None
    • None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • v8.0
    • Steps to reproduce:

      Comment out this return to simulate a refresh failing for a reason other than the sessions collection not existing, e.g. interrupted by a stepdown.

      Then run this test (the test file must have a name that sorts after "config"):

        // Note: this requires a jstest name that sorts "after" the shard id "config", because the test
        // name is used as the shard name for the non-config shard and this test relies on the "config"
        // shard being the "first" shard in sorted order.
        const st = new ShardingTest({shards: 2, configShard: true});
      
        const configShardName = st.shard0.shardName;
        const otherShardName = st.shard1.shardName;
      
        //
        // Transition config shard to dedicated mode, which requires draining config.system.sessions.
        //
      
        let removeRes = assert.commandWorked(st.s.adminCommand({transitionToDedicatedConfigServer: 1}));
        assert.eq(removeRes.state, "started", tojson(removeRes));
      
        assert.commandWorked(st.s.adminCommand({
            moveChunk: "config.system.sessions",
            find: {_id: 0},
            to: otherShardName,
            _waitForDelete: true
        }));
      
        removeRes = assert.commandWorked(st.s.adminCommand({transitionToDedicatedConfigServer: 1}));
        assert.eq(removeRes.state, "completed", tojson(removeRes));
      
        //
        // Transition back to config shard mode and trigger a session cache refresh, which will fail with
        // InvalidUUID and create a new local config.system.sessions collection on the config server.
        //
      
        assert.commandWorked(st.s.adminCommand({transitionFromDedicatedConfigServer: 1}));
      
        // Fails with InvalidUUID.
        assert.commandWorked(st.configRS.getPrimary().adminCommand({refreshLogicalSessionCacheNow: 1}));
      
        st.stop();
      
    • CAR Team 2024-06-10
    • 200

      When a config server node refreshes the logical session cache, it first checks whether the sessions collection exists in the sharding catalog. If that check throws any error, the refresh tries to create the collection by sending _shardsvrCreateCollection to the "first" shard in sorted order. This mimics the protocol for sending DDL operations to the primary shard of a database, but config.system.sessions lives in the "config" database, which has no "true" primary shard (i.e. no entry in config.databases). If the "first" shard was newly added to the cluster, it may not have config.system.sessions locally. If session refresh then sends it _shardsvrCreateCollection (which can happen after any error today, e.g. InterruptedDueToReplStateChange), the shard may try to create the collection even though it already exists on other shards. Sharding the collection always seems to fail in this case, but the attempt can leave an untracked copy of the collection on the "first" shard with a different UUID than the real, tracked sessions collection.
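
      The sequence described above can be pictured with a small in-memory model. This is only an illustrative sketch in plain JavaScript, not server code: makeCluster, firstShard, refreshAfterError, the shard name "zz_test-rs1", and the UUID strings are all hypothetical names invented for the example. It assumes the only relevant state is each shard's local catalog plus the single tracked UUID.

```javascript
// Toy model of the scenario; none of these functions are MongoDB APIs.
function makeCluster() {
    return {
        // The tracked (real) sessions collection in the sharding catalog.
        tracked: {ns: "config.system.sessions", uuid: "uuid-tracked"},
        // Local catalogs per shard: shard name -> {namespace: uuid}.
        shards: {
            // Hypothetical non-config shard that owns the real collection.
            "zz_test-rs1": {"config.system.sessions": "uuid-tracked"},
            // Newly re-added config shard: it has no local copy.
            "config": {},
        },
    };
}

// The "first" shard in sorted order.
function firstShard(cluster) {
    return Object.keys(cluster.shards).sort()[0];
}

// Models a refresh whose existence check threw `checkError`. Any error,
// not just "collection does not exist", leads to a create attempt.
function refreshAfterError(cluster, checkError) {
    const target = firstShard(cluster);
    const local = cluster.shards[target];
    const ns = cluster.tracked.ns;
    if (!(ns in local)) {
        // The target creates a fresh local collection with a new UUID.
        local[ns] = "uuid-new-on-" + target;
    }
    // Sharding only succeeds if the local UUID matches the tracked one.
    const shardingSucceeded = local[ns] === cluster.tracked.uuid;
    return {target, checkError, shardingSucceeded, localUuid: local[ns]};
}

const cluster = makeCluster();
const result = refreshAfterError(cluster, "InterruptedDueToReplStateChange");
console.log(result);
```

      In this model the "config" shard sorts first, receives the create request, and ends up holding a local collection whose UUID differs from the tracked one, while the sharding step reports failure.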

            Assignee:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Reporter:
            jack.mulrow@mongodb.com Jack Mulrow
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved: