MongoDB config server crashes in a cluster upgraded from version 6.0

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Blocker - P1
    • Affects Version/s: 8.0.19, 8.0.20, 7.0.31
    • Component/s: None
    • ALL
    • Steps to reproduce:
      1. Create a cluster running version 6.0.27 with a mongos.
      2. Ensure that config.mongos contains a mongos record.
      3. Upgrade the cluster to version 7.0, following the documented upgrade order.
      4. Connect to the cluster using mongosh, execute db.getSiblingDB("config").mongos.remove({}), and then check the config servers.
      5. The config servers will likely crash and remain unrecoverable until the oplog data is manually corrected.

      The MongoDB config server crashes and cannot restart normally in a cluster that was upgraded from version 6.0. The crash log is as follows:

      With version 7.0+:

      {"t":{"$date":"2026-03-19T17:15:03.886+08:00"},"s":"F",  "c":"ASSERT",   "id":23079,   "ctx":"ReplWriterWorker-2","msg":"Invariant failure","attr":{"expr":"erased","file":"src/mongo/db/s/query_analysis_coordinator.cpp","line":164}} 

      With version 8.0+:

      {"t":{"$date":"2026-03-17T21:21:24.897+08:00"},"s":"F",  "c":"ASSERT",   "id":23079,   "svc":"S", "ctx":"ReplWriterWorker-3","msg":"Invariant failure","attr":{"expr":"erased","file":"src/mongo/db/s/query_analysis_coordinator.cpp","line":189}} 

      The root cause is that QueryAnalysisCoordinator adds an entry to `_samplers` when a document is inserted into `config.mongos` and removes it when the document is deleted, and `QueryAnalysisCoordinator::onSamplerDelete` checks `invariant(erased)` on the result of `_samplers.erase`. However, in a cluster upgraded from an older version, `config.mongos` still contains documents written before the upgrade. Because these documents were never inserted after the upgrade, they have no entry in `_samplers`, so `_samplers.erase` returns 0 when they are deleted, the invariant fails, and the process crashes. Since the failure occurs while the delete is being applied from the oplog (`ReplWriterWorker`), a restarted node is expected to hit the same invariant when it applies that entry again, which is why the oplog data has to be corrected by hand.

      void QueryAnalysisCoordinator::onSamplerDelete(const MongosType& doc) {
          invariant(serverGlobalParams.clusterRole.has(ClusterRole::ConfigServer));
          stdx::lock_guard<Latch> lk(_mutex);
          auto erased = _samplers.erase(doc.getName());
          // Fails for config.mongos documents written before the upgrade,
          // which were never added to _samplers.
          invariant(erased);
      }
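To make the failure mode concrete, here is a minimal, self-contained C++ model of the bookkeeping described above. It is not MongoDB source: `SamplerRegistry`, `onInsert`, `onDelete`, and `onDeleteStrict` are hypothetical stand-ins for `QueryAnalysisCoordinator` and its insert/delete handlers, a `std::set<std::string>` stands in for `_samplers`, and `assert()` stands in for `invariant()`. A name that was already in `config.mongos` before the upgrade is never passed to `onInsert`, so erasing it yields 0.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical, simplified model of QueryAnalysisCoordinator's sampler
// bookkeeping; not MongoDB source code.
class SamplerRegistry {
public:
    // Models the insert handler: called when a document is inserted into config.mongos.
    void onInsert(const std::string& name) {
        std::lock_guard<std::mutex> lk(_mutex);
        _samplers.insert(name);
    }

    // Models the delete handler, but returns the erase count instead of checking it.
    std::size_t onDelete(const std::string& name) {
        std::lock_guard<std::mutex> lk(_mutex);
        return _samplers.erase(name);  // 0 if the name was never tracked
    }

    // Mirrors the original invariant(erased). Unlike MongoDB's invariant(),
    // assert() is compiled out when NDEBUG is defined.
    void onDeleteStrict(const std::string& name) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto erased = _samplers.erase(name);
        assert(erased);
        (void)erased;  // avoid an unused-variable warning under NDEBUG
    }

private:
    std::mutex _mutex;
    std::set<std::string> _samplers;
};
```

For example, after `onInsert("mongos-new:27017")`, `onDelete("mongos-new:27017")` returns 1, while `onDelete` for a name that only existed in `config.mongos` before the upgrade returns 0; calling `onDeleteStrict` for such a name aborts, which corresponds to the invariant failures in the logs above.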

      I think the logic should be changed to tolerate this case, and `invariant(erased)` should be removed.
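A minimal sketch of the proposed change, using the same kind of simplified model (hypothetical names; `std::cerr` stands in for MongoDB's logging, and this is not the actual patch): the delete handler treats a missing `_samplers` entry as expected for pre-upgrade documents and reports it instead of terminating the process.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <set>
#include <string>

// Hypothetical sketch of a delete handler without invariant(erased);
// class name, methods, and logging are illustrative, not MongoDB's API.
class TolerantSamplerRegistry {
public:
    void onInsert(const std::string& name) {
        std::lock_guard<std::mutex> lk(_mutex);
        _samplers.insert(name);
    }

    // Returns true if a tracked sampler was removed. An untracked name is
    // expected for config.mongos documents written before the upgrade, so it
    // is reported and ignored rather than treated as fatal.
    bool onDelete(const std::string& name) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_samplers.erase(name) == 0) {
            std::cerr << "onDelete: ignoring untracked sampler '" << name << "'\n";
            return false;
        }
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _samplers.size();
    }

private:
    mutable std::mutex _mutex;  // mutable so size() can lock in a const method
    std::set<std::string> _samplers;
};
```

Returning a `bool` instead of `void` keeps the outcome visible to callers (for logging or metrics) without making an untracked delete fatal during oplog application.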

            Assignee:
            Unassigned
            Reporter:
            FirstName lipengchong