Core Server / SERVER-68511

movePrimary might introduce sharding metadata inconsistency in MongoDB 5.0+

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker - P1
    • Fix Version/s: 6.0.1, 5.0.11, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Backport Requested: v6.0, v5.0
    • Sprint: Sharding EMEA 2022-08-08

      Issue Status as of Aug 10, 2022

      ISSUE DESCRIPTION AND IMPACT

      In MongoDB 5.0.0-5.0.10 and 6.0.0, when running featureCompatibilityVersion 5.0 or later, the movePrimary command can cause inconsistent sharding metadata if the database targeted by the command was created under featureCompatibilityVersion 4.4 or earlier.

      This issue is fixed in MongoDB 5.0.11 and 6.0.1.

      After a movePrimary operation on an affected database:

      • The source shard of the movePrimary operation (the original primary shard) remains the primary shard.
      • Documents in unsharded collections inserted or updated prior to movePrimary are located on the destination shard.
      • Documents in unsharded collections inserted or updated after the movePrimary operation are located on the source shard.
      • The source shard will not have any secondary (non-_id) indexes or non-default collection options. Secondary uniqueness constraints will not be enforced in unsharded collections.
      • Write operations will not be properly routed to preexisting documents in unsharded collections, causing updates and deletes to appear successful without matching and modifying their targets.

      The movePrimary command performs an update to the config.databases collection to complete the operation of changing a database's primary shard. This issue occurs because the update filter compares the version field against a whole subdocument instead of matching each field of the subdocument individually using dotted notation. A whole-subdocument comparison depends on field order, so the update fails to match (and therefore fails to update) the necessary metadata due to the following differences in how metadata is stored and managed:

      • Under featureCompatibilityVersion 4.4, new database metadata documents are created with a version field containing, in order: uuid and lastMod.
      • The setFeatureCompatibilityVersion command, when moving from 4.4 to 5.0, updates all existing database metadata documents with a new version format. The result of this update is a version field containing, in order: uuid, lastMod, and timestamp.
      • Under featureCompatibilityVersion 5.0, new database metadata documents are created with a version field containing, in order: uuid, timestamp, and lastMod.

      The movePrimary command's improper use of a document that matches the FCV 5.0 version format causes updates to miss metadata documents that were converted during setFeatureCompatibilityVersion from earlier versions.
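
The difference between the two filter styles can be sketched in plain JavaScript. This is only a model of the matching semantics, not server code: the function names and the placeholder values are hypothetical, and real config.databases entries hold a UUID, a Timestamp, and an integer.

```javascript
// Model of an equality match against a whole embedded document:
// same fields, same order, same values.
function matchesWholeSubdocument(stored, wanted) {
  const a = Object.entries(stored);
  const b = Object.entries(wanted);
  return a.length === b.length &&
    a.every(([key, value], i) => key === b[i][0] && value === b[i][1]);
}

// Model of a dotted-notation filter such as {"version.uuid": ..., "version.lastMod": ...}:
// every field is tested on its own, so field order does not matter.
function matchesDottedFields(stored, wanted) {
  return Object.entries(wanted).every(([key, value]) => stored[key] === value);
}

// Placeholder values standing in for the real UUID / Timestamp / integer.
const convertedFrom44 = { uuid: "u1", lastMod: 1, timestamp: 100 }; // order after the FCV upgrade
const createdOn50 = { uuid: "u1", timestamp: 100, lastMod: 1 };     // order for databases created on FCV 5.0
const filterVersion = { uuid: "u1", timestamp: 100, lastMod: 1 };   // order used by the movePrimary filter

console.log(matchesWholeSubdocument(createdOn50, filterVersion));     // true
console.log(matchesWholeSubdocument(convertedFrom44, filterVersion)); // false: same fields, different order
console.log(matchesDottedFields(convertedFrom44, filterVersion));     // true
```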

      DIAGNOSIS

      If a database in a sharded cluster was created on MongoDB 4.4 or earlier, and the cluster is currently running MongoDB versions 5.0.0-5.0.10 or 6.0.0, the cluster is vulnerable and likely to be impacted during a movePrimary command.

      Signs a cluster has been impacted include one or more of the following:

      • The database was created on MongoDB version 4.4 or earlier and you ran movePrimary on MongoDB 5.0 or 6.0 while running FCV 5.0 or 6.0.
      • The primary shard for a database does not have a complete version of unsharded data on it.
      • Unsharded collections exist on another shard besides the primary shard for that database, and contain data from before a movePrimary operation.
      • You run your database through Atlas, Ops Manager, or Cloud Manager, and attempts to reduce the shard count fail.
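
As a read-only complement to the signs above, the following sketch flags databases whose config.databases entry still has a version field order other than uuid, timestamp, lastMod, which is the same condition the workaround command filters on. It identifies databases that remain vulnerable; it does not detect damage that has already occurred. The helper names and the sample entries are hypothetical.

```javascript
const EXPECTED_ORDER = ["uuid", "timestamp", "lastMod"];

// True when the entry's version subdocument is missing or its fields are
// not in the order the affected movePrimary filter expects.
function hasLegacyVersionOrder(dbEntry) {
  const keys = Object.keys(dbEntry.version || {});
  return JSON.stringify(keys) !== JSON.stringify(EXPECTED_ORDER);
}

// Returns the names (_id values) of all entries that would not be matched.
function listVulnerableDatabases(entries) {
  return entries.filter(hasLegacyVersionOrder).map((entry) => entry._id);
}

// Hypothetical sample entries with placeholder values. In mongosh, connected
// to a mongos router, the real entries could be read with:
//   db.getSiblingDB("config").databases.find().toArray()
const sampleEntries = [
  { _id: "legacy", primary: "shard0", version: { uuid: "a", lastMod: 1, timestamp: 10 } },
  { _id: "fresh", primary: "shard1", version: { uuid: "b", timestamp: 20, lastMod: 1 } },
];

console.log(listVulnerableDatabases(sampleEntries)); // [ 'legacy' ]
```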

      WORKAROUND

      For MongoDB Atlas customers: Please open a support case or start a chat with the Atlas Support team to coordinate this workaround if you have an immediate need to reduce shard count. Otherwise, do not reduce shard count or run movePrimary until you have upgraded to MongoDB 5.0.11 or 6.0.1.

      For all other users (including Ops Manager and Cloud Manager Customers):

      The command below modifies config server metadata to match the format expected by the incorrect codepath, which allows subsequent movePrimary commands to complete correctly. If you are on MongoDB Ops Manager or Cloud Manager, these steps also make it safe to reduce shard count.

      Prior to performing a movePrimary operation (or reducing shard count in Cloud or Ops Manager) on a vulnerable cluster, run the following command from a mongos router:

      db.getSiblingDB("config").getCollection("databases").updateMany({
             $expr: {
                 $ne: [
                     ["uuid", "timestamp", "lastMod"],
                     {$map: {input: {$objectToArray: "$version"}, in : "$$this.k"}}
                 ]
             }
         },
         [{
             $replaceWith: {
                 $mergeObjects: [
                     "$$ROOT",
                     {
                         version: {
                             uuid: "$version.uuid",
                             timestamp: "$version.timestamp",
                             lastMod: "$version.lastMod"
                         }
                     }
                 ]
             }
         }]);
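
To illustrate what the pipeline stage does to a single metadata document, here is a plain JavaScript model of the rewrite. It is a sketch rather than a replacement for the command: the function name and the sample entry are hypothetical, and it assumes all three version fields are present, as they are after the setFeatureCompatibilityVersion upgrade described above.

```javascript
// Model of the $replaceWith / $mergeObjects stage for one document:
// every top-level field is kept, and version is rebuilt with its fields
// in the order uuid, timestamp, lastMod.
function rewriteVersion(dbEntry) {
  const v = dbEntry.version;
  return {
    ...dbEntry,
    version: { uuid: v.uuid, timestamp: v.timestamp, lastMod: v.lastMod },
  };
}

// Hypothetical entry with placeholder values, in the order left by the FCV upgrade.
const before = { _id: "legacy", primary: "shard0", version: { uuid: "a", lastMod: 1, timestamp: 10 } };
const after = rewriteVersion(before);

console.log(Object.keys(before.version)); // [ 'uuid', 'lastMod', 'timestamp' ]
console.log(Object.keys(after.version));  // [ 'uuid', 'timestamp', 'lastMod' ]
console.log(after.primary);               // shard0 (values are unchanged, only the order differs)
```

Once a document has been rewritten this way, the command's $expr filter no longer matches it, so running the command again leaves it untouched.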
      

      REMEDIATION

      If you have been impacted:

      1. Stop writes to affected and related collections.
      2. Halt all DDL operations on the database and ensure no DDL operations are running on the database.
      3. Upgrade to a fixed version, OR run this command on a vulnerable version, to correct the config.databases collection on the config server replica set:

      db.getSiblingDB("config").getCollection("databases").updateMany({
             $expr: {
                 $ne: [
                     ["uuid", "timestamp", "lastMod"],
                     {$map: {input: {$objectToArray: "$version"}, in : "$$this.k"}}
                 ]
             }
         },
         [{
             $replaceWith: {
                 $mergeObjects: [
                     "$$ROOT",
                     {
                         version: {
                             uuid: "$version.uuid",
                             timestamp: "$version.timestamp",
                             lastMod: "$version.lastMod"
                         }
                     }
                 ]
             }
         }]);
      

      4. For each unsharded collection in the affected database, merge the data from the source primary shard to the destination primary shard. Manual conflict resolution may be required for upserts, and you may also need to identify documents which should be deleted and address documents which have not been correctly updated. Assuming no conflicts, one way to perform this process is:

      • Perform a mongodump directly from the source primary shard.
      • Perform a mongorestore directly to the destination primary shard. Do not restore through a mongos router.

      Important: If you have run multiple movePrimary commands with differing arguments, then data must be merged from the source primary shard and all shards that have been the destination primary shard of a movePrimary operation.

      5. For each unsharded collection in the affected database, drop the collection directly on the source primary shard. Do not drop on a mongos router.
      6. Run the original movePrimary operation again.
      7. Resume writes and allow DDL operations.

      Note: "destination primary shard" in this context is the intended primary shard for the movePrimary operation.

      Original description

      Short summary of the problem

      Calling movePrimary on any database that was created under any FCV earlier than 5.0 results in a no-op update on the config.databases entry. As a result, unsharded collections are moved to the destination primary shard but are inaccessible via mongos because the metadata still points to the source primary shard.

      Root cause

      The update of the primary field in config.databases entries performed as part of movePrimary has a filter containing a nested BSON for the version field. This is wrong since it means the query is relying on the order of the fields and will not match documents with the exact same fields but in a different order. Using the dotted notation for the update would solve the issue.

      The filter was originally introduced under SERVER-33769 in the catalog manager and later moved into the move primary source manager under SERVER-52758. However, it became problematic only starting from v5.0, when the new timestamp field was added.

      Steps to reproduce:

      Apply the following patch to upgrade_downgrade_sharded_cluster.js on the v5.0 branch (tried on revision 80418c74):

      +    var kDbName = 'db';
      +    var shard0 = st.shard0.shardName;
      +    var shard1 = st.shard1.shardName;
      +    assert.commandWorked(st.s.adminCommand({enableSharding: kDbName, primaryShard: shard0}));
      +
           jsTest.log('oldVersion: ' + oldVersion);
       
           st.configRS.awaitReplication();
      @@ -293,6 +298,10 @@ function runChecksAfterBinDowngrade() {
           // Tests after upgrade
           runChecksAfterUpgrade();
       
      +    assert.eq(shard0, st.s.getDB('config').databases.findOne({_id: kDbName}).primary);
      +    assert.commandWorked(st.s.adminCommand({movePrimary: kDbName, to: shard1}));
      +    assert.eq(shard1, st.s.getDB('config').databases.findOne({_id: kDbName}).primary);
      +
           // Setup state before downgrade
           setupStateBeforeDowngrade();
      

            Assignee: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)
            Reporter: Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)