Core Server / SERVER-40061

Chunk move fails due to DuplicateKey error on the `config.chunks` collection at migration commit


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.6.10
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL
    • Sprint:
      Sharding 2019-05-20, Sharding 2019-06-03, Sharding 2019-06-17, Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-07-29, Sharding 2019-08-12

      Description

      The ChunkType::genID method uses BSONElement::toString, which was changed to format UUID BinData more readably. Unfortunately, ChunkType::genID is used throughout the sharding-related code to generate the value of the "_id" field in the "config.chunks" collection, so when the chunk's minimum field contains a UUID, the generated "_id" differs between v3.6 and v3.4 (and earlier versions).
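      To make the difference concrete, here is a minimal Python sketch of the two renderings. The real genID lives in the C++ sharding code; the field-by-field concatenation below is an illustration of the observed output, not the actual implementation:

```python
import uuid

def gen_chunk_id(ns, min_doc, legacy):
    """Illustrative stand-in for ChunkType::genID: the namespace,
    a dash, then each shard-key field rendered as <field>_<value>."""
    parts = []
    for field, value in min_doc.items():
        if isinstance(value, uuid.UUID):
            if legacy:
                # pre-3.6 BSONElement::toString rendering: BinData(4, <hex>)
                rendered = f"BinData(4, {value.hex.upper()})"
            else:
                # 3.6 rendering of UUID BinData: UUID("...")
                rendered = f'UUID("{value}")'
        else:
            rendered = str(value)
        parts.append(f"{field}_{rendered}")
    return ns + "-" + "".join(parts)

files_id = uuid.UUID("05660000-0000-e000-96c2-dc81ca6fa911")
min_doc = {"files_id": files_id, "n": 0}

# The _id stored by a 3.4 config server for the chunk in this report:
print(gen_chunk_id("a.fs.chunks", min_doc, legacy=True))
# -> a.fs.chunks-files_id_BinData(4, 056600000000E00096C2DC81CA6FA911)n_0

# The _id a 3.6 config server computes for the very same chunk:
print(gen_chunk_id("a.fs.chunks", min_doc, legacy=False))
# -> a.fs.chunks-files_id_UUID("05660000-0000-e000-96c2-dc81ca6fa911")n_0
```

      The two strings never match, so any 3.6 operation that looks up the chunk by its recomputed "_id" misses the document written under 3.4.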

      We hit this when trying to move chunks manually in a cluster we had recently upgraded from v3.4 to v3.6:

      2019-03-10T16:54:39.264+0300 I COMMAND  [conn469729] command admin.$cmd appName: "MongoDB Shell" command: _configsvrMoveChunk { _configsvrMoveChunk: 1, _id: "a.fs.chunks-files_id_UUID("05660000-0000-e000-96c2-dc81ca6fa911")n_0", ns: "a.fs.chunks", min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 }, max: { files_id: UUID("05666100-0000-e000-9252-7b82dea0b186"), n: 3 }, shard: "driveFS-2", lastmod: Timestamp(571033, 1), lastmodEpoch: ObjectId('51793868331d54dfcf8e0032'), toShard: "driveFS-17", maxChunkSizeBytes: 536870912, secondaryThrottle: {}, waitForDelete: false, writeConcern: { w: "majority", wtimeout: 15000 }, lsid: { id: UUID("605f316b-6296-4010-9f26-835b60f923ff"), uid: BinData(0, EE3A53D0CA965E6112DBEBF842D31DC81E8CE7E7548256DE28D08422B2C59D3B) }, $replData: 1, $clusterTime: { clusterTime: Timestamp(0, 0), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId:0 } }, $client: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "3.6.9" }, os: { type: "Windows", name: "Microsoft Windows 10", architecture: "x86_64", version: "10.0 (build 17134)" }, mongos: { host: "dorado2:27017",client: "10.254.3.70:1334", version: "3.6.10" } }, $configServerState: { opTime: { ts: Timestamp(1552225736, 38), t: 95 } }, $db: "admin" } exception: Chunk move was not successful due to E11000 duplicate key error collection: config.chunks index: ns_1_min_1 dup key: { : "a.fs.chunks", : { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 } } code:DuplicateKey numYields:0 reslen:562 locks:{ Global: { acquireCount: { r: 10, w: 6 } }, Database: { acquireCount: { r: 2, w: 6 } }, Collection: { acquireCount: { r: 2,w: 3 } }, oplog: { acquireCount: { w: 3 } } } protocol:op_msg 340766ms
      

      Of course, the "config.chunks" collection contains this:

      > db.chunks.find({ns:"a.fs.chunks",min: { files_id: UUID("05660000-0000-e000-96c2-dc81ca6fa911"), n: 0 } })
      { "_id" : "a.fs.chunks-files_id_BinData(4, 056600000000E00096C2DC81CA6FA911)n_0", "lastmod" : Timestamp(539637, 1290), "lastmodEpoch" : ObjectId("51793868331d54dfcf8e0032"), "ns" : "a.fs.chunks", "min" : { "files_id" : UUID("05660000-0000-e000-96c2-dc81ca6fa911"), "n" : 0 }, "max" : { "files_id" : UUID("05666100-0000-e000-9252-7b82dea0b186"), "n" : 3 }, "shard" : "driveFS-2" }
      

      Since I do not know what other operations use the "_id" field, I cannot estimate the full scope of this problem. However, a cursory inspection of the codebase shows at least some places where the update is performed without checking the number of matched/modified documents, so the metadata (i.e. the chunk structure) could be lost or damaged silently.
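      The defensive check being described can be sketched in a few lines. This uses a hypothetical in-memory dict standing in for the "config.chunks" collection, not the actual C++ code path:

```python
def update_chunk(chunks, chunk_id, new_fields):
    """Apply new_fields to the chunk stored under chunk_id.

    `chunks` is a plain dict keyed by _id, standing in for
    config.chunks. The point is the explicit match check: an
    update keyed on a stale _id must not be a silent no-op.
    """
    doc = chunks.get(chunk_id)
    if doc is None:
        # Without this check, a 3.6-formatted _id that misses the
        # 3.4-formatted document would "succeed" while changing nothing.
        raise KeyError(f"no chunk matched _id {chunk_id!r}")
    doc.update(new_fields)
    return doc

chunks = {
    'a.fs.chunks-files_id_BinData(4, 056600000000E00096C2DC81CA6FA911)n_0':
        {"shard": "driveFS-2"},
}

# An update keyed on the 3.6-style _id matches nothing and now fails loudly
# instead of silently modifying zero documents:
try:
    update_chunk(
        chunks,
        'a.fs.chunks-files_id_UUID("05660000-0000-e000-96c2-dc81ca6fa911")n_0',
        {"shard": "driveFS-17"},
    )
except KeyError as exc:
    print(exc)
```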
