Core Server / SERVER-17066

cleanupOrphaned misses orphans after failed chunk migration

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.6.0
    • Component/s: Sharding
    • Labels: None

      Background

      cleanupOrphaned avoids the shard key ranges of pending chunks. This makes sense: the documents in a pending range are expected to soon become owned documents, so removing them would interfere with the migration process and be counter-productive.
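
      For reference, cleanupOrphaned is run against the admin database of a shard primary and processes one contiguous range per call; the usual pattern is to loop on stoppedAtKey. A minimal sketch, using this ticket's foo.bar namespace:

       // Run on the shard primary (admin database). Each call removes orphans from one
       // range the shard does not own, skipping the ranges of pending chunks, and
       // reports where it stopped.
       var nextKey = {};
       var result;
       while (nextKey != null) {
           result = db.adminCommand({cleanupOrphaned: "foo.bar", startingFromKey: nextKey});
           if (result.ok !== 1) {
               printjson(result);   // failure or timeout; stop and investigate
               break;
           }
           nextKey = result.stoppedAtKey;
       }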

      Problem

      If the migration succeeds on the TO shard, but then fails on the FROM shard (or otherwise fails to be committed), cleanupOrphaned on the TO shard continues to avoid that chunk range forever, even though the shard does not own that range and there are no in-progress migrations.

      This means that orphans in this range (notably the documents left behind by the failed migration itself) can no longer be removed by cleanupOrphaned. If the chunk migration is subsequently retried and ends up going to a different shard, this situation persists indefinitely on the original TO shard primary.

      This manifests with both the failMigrationConfigWritePrepare and failMigrationCommit failpoints on the FROM shard side (jstests attached).

      There are no failpoints on the TO shard side, so I couldn't properly test whether a failure there also causes the problem.
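
      For completeness, a hedged sketch of how the FROM shard failure can be induced (the actual jstests are attached to this ticket; this assumes the shards were started with enableTestCommands=1 and uses the shard ports from the logs below, with the mongos assumed to be on port 30999):

       // Connect directly to the FROM shard (shard0000) and enable one of the two
       // failpoints named above.
       var fromAdmin = new Mongo("localhost:30000").getDB("admin");
       fromAdmin.runCommand({configureFailPoint: "failMigrationCommit", mode: "alwaysOn"});
       // (or: {configureFailPoint: "failMigrationConfigWritePrepare", mode: "alwaysOn"})

       // A moveChunk issued through mongos now fails after the TO shard has already
       // cloned the documents, leaving them behind as orphans on the TO shard.
       var mongosAdmin = new Mongo("localhost:30999").getDB("admin");
       mongosAdmin.runCommand({moveChunk: "foo.bar", find: {_id: 0}, to: "shard0001"});

       fromAdmin.runCommand({configureFailPoint: "failMigrationCommit", mode: "off"});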

      Workaround

      Stepping down the affected primary allows the new ChunkManager to correctly determine, directly from the config metadata, the chunk ranges that cleanupOrphaned should avoid.
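
      A sketch of the workaround, assuming the affected TO shard is a replica set (host name hypothetical):

       // Step down the current primary of the affected TO shard. The newly elected
       // primary reloads the collection metadata from the config servers, so
       // cleanupOrphaned once again considers the 0 -> MaxKey range.
       var primary = new Mongo("shard0001-node0:27017");   // hypothetical host:port
       primary.getDB("admin").runCommand({replSetStepDown: 60});
       // Then re-run the cleanupOrphaned loop (see Background) against the new primary.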

      Results

      Split at 0:

       m30000| 2015-01-27T17:38:38.149+1100 [conn4] received splitChunk request: { splitChunk: "foo.bar", keyPattern: { _id: 1.0 }, min: { _id: MinKey }, max: { _id: MaxKey }, from: "shard0000", splitKeys: [ { _id: 0.0 } ], shardId: "foo.bar-_id_MinKey", configdb: "localhost:29000" }
       m30000| 2015-01-27T17:38:38.153+1100 [conn4] splitChunk accepted at version 1|0||54c7326e866945453e19933e
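
      (The split above was presumably issued through mongos along these lines; the helper is equivalent to the admin split command.)

       // Against mongos: split foo.bar at _id: 0, producing the two chunks shown in
       // the status output below.
       sh.splitAt("foo.bar", {_id: 0});
       // equivalently: db.adminCommand({split: "foo.bar", middle: {_id: 0}})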
      

      Status:

      --- Sharding Status ---
        sharding version: {
              "_id" : 1,
              "version" : 4,
              "minCompatibleVersion" : 4,
              "currentVersion" : 5,
              "clusterId" : ObjectId("54c7326d866945453e199337")
      }
        shards:
              {  "_id" : "shard0000",  "host" : "localhost:30000" }
              {  "_id" : "shard0001",  "host" : "localhost:30001" }
        databases:
              {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
              {  "_id" : "foo",  "partitioned" : true,  "primary" : "shard0000" }
                      foo.bar
                              shard key: { "_id" : 1 }
                              chunks:
                                      shard0000       2
                              { "_id" : { "$minKey" : 1 } } -->> { "_id" : 0 } on : shard0000 Timestamp(1, 1)
                              { "_id" : 0 } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 2)
      

      Deliberately insert an orphan on the 2nd shard, then clean it up. Note that the full MinKey -> MaxKey range is considered by the rangeDeleter (which is correct):

       m30001| 2015-01-27T17:38:38.358+1100 [conn1] insert foo.bar query: { _id: 1.0 } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:143702 143ms
       m30001| 2015-01-27T17:38:38.358+1100 [conn1] command foo.$cmd command: insert { insert: "bar", documents: [ { _id: 1.0 } ], ordered: true } keyUpdates:0 numYields:0 locks(micros) w:31 reslen:40 143ms
       m30001| 2015-01-27T17:38:38.359+1100 [conn1] remotely refreshing metadata for foo.bar, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
       m29000| 2015-01-27T17:38:38.359+1100 [initandlisten] connection accepted from 127.0.0.1:50549 #11 (10 connections now open)
       m30001| 2015-01-27T17:38:38.360+1100 [conn1] collection foo.bar was previously unsharded, new metadata loaded with shard version 0|0||54c7326e866945453e19933e
       m30001| 2015-01-27T17:38:38.360+1100 [conn1] collection version was loaded at version 1|2||54c7326e866945453e19933e, took 0ms
       m30001| 2015-01-27T17:38:38.360+1100 [conn1] Deleter starting delete for: foo.bar from { _id: MinKey } -> { _id: MaxKey }, with opId: 29
       m30001| 2015-01-27T17:38:38.373+1100 [conn1] rangeDeleter deleted 1 documents for foo.bar from { _id: MinKey } -> { _id: MaxKey }
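
      This step presumably amounts to the following (hedged sketch; the orphan is created by writing directly to the shard, bypassing mongos):

       // Insert a document directly on shard0001, which owns no chunks of foo.bar,
       // so the document is an orphan; then remove it with cleanupOrphaned.
       var shard1 = new Mongo("localhost:30001");
       shard1.getCollection("foo.bar").insert({_id: 1});
       shard1.getDB("admin").runCommand({cleanupOrphaned: "foo.bar"});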
      

      Failed migration of the 0 -> MaxKey chunk (1 document), shown here for the failMigrationCommit case:

       m30999| 2015-01-27T17:38:38.374+1100 [conn1] CMD: movechunk: { moveChunk: "foo.bar", find: { _id: 0.0 }, to: "shard0001" }
       m30999| 2015-01-27T17:38:38.374+1100 [conn1] moving chunk ns: foo.bar moving ( ns: foo.bar, shard: shard0000:localhost:30000, lastmod: 1|2||000000000000000000000000, min: { _id: 0.0 }, max: { _id: MaxKey }) shard0000:localhost:30000 -> shard0001:localhost:30001
       m30000| 2015-01-27T17:38:38.375+1100 [conn4] received moveChunk request: { moveChunk: "foo.bar", from: "localhost:30000", to: "localhost:30001", fromShard: "shard0000", toShard: "shard0001", min: { _id: 0.0 }, max: { _id: MaxKey }, maxChunkSizeBytes: 52428800, shardId: "foo.bar-_id_0.0", configdb: "localhost:29000", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0 }
       m30000| 2015-01-27T17:38:38.376+1100 [conn4] distributed lock 'foo.bar/genique:30000:1422340718:1213894827' acquired, ts : 54c7326e88b748633811c1e5
       m30000| 2015-01-27T17:38:38.376+1100 [conn4] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e88b748633811c1e6", server: "genique", clientAddr: "127.0.0.1:37591", time: new Date(1422340718376), what: "moveChunk.start", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, from: "shard0000", to: "shard0001" } }
       m29000| 2015-01-27T17:38:38.376+1100 [conn10] CMD fsync: sync:1 lock:0
       m30000| 2015-01-27T17:38:38.397+1100 [conn4] remotely refreshing metadata for foo.bar based on current shard version 1|2||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e
       m30000| 2015-01-27T17:38:38.398+1100 [conn4] metadata of collection foo.bar already up to date (shard version : 1|2||54c7326e866945453e19933e, took 0ms)
       m30000| 2015-01-27T17:38:38.398+1100 [conn4] moveChunk request accepted at version 1|2||54c7326e866945453e19933e
       m30000| 2015-01-27T17:38:38.398+1100 [conn4] moveChunk number of documents: 1
       m30001| 2015-01-27T17:38:38.398+1100 [initandlisten] connection accepted from 127.0.0.1:59845 #5 (5 connections now open)
       m30001| 2015-01-27T17:38:38.398+1100 [conn5] remotely refreshing metadata for foo.bar based on current shard version 0|0||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e
       m30001| 2015-01-27T17:38:38.399+1100 [conn5] metadata of collection foo.bar already up to date (shard version : 0|0||54c7326e866945453e19933e, took 0ms)
       m30001| 2015-01-27T17:38:38.399+1100 [migrateThread] starting receiving-end of migration of chunk { _id: 0.0 } -> { _id: MaxKey } for collection foo.bar from localhost:30000 at epoch 54c7326e866945453e19933e
       m30000| 2015-01-27T17:38:38.399+1100 [initandlisten] connection accepted from 127.0.0.1:37598 #5 (5 connections now open)
       m30000| 2015-01-27T17:38:38.400+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "clone", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
       m30001| 2015-01-27T17:38:38.400+1100 [migrateThread] Waiting for replication to catch up before entering critical section
       m30001| 2015-01-27T17:38:38.400+1100 [migrateThread] migrate commit succeeded flushing to secondaries for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey }
       m30000| 2015-01-27T17:38:38.402+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
       m30000| 2015-01-27T17:38:38.407+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
       m30001| 2015-01-27T17:38:38.413+1100 [migrateThread] migrate commit flushed to journal for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey }
       m30000| 2015-01-27T17:38:38.415+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "steady", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
       m30000| 2015-01-27T17:38:38.415+1100 [conn4] About to check if it is safe to enter critical section
       m30000| 2015-01-27T17:38:38.415+1100 [conn4] About to enter migrate critical section
       m30000| 2015-01-27T17:38:38.415+1100 [conn4] moveChunk setting version to: 2|0||54c7326e866945453e19933e
       m30001| 2015-01-27T17:38:38.416+1100 [initandlisten] connection accepted from 127.0.0.1:59847 #6 (6 connections now open)
       m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] migrate commit succeeded flushing to secondaries for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey }
       m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] migrate commit flushed to journal for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey }
       m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e5b94b105b43d330c", server: "genique", clientAddr: ":27017", time: new Date(1422340718423), what: "moveChunk.to", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, step 1 of 5: 0, step 2 of 5: 0, step 3 of 5: 0, step 4 of 5: 0, step 5 of 5: 23, note: "success" } }
       m29000| 2015-01-27T17:38:38.424+1100 [initandlisten] connection accepted from 127.0.0.1:50553 #12 (11 connections now open)
       m29000| 2015-01-27T17:38:38.424+1100 [conn12] CMD fsync: sync:1 lock:0
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] moveChunk migrate commit not accepted by TO-shard: { active: false, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "done", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } resetting shard version to: 0|0||000000000000000000000000
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] moveChunk global lock acquired to reset shard version from failed migration
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] ShardingState::undoDonateChunk acquired _mutex
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] Shard version successfully reset to clean up failed migration
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] MigrateFromStatus::done About to acquire global write lock to exit critical section
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] MigrateFromStatus::done Global lock acquired
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] distributed lock 'foo.bar/genique:30000:1422340718:1213894827' unlocked.
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e88b748633811c1e7", server: "genique", clientAddr: "127.0.0.1:37591", time: new Date(1422340718459), what: "moveChunk.from", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, step 1 of 6: 0, step 2 of 6: 23, step 3 of 6: 1, step 4 of 6: 16, note: "aborted", errmsg: "_recvChunkCommit failed!" } }
       m29000| 2015-01-27T17:38:38.460+1100 [conn10] CMD fsync: sync:1 lock:0
       m30000| 2015-01-27T17:38:38.494+1100 [conn4] command admin.$cmd command: moveChunk { moveChunk: "foo.bar", from: "localhost:30000", to: "localhost:30001", fromShard: "shard0000", toShard: "shard0001", min: { _id: 0.0 }, max: { _id: MaxKey }, maxChunkSizeBytes: 52428800, shardId: "foo.bar-_id_0.0", configdb: "localhost:29000", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0 } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) W:32 r:67 w:44 reslen:321 120ms
       m30999| 2015-01-27T17:38:38.494+1100 [conn1] moveChunk result: { cause: { active: false, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "done", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 }, ok: 0.0, errmsg: "_recvChunkCommit failed!" }
      

      After the failed migration (which has left behind 1 orphan), the rangeDeleter only checks MinKey -> 0. The range 0 -> MaxKey is not scanned, so the orphan is never cleaned up:

       m30001| 2015-01-27T17:38:38.496+1100 [conn1] remotely refreshing metadata for foo.bar based on current shard version 0|0||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e
       m30001| 2015-01-27T17:38:38.496+1100 [conn1] metadata of collection foo.bar already up to date (shard version : 0|0||54c7326e866945453e19933e, took 0ms)
       m30001| 2015-01-27T17:38:38.496+1100 [conn1] Deleter starting delete for: foo.bar from { _id: MinKey } -> { _id: 0.0 }, with opId: 41
       m30001| 2015-01-27T17:38:38.496+1100 [conn1] rangeDeleter deleted 0 documents for foo.bar from { _id: MinKey } -> { _id: 0.0 }
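
      The leftover orphan can be confirmed directly on the TO shard, and repeated cleanupOrphaned calls never remove it (hedged sketch):

       // shard0001 owns no chunks of foo.bar, so any document present here is an orphan.
       var shard1 = new Mongo("localhost:30001");
       assert.eq(1, shard1.getCollection("foo.bar").count());
       shard1.getDB("admin").runCommand({cleanupOrphaned: "foo.bar"});
       assert.eq(1, shard1.getCollection("foo.bar").count());   // the orphan is still there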
      

            Assignee: [DO NOT USE] Backlog - Sharding EMEA (backlog-server-sharding-emea)
            Reporter: Kevin Pulo (kevin.pulo@mongodb.com)
            Votes: 0
            Watchers: 10
