Core Server / SERVER-17066

cleanupOrphaned misses orphans after failed chunk migration

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.6.0
    • Component/s: Sharding
    • Labels: None

      Background

      cleanupOrphaned avoids the shard key ranges of pending chunks. This makes sense: the documents in a pending range are expected to soon become owned documents, so removing them would interfere with the migration process and be counter-productive.
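
      For reference, cleanupOrphaned is run against the admin database of a shard primary and processes one contiguous range per call; the usual pattern is to loop on stoppedAtKey. A minimal sketch, using this ticket's foo.bar namespace:

       // Run on the shard primary (admin database). Each call removes orphans from one
       // range the shard does not own, skipping the ranges of pending chunks, and
       // reports where it stopped.
       var nextKey = {};
       var result;
       while (nextKey != null) {
           result = db.adminCommand({cleanupOrphaned: "foo.bar", startingFromKey: nextKey});
           if (result.ok !== 1) {
               printjson(result);   // failure or timeout; stop and investigate
               break;
           }
           nextKey = result.stoppedAtKey;
       }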

      Problem

      If the migration succeeds on the TO shard, but then fails on the FROM shard (or otherwise fails to be committed), cleanupOrphaned on the TO shard continues to avoid that chunk range forever, even though the shard does not own that range and there are no in-progress migrations.

      This means that orphans in this range (notably the documents left behind by the failed migration itself) can no longer be removed by cleanupOrphaned. If the chunk migration is subsequently retried and ends up going to a different shard, this situation persists indefinitely on the original TO shard primary.

      This manifests with both the failMigrationConfigWritePrepare and failMigrationCommit failpoints on the FROM shard side (jstests attached).

      There are no failpoints on the TO shard side, so I couldn't properly test whether a failure there also causes the problem.
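
      For completeness, a hedged sketch of how the FROM shard failure can be induced (the actual jstests are attached to this ticket; this assumes the shards were started with enableTestCommands=1 and uses the shard ports from the logs below, with the mongos assumed to be on port 30999):

       // Connect directly to the FROM shard (shard0000) and enable one of the two
       // failpoints named above.
       var fromAdmin = new Mongo("localhost:30000").getDB("admin");
       fromAdmin.runCommand({configureFailPoint: "failMigrationCommit", mode: "alwaysOn"});
       // (or: {configureFailPoint: "failMigrationConfigWritePrepare", mode: "alwaysOn"})

       // A moveChunk issued through mongos now fails after the TO shard has already
       // cloned the documents, leaving them behind as orphans on the TO shard.
       var mongosAdmin = new Mongo("localhost:30999").getDB("admin");
       mongosAdmin.runCommand({moveChunk: "foo.bar", find: {_id: 0}, to: "shard0001"});

       fromAdmin.runCommand({configureFailPoint: "failMigrationCommit", mode: "off"});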

      Workaround

      Stepping down the affected primary allows the new ChunkManager to correctly determine, directly from the config metadata, the chunk ranges that cleanupOrphaned should avoid.
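
      A sketch of the workaround, assuming the affected TO shard is a replica set (host name hypothetical):

       // Step down the current primary of the affected TO shard. The newly elected
       // primary reloads the collection metadata from the config servers, so
       // cleanupOrphaned once again considers the 0 -> MaxKey range.
       var primary = new Mongo("shard0001-node0:27017");   // hypothetical host:port
       primary.getDB("admin").runCommand({replSetStepDown: 60});
       // Then re-run the cleanupOrphaned loop (see Background) against the new primary.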

      Results

      Split at 0:

       m30000| 2015-01-27T17:38:38.149+1100 [conn4] received splitChunk request: { splitChunk: "foo.bar", keyPattern: { _id: 1.0 }, min: { _id: MinKey }, max: { _id: MaxKey }, from: "shard0000", splitKeys: [ { _id: 0.0 } ], shardId: "foo.bar-_id_MinKey", configdb: "localhost:29000" }
       m30000| 2015-01-27T17:38:38.153+1100 [conn4] splitChunk accepted at version 1|0||54c7326e866945453e19933e
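
      (The split above was presumably issued through mongos along these lines; the helper is equivalent to the admin split command.)

       // Against mongos: split foo.bar at _id: 0, producing the two chunks shown in
       // the status output below.
       sh.splitAt("foo.bar", {_id: 0});
       // equivalently: db.adminCommand({split: "foo.bar", middle: {_id: 0}})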
      

      Status:

      --- Sharding Status ---
        sharding version: {
              "_id" : 1,
              "version" : 4,
              "minCompatibleVersion" : 4,
              "currentVersion" : 5,
              "clusterId" : ObjectId("54c7326d866945453e199337")
      }
        shards:
              {  "_id" : "shard0000",  "host" : "localhost:30000" }
              {  "_id" : "shard0001",  "host" : "localhost:30001" }
        databases:
              {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
              {  "_id" : "foo",  "partitioned" : true,  "primary" : "shard0000" }
                      foo.bar
                              shard key: { "_id" : 1 }
                              chunks:
                                      shard0000       2
                              { "_id" : { "$minKey" : 1 } } -->> { "_id" : 0 } on : shard0000 Timestamp(1, 1)
                              { "_id" : 0 } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 2)
      

      Deliberately insert an orphan on the 2nd shard, then clean it up. Note that the full MinKey -> MaxKey range is considered by the rangeDeleter (which is correct):

       m30001| 2015-01-27T17:38:38.358+1100 [conn1] insert foo.bar query: { _id: 1.0 } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:143702 143ms
       m30001| 2015-01-27T17:38:38.358+1100 [conn1] command foo.$cmd command: insert { insert: "bar", documents: [ { _id: 1.0 } ], ordered: true } keyUpdates:0 numYields:0 locks(micros) w:31 reslen:40 143ms
       m30001| 2015-01-27T17:38:38.359+1100 [conn1] remotely refreshing metadata for foo.bar, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
       m29000| 2015-01-27T17:38:38.359+1100 [initandlisten] connection accepted from 127.0.0.1:50549 #11 (10 connections now open)
       m30001| 2015-01-27T17:38:38.360+1100 [conn1] collection foo.bar was previously unsharded, new metadata loaded with shard version 0|0||54c7326e866945453e19933e
       m30001| 2015-01-27T17:38:38.360+1100 [conn1] collection version was loaded at version 1|2||54c7326e866945453e19933e, took 0ms
       m30001| 2015-01-27T17:38:38.360+1100 [conn1] Deleter starting delete for: foo.bar from { _id: MinKey } -> { _id: MaxKey }, with opId: 29
       m30001| 2015-01-27T17:38:38.373+1100 [conn1] rangeDeleter deleted 1 documents for foo.bar from { _id: MinKey } -> { _id: MaxKey }
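
      This step presumably amounts to the following (hedged sketch; the orphan is created by writing directly to the shard, bypassing mongos):

       // Insert a document directly on shard0001, which owns no chunks of foo.bar,
       // so the document is an orphan; then remove it with cleanupOrphaned.
       var shard1 = new Mongo("localhost:30001");
       shard1.getCollection("foo.bar").insert({_id: 1});
       shard1.getDB("admin").runCommand({cleanupOrphaned: "foo.bar"});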
      

      Failed migration of the 0 -> MaxKey chunk (1 document), shown here for the failMigrationCommit case:

       m30999| 2015-01-27T17:38:38.374+1100 [conn1] CMD: movechunk: { moveChunk: "foo.bar", find: { _id: 0.0 }, to: "shard0001" }
       m30999| 2015-01-27T17:38:38.374+1100 [conn1] moving chunk ns: foo.bar moving ( ns: foo.bar, shard: shard0000:localhost:30000, lastmod: 1|2||000000000000000000000000, min: { _id: 0.0 }, max: { _id: MaxKey }) shard0000:localhost:30000 -> shard0001:localhost:30001
       m30000| 2015-01-27T17:38:38.375+1100 [conn4] received moveChunk request: { moveChunk: "foo.bar", from: "localhost:30000", to: "localhost:30001", fromShard: "shard0000", toShard: "shard0001", min: { _id: 0.0 }, max: { _id: MaxKey }, maxChunkSizeBytes: 52428800, shardId: "foo.bar-_id_0.0", configdb: "localhost:29000", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0 }
       m30000| 2015-01-27T17:38:38.376+1100 [conn4] distributed lock 'foo.bar/genique:30000:1422340718:1213894827' acquired, ts : 54c7326e88b748633811c1e5
       m30000| 2015-01-27T17:38:38.376+1100 [conn4] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e88b748633811c1e6", server: "genique", clientAddr: "127.0.0.1:37591", time: new Date(1422340718376), what: "moveChunk.start", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, from: "shard0000", to: "shard0001" } }
       m29000| 2015-01-27T17:38:38.376+1100 [conn10] CMD fsync: sync:1 lock:0
       m30000| 2015-01-27T17:38:38.397+1100 [conn4] remotely refreshing metadata for foo.bar based on current shard version 1|2||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e
       m30000| 2015-01-27T17:38:38.398+1100 [conn4] metadata of collection foo.bar already up to date (shard version : 1|2||54c7326e866945453e19933e, took 0ms)
       m30000| 2015-01-27T17:38:38.398+1100 [conn4] moveChunk request accepted at version 1|2||54c7326e866945453e19933e
       m30000| 2015-01-27T17:38:38.398+1100 [conn4] moveChunk number of documents: 1
       m30001| 2015-01-27T17:38:38.398+1100 [initandlisten] connection accepted from 127.0.0.1:59845 #5 (5 connections now open)
       m30001| 2015-01-27T17:38:38.398+1100 [conn5] remotely refreshing metadata for foo.bar based on current shard version 0|0||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e
       m30001| 2015-01-27T17:38:38.399+1100 [conn5] metadata of collection foo.bar already up to date (shard version : 0|0||54c7326e866945453e19933e, took 0ms)
       m30001| 2015-01-27T17:38:38.399+1100 [migrateThread] starting receiving-end of migration of chunk { _id: 0.0 } -> { _id: MaxKey } for collection foo.bar from localhost:30000 at epoch 54c7326e866945453e19933e
       m30000| 2015-01-27T17:38:38.399+1100 [initandlisten] connection accepted from 127.0.0.1:37598 #5 (5 connections now open)
       m30000| 2015-01-27T17:38:38.400+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "clone", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
       m30001| 2015-01-27T17:38:38.400+1100 [migrateThread] Waiting for replication to catch up before entering critical section
       m30001| 2015-01-27T17:38:38.400+1100 [migrateThread] migrate commit succeeded flushing to secondaries for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey }
       m30000| 2015-01-27T17:38:38.402+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
       m30000| 2015-01-27T17:38:38.407+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
       m30001| 2015-01-27T17:38:38.413+1100 [migrateThread] migrate commit flushed to journal for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey }
       m30000| 2015-01-27T17:38:38.415+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "steady", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
       m30000| 2015-01-27T17:38:38.415+1100 [conn4] About to check if it is safe to enter critical section
       m30000| 2015-01-27T17:38:38.415+1100 [conn4] About to enter migrate critical section
       m30000| 2015-01-27T17:38:38.415+1100 [conn4] moveChunk setting version to: 2|0||54c7326e866945453e19933e
       m30001| 2015-01-27T17:38:38.416+1100 [initandlisten] connection accepted from 127.0.0.1:59847 #6 (6 connections now open)
       m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] migrate commit succeeded flushing to secondaries for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey }
       m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] migrate commit flushed to journal for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey }
       m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e5b94b105b43d330c", server: "genique", clientAddr: ":27017", time: new Date(1422340718423), what: "moveChunk.to", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, step 1 of 5: 0, step 2 of 5: 0, step 3 of 5: 0, step 4 of 5: 0, step 5 of 5: 23, note: "success" } }
       m29000| 2015-01-27T17:38:38.424+1100 [initandlisten] connection accepted from 127.0.0.1:50553 #12 (11 connections now open)
       m29000| 2015-01-27T17:38:38.424+1100 [conn12] CMD fsync: sync:1 lock:0
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] moveChunk migrate commit not accepted by TO-shard: { active: false, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "done", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } resetting shard version to: 0|0||000000000000000000000000
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] moveChunk global lock acquired to reset shard version from failed migration
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] ShardingState::undoDonateChunk acquired _mutex
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] Shard version successfully reset to clean up failed migration
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] MigrateFromStatus::done About to acquire global write lock to exit critical section
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] MigrateFromStatus::done Global lock acquired
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] distributed lock 'foo.bar/genique:30000:1422340718:1213894827' unlocked.
       m30000| 2015-01-27T17:38:38.459+1100 [conn4] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e88b748633811c1e7", server: "genique", clientAddr: "127.0.0.1:37591", time: new Date(1422340718459), what: "moveChunk.from", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, step 1 of 6: 0, step 2 of 6: 23, step 3 of 6: 1, step 4 of 6: 16, note: "aborted", errmsg: "_recvChunkCommit failed!" } }
       m29000| 2015-01-27T17:38:38.460+1100 [conn10] CMD fsync: sync:1 lock:0
       m30000| 2015-01-27T17:38:38.494+1100 [conn4] command admin.$cmd command: moveChunk { moveChunk: "foo.bar", from: "localhost:30000", to: "localhost:30001", fromShard: "shard0000", toShard: "shard0001", min: { _id: 0.0 }, max: { _id: MaxKey }, maxChunkSizeBytes: 52428800, shardId: "foo.bar-_id_0.0", configdb: "localhost:29000", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0 } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) W:32 r:67 w:44 reslen:321 120ms
       m30999| 2015-01-27T17:38:38.494+1100 [conn1] moveChunk result: { cause: { active: false, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "done", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 }, ok: 0.0, errmsg: "_recvChunkCommit failed!" }
      

      After the failed migration (which has left behind 1 orphan), the rangeDeleter only checks MinKey -> 0. The range 0 -> MaxKey is not scanned, so the orphan is never cleaned up:

       m30001| 2015-01-27T17:38:38.496+1100 [conn1] remotely refreshing metadata for foo.bar based on current shard version 0|0||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e
       m30001| 2015-01-27T17:38:38.496+1100 [conn1] metadata of collection foo.bar already up to date (shard version : 0|0||54c7326e866945453e19933e, took 0ms)
       m30001| 2015-01-27T17:38:38.496+1100 [conn1] Deleter starting delete for: foo.bar from { _id: MinKey } -> { _id: 0.0 }, with opId: 41
       m30001| 2015-01-27T17:38:38.496+1100 [conn1] rangeDeleter deleted 0 documents for foo.bar from { _id: MinKey } -> { _id: 0.0 }
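
      The leftover orphan can be confirmed directly on the TO shard, and repeated cleanupOrphaned calls never remove it (hedged sketch):

       // shard0001 owns no chunks of foo.bar, so any document present here is an orphan.
       var shard1 = new Mongo("localhost:30001");
       assert.eq(1, shard1.getCollection("foo.bar").count());
       shard1.getDB("admin").runCommand({cleanupOrphaned: "foo.bar"});
       assert.eq(1, shard1.getCollection("foo.bar").count());   // the orphan is still there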
      

            Assignee: [DO NOT USE] Backlog - Sharding EMEA (backlog-server-sharding-emea)
            Reporter: Kevin Pulo (kevin.pulo@mongodb.com)
            Votes: 0
            Watchers: 10
