-
Type: Bug
-
Resolution: Done
-
Priority: Major - P3
-
None
-
Affects Version/s: 2.6.0
-
Component/s: Sharding
-
None
-
Sharding EMEA
-
ALL
Background
cleanupOrphaned avoids shard key ranges of pending chunks. This makes sense, since these orphans are expected to soon be owned documents, and removing them would interfere with the migration process and be pointlessly counter-productive.
Problem
If the migration succeeds on the TO shard, but then fails on the FROM shard (or otherwise fails to be committed), cleanupOrphaned on the TO shard continues to avoid that chunk range forever, even though the shard does not own that range and there are no in-progress migrations.
This means that orphans in this range (notably the orphans from the failed migration itself) can no longer be removed by cleanupOrphaned. When the chunk migration is subsequently retried, if it ends up going to another shard, this situation persists forever on the original TO shard primary.
This manifests at both failMigrationConfigWritePrepare and failMigrationCommit failpoints on the FROM shard side (jstests attached).
There are no failpoints on the TO shard side, so I couldn't properly test if a failure there also causes the problem or not.
Workaround
Stepping down the affected primary allows the new ChunkManager to correctly determine the owned chunk ranges (directly from the config metadata) that cleanupOrphaned should avoid.
Results
Split at 0:
m30000| 2015-01-27T17:38:38.149+1100 [conn4] received splitChunk request: { splitChunk: "foo.bar", keyPattern: { _id: 1.0 }, min: { _id: MinKey }, max: { _id: MaxKey }, from: "shard0000", splitKeys: [ { _id: 0.0 } ], shardId: "foo.bar-_id_MinKey", configdb: "localhost:29000" } m30000| 2015-01-27T17:38:38.153+1100 [conn4] splitChunk accepted at version 1|0||54c7326e866945453e19933e
Status:
--- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("54c7326d866945453e199337") } shards: { "_id" : "shard0000", "host" : "localhost:30000" } { "_id" : "shard0001", "host" : "localhost:30001" } databases: { "_id" : "admin", "partitioned" : false, "primary" : "config" } { "_id" : "foo", "partitioned" : true, "primary" : "shard0000" } foo.bar shard key: { "_id" : 1 } chunks: shard0000 2 { "_id" : { "$minKey" : 1 } } -->> { "_id" : 0 } on : shard0000 Timestamp(1, 1) { "_id" : 0 } -->> { "_id" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 2)
Deliberately insert an orphan on 2nd shard, and clean it up — note the full MinKey -> MaxKey range is considered by the rangeDeleter (which is correct):
m30001| 2015-01-27T17:38:38.358+1100 [conn1] insert foo.bar query: { _id: 1.0 } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:143702 143ms m30001| 2015-01-27T17:38:38.358+1100 [conn1] command foo.$cmd command: insert { insert: "bar", documents: [ { _id: 1.0 } ], ordered: true } keyUpdates:0 numYields:0 locks(micros) w:31 reslen:40 143ms m30001| 2015-01-27T17:38:38.359+1100 [conn1] remotely refreshing metadata for foo.bar, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000 m29000| 2015-01-27T17:38:38.359+1100 [initandlisten] connection accepted from 127.0.0.1:50549 #11 (10 connections now open) m30001| 2015-01-27T17:38:38.360+1100 [conn1] collection foo.bar was previously unsharded, new metadata loaded with shard version 0|0||54c7326e866945453e19933e m30001| 2015-01-27T17:38:38.360+1100 [conn1] collection version was loaded at version 1|2||54c7326e866945453e19933e, took 0ms m30001| 2015-01-27T17:38:38.360+1100 [conn1] Deleter starting delete for: foo.bar from { _id: MinKey } -> { _id: MaxKey }, with opId: 29 m30001| 2015-01-27T17:38:38.373+1100 [conn1] rangeDeleter deleted 1 documents for foo.bar from { _id: MinKey } -> { _id: MaxKey }
Failed migration of the 0 -> MaxKey chunk (1 document) (for the failMigrationCommit case):
m30999| 2015-01-27T17:38:38.374+1100 [conn1] CMD: movechunk: { moveChunk: "foo.bar", find: { _id: 0.0 }, to: "shard0001" } m30999| 2015-01-27T17:38:38.374+1100 [conn1] moving chunk ns: foo.bar moving ( ns: foo.bar, shard: shard0000:localhost:30000, lastmod: 1|2||000000000000000000000000, min: { _id: 0.0 }, max: { _id: MaxKey }) shard0000:localhost:30000 -> shard0001:localhost:30001 m30000| 2015-01-27T17:38:38.375+1100 [conn4] received moveChunk request: { moveChunk: "foo.bar", from: "localhost:30000", to: "localhost:30001", fromShard: "shard0000", toShard: "shard0001", min: { _id: 0.0 }, max: { _id: MaxKey }, maxChunkSizeBytes: 52428800, shardId: "foo.bar-_id_0.0", configdb: "localhost:29000", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0 } m30000| 2015-01-27T17:38:38.376+1100 [conn4] distributed lock 'foo.bar/genique:30000:1422340718:1213894827' acquired, ts : 54c7326e88b748633811c1e5 m30000| 2015-01-27T17:38:38.376+1100 [conn4] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e88b748633811c1e6", server: "genique", clientAddr: "127.0.0.1:37591", time: new Date(1422340718376), what: "moveChunk.start", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, from: "shard0000", to: "shard0001" } } m29000| 2015-01-27T17:38:38.376+1100 [conn10] CMD fsync: sync:1 lock:0 m30000| 2015-01-27T17:38:38.397+1100 [conn4] remotely refreshing metadata for foo.bar based on current shard version 1|2||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e m30000| 2015-01-27T17:38:38.398+1100 [conn4] metadata of collection foo.bar already up to date (shard version : 1|2||54c7326e866945453e19933e, took 0ms) m30000| 2015-01-27T17:38:38.398+1100 [conn4] moveChunk request accepted at version 1|2||54c7326e866945453e19933e m30000| 2015-01-27T17:38:38.398+1100 [conn4] moveChunk number of documents: 1 m30001| 2015-01-27T17:38:38.398+1100 [initandlisten] connection accepted from 127.0.0.1:59845 #5 (5 connections now open) m30001| 2015-01-27T17:38:38.398+1100 [conn5] remotely refreshing metadata for foo.bar based on current shard version 0|0||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e m30001| 2015-01-27T17:38:38.399+1100 [conn5] metadata of collection foo.bar already up to date (shard version : 0|0||54c7326e866945453e19933e, took 0ms) m30001| 2015-01-27T17:38:38.399+1100 [migrateThread] starting receiving-end of migration of chunk { _id: 0.0 } -> { _id: MaxKey } for collection foo.bar from localhost:30000 at epoch 54c7326e866945453e19933e m30000| 2015-01-27T17:38:38.399+1100 [initandlisten] connection accepted from 127.0.0.1:37598 #5 (5 connections now open) m30000| 2015-01-27T17:38:38.400+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "clone", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0 m30001| 2015-01-27T17:38:38.400+1100 [migrateThread] Waiting for replication to catch up before entering critical section m30001| 2015-01-27T17:38:38.400+1100 [migrateThread] migrate commit succeeded flushing to secondaries for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey } m30000| 2015-01-27T17:38:38.402+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0 m30000| 2015-01-27T17:38:38.407+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "catchup", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0 m30001| 2015-01-27T17:38:38.413+1100 [migrateThread] migrate commit flushed to journal for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey } m30000| 2015-01-27T17:38:38.415+1100 [conn4] moveChunk data transfer progress: { active: true, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "steady", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0 m30000| 2015-01-27T17:38:38.415+1100 [conn4] About to check if it is safe to enter critical section m30000| 2015-01-27T17:38:38.415+1100 [conn4] About to enter migrate critical section m30000| 2015-01-27T17:38:38.415+1100 [conn4] moveChunk setting version to: 2|0||54c7326e866945453e19933e m30001| 2015-01-27T17:38:38.416+1100 [initandlisten] connection accepted from 127.0.0.1:59847 #6 (6 connections now open) m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] migrate commit succeeded flushing to secondaries for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey } m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] migrate commit flushed to journal for 'foo.bar' { _id: 0.0 } -> { _id: MaxKey } m30001| 2015-01-27T17:38:38.423+1100 [migrateThread] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e5b94b105b43d330c", server: "genique", clientAddr: ":27017", time: new Date(1422340718423), what: "moveChunk.to", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, step 1 of 5: 0, step 2 of 5: 0, step 3 of 5: 0, step 4 of 5: 0, step 5 of 5: 23, note: "success" } } m29000| 2015-01-27T17:38:38.424+1100 [initandlisten] connection accepted from 127.0.0.1:50553 #12 (11 connections now open) m29000| 2015-01-27T17:38:38.424+1100 [conn12] CMD fsync: sync:1 lock:0 m30000| 2015-01-27T17:38:38.459+1100 [conn4] moveChunk migrate commit not accepted by TO-shard: { active: false, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "done", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 } resetting shard version to: 0|0||000000000000000000000000 m30000| 2015-01-27T17:38:38.459+1100 [conn4] moveChunk global lock acquired to reset shard version from failed migration m30000| 2015-01-27T17:38:38.459+1100 [conn4] ShardingState::undoDonateChunk acquired _mutex m30000| 2015-01-27T17:38:38.459+1100 [conn4] Shard version successfully reset to clean up failed migration m30000| 2015-01-27T17:38:38.459+1100 [conn4] MigrateFromStatus::done About to acquire global write lock to exit critical section m30000| 2015-01-27T17:38:38.459+1100 [conn4] MigrateFromStatus::done Global lock acquired m30000| 2015-01-27T17:38:38.459+1100 [conn4] distributed lock 'foo.bar/genique:30000:1422340718:1213894827' unlocked. m30000| 2015-01-27T17:38:38.459+1100 [conn4] about to log metadata event: { _id: "genique-2015-01-27T06:38:38-54c7326e88b748633811c1e7", server: "genique", clientAddr: "127.0.0.1:37591", time: new Date(1422340718459), what: "moveChunk.from", ns: "foo.bar", details: { min: { _id: 0.0 }, max: { _id: MaxKey }, step 1 of 6: 0, step 2 of 6: 23, step 3 of 6: 1, step 4 of 6: 16, note: "aborted", errmsg: "_recvChunkCommit failed!" } } m29000| 2015-01-27T17:38:38.460+1100 [conn10] CMD fsync: sync:1 lock:0 m30000| 2015-01-27T17:38:38.494+1100 [conn4] command admin.$cmd command: moveChunk { moveChunk: "foo.bar", from: "localhost:30000", to: "localhost:30001", fromShard: "shard0000", toShard: "shard0001", min: { _id: 0.0 }, max: { _id: MaxKey }, maxChunkSizeBytes: 52428800, shardId: "foo.bar-_id_0.0", configdb: "localhost:29000", secondaryThrottle: false, waitForDelete: false, maxTimeMS: 0 } ntoreturn:1 keyUpdates:0 numYields:0 locks(micros) W:32 r:67 w:44 reslen:321 120ms m30999| 2015-01-27T17:38:38.494+1100 [conn1] moveChunk result: { cause: { active: false, ns: "foo.bar", from: "localhost:30000", min: { _id: 0.0 }, max: { _id: MaxKey }, shardKeyPattern: { _id: 1.0 }, state: "done", counts: { cloned: 1, clonedBytes: 18, catchup: 0, steady: 0 }, ok: 1.0 }, ok: 0.0, errmsg: "_recvChunkCommit failed!" }
After the failed migration (which has left behind 1 orphan), the rangeDeleter is only checking MinKey -> 0. The range 0 -> MaxKey is not scanned, and so the orphan is not cleaned:
m30001| 2015-01-27T17:38:38.496+1100 [conn1] remotely refreshing metadata for foo.bar based on current shard version 0|0||54c7326e866945453e19933e, current metadata version is 1|2||54c7326e866945453e19933e m30001| 2015-01-27T17:38:38.496+1100 [conn1] metadata of collection foo.bar already up to date (shard version : 0|0||54c7326e866945453e19933e, took 0ms) m30001| 2015-01-27T17:38:38.496+1100 [conn1] Deleter starting delete for: foo.bar from { _id: MinKey } -> { _id: 0.0 }, with opId: 41 m30001| 2015-01-27T17:38:38.496+1100 [conn1] rangeDeleter deleted 0 documents for foo.bar from { _id: MinKey } -> { _id: 0.0 }
- is related to
-
SERVER-18096 Shard primary incorrectly reuses closed sockets after relinquish and re-election
- Closed