[SERVER-38472] A config server can return early for a shardCollection command even if the shard hasn't finished its own shardCollection command Created: 07/Dec/18  Updated: 29/Oct/23  Resolved: 02/Jan/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0.4, 4.1.6
Fix Version/s: 4.0.6, 4.1.7

Type: Bug Priority: Major - P3
Reporter: Blake Oler Assignee: Janna Golden
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0
Sprint: Sharding 2018-12-31, Sharding 2019-01-14
Participants:
Linked BF Score: 71

 Description   
  1. A config server calls shardCollection.
  2. The shard begins shardCollection.
  3. The shard writes chunks and the metadata entry for the collection.
  4. The config server steps down, cancelling its shardCollection command.
  5. A new config server steps up.
  6. The new config server retries the shardCollection command.
  7. The new config server sees that the metadata entry for the collection has been written and erroneously assumes that the existence of the entry means the shard has finished its shardCollection command. The config server therefore returns and releases the distributed lock, which allows chunk migrations and splits to proceed.
  8. A subsequent moveChunk operation can acquire the collection distributed lock and then attempt to acquire the critical section, which currently crashes the server.

A config server should not be able to return early if the shard's shardCollection command has not completed.
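
The steps above can be reduced to a toy model. This is only a sketch for illustration: `ShardState`, `retry_releases_lock`, and `lock_released_too_early` are hypothetical names, not MongoDB server code, and the non-shortcut branch is an assumption based on the intent stated in the fix's commit message ("Do not check if collection sharded from config server").

```python
from dataclasses import dataclass


@dataclass
class ShardState:
    """Hypothetical snapshot of the primary shard (not a MongoDB type)."""
    metadata_written: bool   # step 3: chunks and the collection entry exist
    command_finished: bool   # the shard's own shardCollection has returned


def retry_releases_lock(shard: ShardState, trust_metadata_entry: bool) -> bool:
    """Return True if the retried command returns, releasing the dist lock."""
    if trust_metadata_entry and shard.metadata_written:
        # Reported behavior: the metadata entry is taken as proof of completion.
        return True
    # Assumed alternative: the retry only returns once the shard's command has.
    return shard.command_finished


def lock_released_too_early(shard: ShardState, trust_metadata_entry: bool) -> bool:
    """True when the lock is released while the shard is still working."""
    released = retry_releases_lock(shard, trust_metadata_entry)
    return released and not shard.command_finished


# State at step 7: metadata is written, but the shard has not finished.
mid_command = ShardState(metadata_written=True, command_finished=False)
print(lock_released_too_early(mid_command, trust_metadata_entry=True))   # True
print(lock_released_too_early(mid_command, trust_metadata_entry=False))  # False
```

In this model the unsafe window exists only when the retry treats the metadata entry as a completion signal.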



 Comments   
Comment by Githook User [ 08/Jan/19 ]

Author:

{'username': 'jannaerin', 'email': 'golden.janna@gmail.com', 'name': 'jannaerin'}

Message: SERVER-38472 Do not check if collection sharded from config server

(cherry picked from commit c3e78c91c3a86fd6aba44a0b3c97062f55512f56)
Branch: v4.0
https://github.com/mongodb/mongo/commit/6eda170a940f24e9510e157a7be23b8ba4b8a28b

Comment by Githook User [ 02/Jan/19 ]

Author:

{'username': 'jannaerin', 'email': 'golden.janna@gmail.com', 'name': 'jannaerin'}

Message: SERVER-38472 Do not check if collection sharded from config server
Branch: master
https://github.com/mongodb/mongo/commit/c3e78c91c3a86fd6aba44a0b3c97062f55512f56

Comment by Janna Golden [ 14/Dec/18 ]

BF-11523 has similar behavior, except that after step #7 we drop the collection and database while _shardsvrShardCollection is still running. This is possible because the config server has released the distributed locks. Mongos later sends an entirely new shardCollection request for the same collection, and the config server sends _shardsvrShardCollection to the primary shard. This new request "joins" the original request, and both then return with the error "Collection was successfully written as sharded but got dropped before it could be evenly distributed".
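
The "join" behavior described in this comment can be sketched with a shared future. This is a minimal illustration only: `ActiveRequests` and its methods are hypothetical and do not reflect how the server tracks active _shardsvrShardCollection requests. It shows why a later request for the same namespace receives the same error as the original one.

```python
from concurrent.futures import Future


class ActiveRequests:
    """Hypothetical registry of in-flight requests, keyed by namespace."""

    def __init__(self):
        self._active = {}

    def get_or_join(self, namespace):
        """Return (future, joined); joined is True if a request was in flight."""
        existing = self._active.get(namespace)
        if existing is not None:
            return existing, True
        future = Future()
        self._active[namespace] = future
        return future, False

    def finish(self, namespace, error=None):
        """Complete the in-flight request; every joiner observes the outcome."""
        future = self._active.pop(namespace)
        if error is not None:
            future.set_exception(error)
        else:
            future.set_result(None)


registry = ActiveRequests()
original, _ = registry.get_or_join("db.coll")
retry, joined = registry.get_or_join("db.coll")   # the "entirely new" request
registry.finish("db.coll", RuntimeError("collection dropped before distribution"))
print(joined, retry.exception() is original.exception())
```

Because both callers hold the same future, the error produced by the original request is also what the new request sees.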

Generated at Thu Feb 08 04:49:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.