[SERVER-33973] Force cleanup of possibly remaining partial data (from failed collection/database drop) when rerunning dropCollection command Created: 19/Mar/18  Updated: 29/Oct/23  Resolved: 01/Aug/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.2.1, 4.3.1

Type: New Feature Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Blake Oler
Resolution: Fixed Votes: 5
Labels: ShardingAutomationSupport, former-quick-wins, gm-ack
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
    depends on SERVER-41813 Allow ViewCatalog lookup without vali... Closed
Duplicate
    is duplicated by SERVER-17243 Dropped collection & database still h... Closed
    is duplicated by SERVER-21179 Clean-up orphan chunk entries left fr... Closed
    is duplicated by SERVER-6413 Validate shard tags and tag chunk reg... Closed
Related
    is related to SERVER-17397 Dropping a Database or Collection in ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2, v4.0, v3.6
Sprint: Sharding 2019-05-06, Sharding 2019-06-17, Sharding 2019-07-15, Sharding 2019-07-29, Sharding 2019-08-12
Participants:
Case:
Linked BF Score: 15

 Description   

As described in SERVER-17397, the way sharded collection/database create and drop are currently implemented makes it possible for a failed create or drop operation to leave behind partial metadata, such as incomplete chunk or collection entries. This forces administrators to perform manual cleanup and risks corrupting data through human error.

We should implement `cleanupOrphanedCollection`/`cleanupOrphanedDatabase` commands that perform this cleanup with proper checking and synchronization.

The code for these manual commands will eventually become the basis for implementing consistent drops using a resumable task queue.
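As a concrete illustration of the kind of leftover metadata involved (a sketch under assumptions, not part of the proposed commands): the snippet below uses pymongo to list namespaces that still have chunk or tag entries but no live config.collections entry. The field names (`ns` in config.chunks/config.tags, the namespace in config.collections `_id`) and the `dropped` flag reflect a 3.6/4.0-era metadata format assumed here; the connection string is a placeholder.

```python
# Illustrative sketch only -- not the proposed cleanup command itself.
# Assumes 3.6/4.0-era config metadata: config.chunks and config.tags documents
# carry an "ns" field, config.collections is keyed by namespace in "_id", and
# dropped collections are flagged with "dropped: true".
from pymongo import MongoClient

def find_leftover_namespaces(mongos_uri):
    config = MongoClient(mongos_uri)["config"]
    live = {
        doc["_id"]
        for doc in config["collections"].find({"dropped": {"$ne": True}}, {"_id": 1})
    }
    chunk_ns = set(config["chunks"].distinct("ns"))
    tag_ns = set(config["tags"].distinct("ns"))
    # Namespaces with chunk/tag metadata but no live collection entry are
    # candidates for the partial leftovers described above.
    return sorted((chunk_ns | tag_ns) - live)

if __name__ == "__main__":
    for ns in find_leftover_namespaces("mongodb://localhost:27017"):
        print("possible leftover metadata for", ns)
```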



 Comments   
Comment by Blake Oler [ 26/Aug/19 ]

Requesting backport to v3.6 as part of the solution for SERVER-34760.

Comment by Githook User [ 15/Aug/19 ]

Author: Blake Oler <blake.oler@mongodb.com> (username: BlakeIsBlake)

Message: SERVER-33973 Force cleanup of possibly remaining partial data after failed collection/database drop

(cherry picked from commit 5c13fd19fab91e0aca666269129c51edab3380e2)
Branch: v4.2
https://github.com/mongodb/mongo/commit/cfed6ca4b485f054ee68db3a6eb948e585f128e4

Comment by Githook User [ 01/Aug/19 ]

Author: Blake Oler <blake.oler@mongodb.com> (username: BlakeIsBlake)

Message: SERVER-33973 Force cleanup of possibly remaining partial data after failed collection/database drop
Branch: master
https://github.com/mongodb/mongo/commit/5c13fd19fab91e0aca666269129c51edab3380e2

Comment by Blake Oler [ 02/Jul/19 ]

Ran into issues with the new dropCollection approach: it does not work with the kill_aggregation and kill_rooted_or FSM workloads. A solution will be to allow dropCollection on the config server to pass shard error codes up to the caller. Running this by the downstream-changes email list.
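To make the "pass up shard error codes" idea concrete, here is a hypothetical sketch (the `broadcast_drop` and `run_drop_on_shard` names are placeholders, not actual server functions): the first non-ignorable shard error reply is returned to the caller unchanged instead of being masked by a generic failure.

```python
# Hypothetical sketch of propagating shard error codes -- not actual server code.
# run_drop_on_shard stands in for the per-shard dropCollection call.
NAMESPACE_NOT_FOUND = 26  # shard already has no such collection; treat as success

def broadcast_drop(shards, run_drop_on_shard, ns):
    for shard in shards:
        reply = run_drop_on_shard(shard, ns)
        if reply.get("ok") != 1 and reply.get("code") != NAMESPACE_NOT_FOUND:
            # Return the shard's error reply verbatim so the caller sees the
            # real error code (e.g. an interruption) rather than a generic failure.
            return reply
    return {"ok": 1}
```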

Comment by Kaloian Manassiev [ 10/Jul/18 ]

Instead of implementing a cleanupOrphanedCollection command, I would prefer that we tighten the dropCollection command to always attempt a full cleanup. This would make the command slower for the unsharded case, but it guarantees that the command always makes forward progress and avoids introducing a separate "cleanup" command.

The way it would work is:

  1. Take the collection distributed lock in order to prevent the collection from getting created concurrently
  2. If an entry exists in config.collections, this means that the collection actually exists, so that entry should be dropped
  3. If an entry does not exist in config.collections, this means that the collection must already have been dropped, and the cleanup may need to do the following:
    • All entries for the namespace in config.chunks
    • All entries for the namespace in config.tags
    • Broadcast dropCollection to all shards

alyson.cabral, schwerin - what are your thoughts about the customer impact of making the dropCollection operation in sharded clusters slower? I suspect that this will impact the cases where customers drop and create collections frequently, but since it is only the drop which will be slower, it would only matter if they recreate a collection with the same name.
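For illustration only, here is a rough client-side sketch of steps 2 and 3 above; a real implementation would run on the config server and hold the collection distributed lock from step 1, which a script cannot do. The pymongo usage, the `ns`/`_id` field names, the `dropped` flag check, and the shard host parsing are assumptions about the 3.6/4.0-era metadata format, not the eventual server change.

```python
# Rough sketch of the proposed cleanup path, NOT the server implementation.
# Step 1 (taking the collection distributed lock) is omitted: only the config
# server can do that. The rest mirrors steps 2 and 3 above.
from pymongo import MongoClient

def cleanup_collection(mongos_uri, ns):
    client = MongoClient(mongos_uri)
    config = client["config"]
    db_name, coll_name = ns.split(".", 1)

    # Step 2: a live entry in config.collections means the collection still
    # exists, so a normal dropCollection should take care of it.
    if config["collections"].find_one({"_id": ns, "dropped": {"$ne": True}}):
        client[db_name].drop_collection(coll_name)
        return

    # Step 3: no live entry, so purge whatever a failed drop left behind.
    config["chunks"].delete_many({"ns": ns})   # leftover chunk entries
    config["tags"].delete_many({"ns": ns})     # leftover zone/tag entries

    # Broadcast dropCollection to every shard so no shard keeps stale data.
    for shard in config["shards"].find():
        host = shard["host"]                   # e.g. "rs0/h1:27017,h2:27017"
        if "/" in host:
            rs_name, hosts = host.split("/", 1)
            shard_client = MongoClient(hosts.split(","), replicaSet=rs_name)
        else:
            shard_client = MongoClient(host)
        shard_client[db_name].drop_collection(coll_name)
```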

Comment by Esha Maharishi (Inactive) [ 19/Mar/18 ]

Nice!
