[SERVER-17397] Dropping a Database or Collection in a Sharded Cluster may not fully succeed Created: 26/Feb/15  Updated: 25/Oct/23  Resolved: 09/Jul/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.16, 3.4.18, 3.6.9, 4.0.5
Fix Version/s: 5.0.0

Type: Bug Priority: Major - P3
Reporter: Peter Garafano (Inactive) Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Done Votes: 56
Labels: ShardingAutomationSupport, stop-orphaning-fallout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-2782 no rollback of chunking if chunking f... Closed
is duplicated by SERVER-16836 Cluster can create the same unsharded... Closed
is duplicated by SERVER-17884 Can't drop a database in sharded envi... Closed
is duplicated by SERVER-19603 Non-Sharded database is present in tw... Closed
is duplicated by SERVER-21866 couldn't find database [aaa] in confi... Closed
is duplicated by SERVER-39167 Concurrent insert into nonexistent da... Closed
is duplicated by SERVER-5521 Better user feedback when drop comman... Closed
Related
related to SERVER-72797 Remove sharding exceptions from inval... Blocked
related to DOCS-15797 Remove deprecated note for collection... Closed
related to SERVER-33973 Force cleanup of possibly remaining p... Closed
related to DOCS-8514 dropDatabase in sharded environments ... Closed
related to SERVER-19811 Disable FSM workloads that drop and r... Closed
related to SERVER-14678 Cleanup leftover collection metadata ... Closed
is related to SERVER-32716 Dropping sharded database or collecti... Closed
is related to SERVER-47372 config.cache collections can remain e... Closed
is related to DOCS-13703 drop collection in sharded environmen... Closed
is related to MONGOID-4826 Support hashed shard key declarations... Closed
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Participants:
Case:

 Description   
Issue Status as of Sep 18, 2020

ISSUE SUMMARY
When dropping a database / collection in a sharded cluster, the database / collection may still be present on some nodes even if the drop is reported as successful. In MongoDB 4.2 and later, rerunning the drop command should clean up the data. In MongoDB 4.0 and earlier, we do not recommend that users drop a database or collection and then attempt to reuse the namespace.

USER IMPACT
When the database/collection is not successfully dropped on a given node, the corresponding files continue to use disk space on that node. Attempting to reuse the namespace may lead to undefined behavior.

WORKAROUNDS

To work around this issue, follow the steps below to drop a database/collection in a sharded environment.

MongoDB 4.4:

  1. Drop the database / collection using a mongos
  2. Rerun the drop command using a mongos
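
For example, a minimal sketch of these two steps from the mongo shell, assuming a hypothetical database named myDB with a collection myColl and a mongos listening on localhost:27017:

    // connect to a mongos, e.g.: mongo --host localhost --port 27017
    use myDB
    db.dropDatabase()   // initial drop
    db.dropDatabase()   // rerun the drop to remove any leftover data/metadata

    // when dropping a single collection instead:
    db.getSiblingDB("myDB").myColl.drop()
    db.getSiblingDB("myDB").myColl.drop()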

MongoDB 4.2:

  1. Drop the database / collection using a mongos
  2. Rerun the drop command using a mongos
  3. Connect to each mongos and run flushRouterConfig
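
A minimal sketch of step 3, assuming two hypothetical mongos hosts router1:27017 and router2:27017 (the same command applies to step 5 of the MongoDB 4.0 procedure below):

    // connect to each mongos in turn, e.g.: mongo --host router1:27017
    // then flush that router's cached routing metadata
    db.adminCommand( { flushRouterConfig: 1 } )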

MongoDB 4.0 and earlier:

  1. Drop the database / collection using a mongos
  2. Connect to each shard's primary and verify that the namespace has been dropped. If it has not, drop it there (a verification sketch appears after the numbered steps below). Dropping a database (e.g. db.dropDatabase()) removes the data files on disk for the database being dropped.
  3. Connect to a mongos, switch to the config database, and remove any reference to the removed namespace from the collections, chunks, tags, locks, and (when dropping a database) databases collections:

    When dropping a database:

    use config
    db.collections.remove( { _id: /^DATABASE\./ }, {writeConcern: {w: 'majority' }} )
    db.databases.remove( { _id: "DATABASE" }, {writeConcern: {w: 'majority' }} )
    db.chunks.remove( { ns: /^DATABASE\./ }, {writeConcern: {w: 'majority' }} )
    db.tags.remove( { ns: /^DATABASE\./ }, {writeConcern: {w: 'majority' }} )
    db.locks.remove( { _id: /^DATABASE\./ }, {writeConcern: {w: 'majority' }} )
    

    When dropping a collection:

    use config
    db.collections.remove( { _id: "DATABASE.COLLECTION" }, {writeConcern: {w: 'majority' }} )
    db.chunks.remove( { ns: "DATABASE.COLLECTION" }, {writeConcern: {w: 'majority' }} )
    db.tags.remove( { ns: "DATABASE.COLLECTION" }, {writeConcern: {w: 'majority' }} )
    db.locks.remove( { _id: "DATABASE.COLLECTION" }, {writeConcern: {w: 'majority' }} )
    

  4. Connect to the primary of each shard, remove any reference to the removed namespace from the collections cache.databases, cache.collections and cache.chunks.DATABASE.COLLECTION:

    When dropping a database:

    db.getSiblingDB("config").cache.databases.remove({_id:"DATABASE"}, {writeConcern: {w: 'majority' }});
    db.getSiblingDB("config").cache.collections.remove({_id:/^DATABASE.*/}, {writeConcern: {w: 'majority' }});
    db.getSiblingDB("config").getCollectionNames().forEach(function(y) {
    			if(y.indexOf("cache.chunks.DATABASE.") == 0)
    				db.getSiblingDB("config").getCollection(y).drop()
    	})
    

    When dropping a collection:

    db.getSiblingDB("config").cache.collections.remove({_id:"DATABASE.COLLECTION"}, {writeConcern: {w: 'majority' }});
    db.getSiblingDB("config").getCollection("cache.chunks.DATABASE.COLLECTION").drop()
    

  5. Connect to each mongos and run flushRouterConfig (the same command shown in the MongoDB 4.2 sketch above)
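
For step 2 above, a minimal verification sketch, assuming a hypothetical database myDB and a hypothetical shard primary host shard0-primary:27018; run it against the primary of each shard:

    // connect to the shard's primary, e.g.: mongo --host shard0-primary:27018
    // list the databases present on this shard; the dropped database should not appear
    db.adminCommand( { listDatabases: 1 } ).databases

    // if "myDB" is still listed, drop it on this shard
    db.getSiblingDB("myDB").dropDatabase()

    // when dropping a single collection, check for the leftover collection instead:
    db.getSiblingDB("myDB").getCollectionNames()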


 Comments   
Comment by Tommaso Tocci [ 09/Jul/21 ]

As part of a project to start using reliable coordinators for sharded DDL, we made both the drop database and drop collection operations resilient to crashes, stepdowns, and network partitions.

The new implementation guarantees that if a drop database/collection operation returns successfully to the client, all the data and metadata associated with that db/collection have been correctly deleted and the namespace can be safely reused immediately. In other words, if a drop database/collection operation starts deleting any data, it will eventually delete all the data and leave the cluster in a consistent state.

Comment by Githook User [ 01/Apr/20 ]

Author:

{'name': 'Oleg Pudeyev', 'email': '39304720+p-mongo@users.noreply.github.com', 'username': 'p-mongo'}

Message: RUBY-2149 unify change stream and crud spec runners (#1839)

  • make collection2 and database2 optional
  • parse expectations for change streams as extended json
  • unify change stream and crud operations
  • rename fail point to fail point command to match the crud test
  • rename ops to operations for consistency with crud runner
  • extract description and expectations as common attributes for crud and change stream tests
  • add trailing comma
  • move change stream runner code into spec and test files to match the defined classes
  • create a change stream outcome class to handle label matching
  • use crud verifier to check documents in change stream test for equality
  • add removedFields to the expected result to match the actual result

needed since we are now doing strict comparisons

  • move code in crud and transaction tests to resemble each other more
  • sync spec test for extended json changes

Co-authored-by: Oleg Pudeyev <oleg@bsdpower.com>
Branch: master
https://github.com/mongodb/mongo-ruby-driver/commit/bbbcd99e0e779990ff4e6df8989628f2351170c7

Comment by Sheeri Cabral (Inactive) [ 12/Dec/19 ]

Note: work has been done so that in 4.2 all that is needed is to re-drop the database and run flushRouterConfig on every mongos. In 4.4, all that is needed is to re-drop the database.

This issue remains open as we decide if backporting to versions 4.0 and earlier is possible. There is a workaround for versions 4.0 and earlier, so those who are on 4.0 and below can recover if needed.

Comment by Kaloian Manassiev [ 16/Jan/18 ]

Hi mishra.rajat91@gmail.com,

Thank you for your question. Unfortunately, as it stands now, the zones (tags) will be left around after a collection drop. I have filed SERVER-32716 to track this bug.

You are correct that the workaround steps should include a db.tags.remove({ns: 'DATABASE.COLLECTION'}).

I am going to update the workaround steps above. In the meantime, feel free to monitor SERVER-32716 to find out when the fix makes it into the product and in which version.

Best regards,
-Kal.

Comment by Rajat Mishra [ 15/Jan/18 ]

We have implemented tag-aware sharding and created tags for each shard. In the workaround steps, do we also need to remove the documents from the tags collection in the config database?

Comment by Andy Schwerin [ 11/May/17 ]

Direct work on this problem is not scheduled at present, but enabling work on the shard and distributed catalogs is taking place during 3.6 development.

Comment by Clive Hill [ 09/May/17 ]

What are the plans in terms of resolving this issue? Is work being scheduled on this?

Comment by Kelsey Schubert [ 27/Mar/17 ]

Hi gauravps,

It is not possible for the issue described by this ticket to affect non-sharded clusters. Please open a new SERVER ticket and supply additional details about this behavior (MongoDB version, storage engine, how you observe that the database has not been deleted), and we will be happy to investigate.

Thank you,
Thomas

Comment by Gaurav Shellikeri [ 24/Mar/17 ]

@ramon.fernandez, is there any chance this might be affecting replicated setups? We have a cluster of three mongod nodes with one of them set as the master. Roughly once every 24 hours, we see that a deleted database (deleted using dropDatabase from our regression runner scripts) does not actually get deleted. We are re-using the same database name so we can identify who the database belongs to.

Comment by James Blackburn [ 14/Oct/16 ]

It would be good to have this fixed. It causes all sorts of problems with real-world workloads.

Comment by Henrik Hofmeister [ 27/May/16 ]

We are seeing the same issue in:

db version v3.0.11
git version: 48f8b49dc30cc2485c6c1f3db31b723258fcbf39

This severely affects performance (we're creating and dropping db's as part of an integration test process).

Comment by Ramon Fernandez Marina [ 27/Apr/16 ]

andrewdoumaux@gmail.com, sorry to hear you're being affected by this issue. The biggest impact of this bug occurs if you attempt to reuse the namespace, which I'd recommend against. As described above, dropping collections may leave stale metadata behind; this should not have a significant impact on your cluster as long as the namespace is not reused. However, if sh.status() is impacted due to a large number of leftover collections, then the only workaround is, unfortunately, to clean up the orphaned metadata as described above.

Regards,
Ramón.

Comment by Andrew Doumaux [ 27/Apr/16 ]

My issue might be somewhat related, and I'm not sure when it's going to fully bite me.

So my use case is caching analytic output in MongoDB. Since we have no good way to know what data has changed between analytic runs, we load the data into a new sharded collection, and once the data has been fully loaded and replicated, we drop the old/previous collection and, via an aliasing process, the service layer starts reading from the new collection.

However, in my current environment we are creating and dropping 50+ collections a day. Thus, over the course of a year there will be ~20,000 documents in the "config.collections" collection. This does seem to impact sh.status(), since it does a find() across the config.collections collection.

Are there any good means of cleaning up a dropped sharded collection? Or, at this point, is the workaround stated here the best option for cleaning up orphaned metadata?

Comment by Ramon Fernandez Marina [ 07/Apr/15 ]

Thanks for the update paulgpa, glad to hear you were able to make progress. Please note that while completing step (2) is sufficient to reclaim disk space, unless you also complete (3) and (4) it is highly likely that you'll run into trouble if you attempt to reuse the dropped namespace.

Comment by Pavlo Grinchenko [ 06/Apr/15 ]

Thank you, Ramon. Doing (2) did help us remove what was left of the database we wanted to remove. We didn't do the router drill.

Comment by Ramon Fernandez Marina [ 03/Apr/15 ]

paulgpa, the data files are removed in step 2; I've amended the ticket's summary box to reflect that.

Please see the documentation on the config database for more information. In step 3 you can use find() to find all the references to the namespace that was not successfully removed, and remove() to delete the relevant documents. For example, if I had

mongos> db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }
{ "_id" : "test", "partitioned" : false, "primary" : "shard0000" }

and test is the database that I needed to drop, I would run:

mongos> db.databases.remove({_id:"test"})
WriteResult({ "nRemoved" : 1 })
mongos> db.databases.find()
{ "_id" : "admin", "partitioned" : false, "primary" : "config" }

If you need further assistance on the details, please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this, involving more discussion, would be best posted on the mongodb-user group.

Regards,
Ramón.

Comment by Pavlo Grinchenko [ 03/Apr/15 ]

  1. Please note that our Mongo cluster is version 2.6.8.
  2. Assuming that we did all of steps 1-4, can we remove the files on disk?
  3. We cannot claim to be comfortable with modifying the config database.
  4. At what point will the data files be removed? Do you expect them to be removed by our Mongo cluster, or is this all just preparation for us to remove them by hand?

Comment by Pavlo Grinchenko [ 03/Apr/15 ]

We need some practical recommendations. We understand that this will be fixed some day, but we will hit a disk space issue within a week. Can we simply remove the files for the successfully dropped databases from the shard hosts?
