[SERVER-31485] Race between move chunks and dropIndex may lead to IndexNotFound error Created: 10/Oct/17 Updated: 30/Oct/23 Resolved: 03/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.5.13 |
| Fix Version/s: | 3.6.0-rc3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eddie Louie | Assignee: | Esha Maharishi (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Use the following patch:
Then run the following file using: ./mongo --nodb repro_bf6752.js
|
| Sprint: | Sharding 2017-10-23, Sharding 2017-11-13 |
| Participants: |
| Linked BF Score: | 0 |
| Description |
|
There is a race condition between the migration of chunks to the secondary shard (including index creation there) and a concurrent dropIndex. With help from Max Hirschhorn, we theorize the following scenario. |
| Comments |
| Comment by Githook User [ 03/Nov/17 ] |
|
Author: {'name': 'Esha Maharishi', 'username': 'EshaMaharishi', 'email': 'esha.maharishi@mongodb.com'} Message: |
| Comment by Esha Maharishi (Inactive) [ 25/Oct/17 ] |
|
max.hirschhorn and I identified two bugs across dropIndexes and createIndexes, summarized in
However, there's an additional bug within the dropIndexes bug: it's possible for dropIndexes to return ok:0 with IndexNotFound instead of ok:1 with NamespaceNotFound, because a recipient shard drops its exclusive database lock in between creating the collection and creating the indexes.
This ticket will make recipient shards hold the database lock across creating the collection and creating the indexes to fix this smaller issue within dropIndexes only. |
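The lock-dropping window described in the comment above can be illustrated with a toy model. The Python sketch below is not MongoDB code: `ToyDatabase`, `clone_collection`, the `hold_lock_across` flag, and the `between_steps` hook are hypothetical names invented for illustration, and a `threading.Lock` merely stands in for the exclusive database lock. The return values mirror the two outcomes named in the comment (ok:0 with IndexNotFound versus ok:1 with NamespaceNotFound).

```python
import threading


class ToyDatabase:
    """In-memory stand-in for one shard's database (hypothetical, not MongoDB code)."""

    def __init__(self):
        self.lock = threading.Lock()  # models the exclusive database lock
        self.collections = {}         # collection name -> set of index names

    def drop_index(self, coll, index):
        """Return (ok, code_name), loosely modeling a dropIndexes reply."""
        with self.lock:
            if coll not in self.collections:
                return (1, "NamespaceNotFound")
            if index not in self.collections[coll]:
                return (0, "IndexNotFound")
            self.collections[coll].remove(index)
            return (1, None)


def clone_collection(db, coll, donor_indexes, hold_lock_across, between_steps):
    """Create the collection, then its indexes, as a recipient shard would.

    between_steps() marks the first point at which a concurrent operation
    can run, so it is only ever called while the lock is NOT held.
    """
    if hold_lock_across:
        # Modeled fix: one lock acquisition covers both steps.
        with db.lock:
            db.collections[coll] = set()
            db.collections[coll].update(donor_indexes)
        between_steps()
    else:
        # Modeled bug: the lock is dropped between the two steps.
        with db.lock:
            db.collections[coll] = set()
        between_steps()  # window in which dropIndexes can slip in
        with db.lock:
            db.collections[coll].update(donor_indexes)


def run(hold_lock_across):
    """Run one clone with a dropIndexes landing at the earliest unlocked point."""
    db = ToyDatabase()
    seen = []
    clone_collection(
        db, "test.coll", {"a_1"}, hold_lock_across,
        lambda: seen.append(db.drop_index("test.coll", "a_1")),
    )
    return seen[0]


buggy_result = run(hold_lock_across=False)  # (0, "IndexNotFound")
fixed_result = run(hold_lock_across=True)   # (1, None)
```

In this model, a dropIndexes that lands in the window sees the collection without its index and reports IndexNotFound. Once both steps share one lock acquisition, it can only run before the collection exists (NamespaceNotFound) or after the index does (success).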
| Comment by Esha Maharishi (Inactive) [ 23/Oct/17 ] |
|
Update: This can manifest in a more general way that cannot be fixed with local synchronization between createIndexes and receiving a chunk on a recipient shard (and/or local synchronization between createIndexes and donating a chunk on a donor shard):
One solution could be to make createIndexes (and collMod) take a distlock, since migrations occur under a distlock. The drawback is that the Balancer schedules migrations frequently (and holds a collection distlock for the duration of each), possibly frequently enough to starve createIndexes from ever grabbing the distlock. |
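The starvation concern above can be made concrete with a small timeline simulation. This is a sketch under loud assumptions: the function name, the fixed migration period and duration, and the fixed retry interval are all hypothetical, and real distlock acquisition and Balancer scheduling are not modeled.

```python
def attempts_until_acquired(migration_period, migration_duration,
                            retry_interval, first_try, max_attempts):
    """Return the 1-based attempt on which the lock is free, or None.

    Toy timeline (assumption): the Balancer holds the collection distlock
    during [k * period, k * period + duration) for every integer k >= 0,
    and createIndexes retries at a fixed interval starting at first_try.
    """
    for attempt in range(1, max_attempts + 1):
        t = first_try + (attempt - 1) * retry_interval
        held_by_balancer = (t % migration_period) < migration_duration
        if not held_by_balancer:
            return attempt
    return None


# Back-to-back migrations: the lock is never free, so createIndexes starves.
starved = attempts_until_acquired(10, 10, 3, 0, 100)    # None

# A one-unit gap per period: the lock is free at t = 9, on the fourth attempt.
lucky = attempts_until_acquired(10, 9, 3, 0, 100)       # 4

# Same gap, but retries in lockstep with the Balancer never hit it.
lockstep = attempts_until_acquired(10, 9, 10, 0, 100)   # None
```

Even when the distlock is occasionally free, whether createIndexes ever acquires it in this model depends on how its retries line up with the gaps between migrations.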
| Comment by Esha Maharishi (Inactive) [ 10/Oct/17 ] |
|
This is the relevant code on the recipient shard, where it copies indexes from the donor shard and then creates the collection entry on itself: https://github.com/mongodb/mongo/blob/r3.5.13/src/mongo/db/s/migration_destination_manager.cpp#L485-L646 |
| Comment by Esha Maharishi (Inactive) [ 10/Oct/17 ] |
|
Hmm, I see the race now. I think it's slightly different than the one described above - it seems like it's:
1) migration initiated
A fix could be to make a recipient shard take a collection lock between asking the donor shard for indexes and creating the collection entry on itself.
Thanks very much to eddie.louie and max.hirschhorn for their work on this hard-to-diagnose issue. |