[SERVER-29745] Range deletion after moving away a chunk must wait for metadata update to finish before proceeding Created: 20/Jun/17 Updated: 30/Oct/23 Resolved: 13/Jul/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.11 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Sprint: | Sharding 2017-07-10, Sharding 2017-07-31 | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 0 | ||||||||||||
| Description |
|
Range deletion and metadata updates are both performed asynchronously, with no ordering guarantee between them. If a range deletion were to replicate to a secondary before the corresponding metadata update, the secondary would be left in an incorrect state. |
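The ordering requirement in the title (range deletion must wait for the metadata update to finish) can be illustrated with a minimal single-process Python sketch. It is a hypothetical model, not MongoDB's implementation: `CollectionState`, `update_metadata`, and `delete_range` are invented names, and replication to secondaries is not modeled. The deleter may be scheduled first, but it blocks on an event that the metadata update sets only after it has removed the migrated range from the owned set.

```python
import threading


class CollectionState:
    """Hypothetical model of one shard's collection; not MongoDB's API."""

    def __init__(self, owned_ranges, docs):
        self.owned_ranges = set(owned_ranges)      # [lo, hi) chunk ranges owned
        self.docs = dict(docs)                     # shard key value -> document
        self._metadata_updated = threading.Event()  # one-shot signal for this sketch
        self.log = []                              # order in which operations applied

    def update_metadata(self, moved_range):
        # Metadata update: the shard stops owning the migrated range,
        # then wakes any deleter waiting on it.
        self.owned_ranges.discard(moved_range)
        self.log.append(("metadata", moved_range))
        self._metadata_updated.set()

    def delete_range(self, rng, timeout=5.0):
        # Range deletion: block until the metadata update has finished.
        if not self._metadata_updated.wait(timeout):
            raise TimeoutError("metadata update did not finish in time")
        lo, hi = rng
        for key in [k for k in self.docs if lo <= k < hi]:
            del self.docs[key]
        self.log.append(("delete", rng))


if __name__ == "__main__":
    state = CollectionState({(0, 10), (10, 20)}, {k: {"_id": k} for k in range(20)})
    deleter = threading.Thread(target=state.delete_range, args=((0, 10),))
    deleter.start()                 # scheduled first, but must wait
    state.update_metadata((0, 10))  # metadata update completes
    deleter.join()
    print(state.log)  # [('metadata', (0, 10)), ('delete', (0, 10))]
```

The event is deliberately one-shot to keep the sketch small; a real system handling many migrations would more likely wait on a metadata version number.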
| Comments |
| Comment by Dianna Hohensee (Inactive) [ 13/Jul/17 ] |
|
Finally determined that the hang was caused by holding a ScopedCollectionMetadata object while scheduling and waiting for range deletion. The solution was to stop holding it; it wasn't necessary, so it's better not to hold on to it anyway. However, holding that scoped object for the latest metadata should not have blocked range deletion of an unused range from an old metadata version: it's a bug that cleanup was never scheduled. I suspect the error is related either to this not evaluating to true for some reason when cleanup is first requested, or to the ScopedCollectionMetadata destructor code that should schedule cleanup when old metadata is released. |
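The general hazard of waiting on range deletion while pinning a metadata snapshot can be shown with a simplified Python model. Everything here is hypothetical (`MetadataTracker`, `scoped_metadata`, `retire_range` are invented names, not MongoDB's ScopedCollectionMetadata or CollectionRangeDeleter API), and it assumes cleanup of a retired range is deferred until no pinned snapshot still lists the range as owned. Per the comment above, the actual hang involved the latest metadata, which should not have blocked cleanup, so this sketch illustrates the risk rather than reproducing the bug.

```python
import threading
from contextlib import contextmanager


class MetadataTracker:
    """Hypothetical model, not MongoDB's API: a retired chunk range is deleted
    only once no pinned metadata snapshot still lists it as owned."""

    def __init__(self, owned_ranges):
        self._lock = threading.Lock()
        self._current = frozenset(owned_ranges)  # latest metadata snapshot
        self._pinned = []                        # snapshots currently held by readers
        self._pending = {}                       # retired range -> Event set once deleted
        self.deleted = []                        # ranges deleted, in order

    @contextmanager
    def scoped_metadata(self):
        # Pin the current snapshot for the duration of the block
        # (loosely analogous to holding a ScopedCollectionMetadata).
        with self._lock:
            snap = self._current
            self._pinned.append(snap)
        try:
            yield snap
        finally:
            with self._lock:
                self._pinned.remove(snap)
                self._run_ready_deletions()  # releasing a pin may unblock cleanup

    def retire_range(self, rng):
        # Metadata update after a migration: rng is no longer owned.
        # Returns an Event that is set once the range has been deleted.
        with self._lock:
            self._current = self._current - {rng}
            done = threading.Event()
            self._pending[rng] = done
            self._run_ready_deletions()
            return done

    def _run_ready_deletions(self):
        # Caller must hold self._lock.
        for rng, done in list(self._pending.items()):
            if not any(rng in snap for snap in self._pinned):
                del self._pending[rng]
                self.deleted.append(rng)
                done.set()


if __name__ == "__main__":
    tracker = MetadataTracker({(0, 10), (10, 20)})

    # Hazard: wait for deletion while pinning a snapshot that still owns (0, 10).
    with tracker.scoped_metadata():
        done = tracker.retire_range((0, 10))
        print(done.wait(timeout=0.1))  # False: cleanup is blocked by our own pin
    print(done.wait(timeout=1.0))      # True: releasing the pin ran the deletion

    # Fix: copy what is needed and release the snapshot before waiting.
    with tracker.scoped_metadata() as snap:
        owned = set(snap)
    print(tracker.retire_range((10, 20)).wait(timeout=1.0))  # True
```

Without the timeout, the first wait would block forever, which matches the shape of the reported hang: a waiter stuck on a deletion notification that can only fire after the waiter itself releases its reference.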
| Comment by Githook User [ 13/Jul/17 ] |
|
Author: Dianna Hohensee (username: DiannaHohensee, email: dianna.hohensee@10gen.com) Message: |
| Comment by Githook User [ 11/Jul/17 ] |
|
Author: Dianna Hohensee (username: DiannaHohensee, email: dianna.hohensee@10gen.com) Message: Revert " This reverts commit 3b1554c77ce9c80b30044654ff2cab3aff7070d4. |
| Comment by Githook User [ 11/Jul/17 ] |
|
Author: Dianna Hohensee (username: DiannaHohensee, email: dianna.hohensee@10gen.com) Message: |
| Comment by Dianna Hohensee (Inactive) [ 08/Jul/17 ] |
|
Reverted the commit, which appears to be causing a hang in the CollectionRangeDeleter code. There is no corresponding "Finished deleting mr_during_migrate.coll range ...." message after the donor finishes the migration and starts waiting, and one of the thread dumps contains CollectionRangeDeleter::DeleteNotification::waitStatus. I have not yet diagnosed the range deletion problem; I have only identified that it is the cause of the hang and that the commit needed to be reverted. This commit changed the CollectionRangeDeleter functions called in moveChunk, and it seems to have unwittingly surfaced a bug. Example failing task: https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_slow1_344bf6e257e1427bc594bacac3f5983c2bdeaacf_17_07_07_12_44_23 |
| Comment by Githook User [ 08/Jul/17 ] |
|
Author: Dianna Hohensee (username: DiannaHohensee, email: dianna.hohensee@10gen.com) Message: Revert " This reverts commit 344bf6e257e1427bc594bacac3f5983c2bdeaacf. |
| Comment by Githook User [ 07/Jul/17 ] |
|
Author: Dianna Hohensee (username: DiannaHohensee, email: dianna.hohensee@10gen.com) Message: |