[SERVER-24994] Assertion when sharded collection is dropped during metadata changes Created: 11/Jul/16 Updated: 20/Jan/17 Resolved: 15/Nov/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Sharding |
| Affects Version/s: | 3.2.5 |
| Fix Version/s: | 3.2.12, 3.4.0-rc4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aaron Westendorf | Assignee: | Max Hirschhorn |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Completed: | |||||||||||||
| Backport Requested: | v3.2 | ||||||||||||
| Sprint: | TIG 2016-10-31, TIG 2016-11-21 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
I had a sharded collection that was initialized while the balancer was off. Consequently, the tests I was running on that collection skewed the load to a single shard. Seeing this, I enabled the balancer and waited a short time, but then decided that it would be faster to start over with a new collection. To free resources, I disabled balancing on that collection and then, not seeing a large enough decrease in resource usage, tried to delete the whole database. From a mongos shell, I tried:
Moments later, the primary for the shard crashed with the following assertion:
|
| Comments |
| Comment by Githook User [ 20/Jan/17 ] |
|
Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com> Message: (cherry picked from commit 57ed82f0692bfb4e7a045a0108d029e53b21e3f8) |
| Comment by Githook User [ 15/Nov/16 ] |
|
Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com> Message: |
| Comment by Aaron Westendorf [ 15/Jul/16 ] |
|
That's great news, Max. I'm glad that you were able to reproduce the issue and find the cause. |
| Comment by Max Hirschhorn [ 14/Jul/16 ] |
|
Hi aaron.westendorf, I have successfully reproduced the issue you described. Some of the related technical details follow. The invariant(descriptor) failure in InternalPlanner::_indexScan() means that the IndexDescriptor* was null. The caller, Helpers::removeRange(), passed in a null pointer because it failed to find the shard key index in the Collection's index catalog; that index had been dropped as part of the "dropDatabase" command you issued. The root cause is a race between the RangeDeleter thread, which uses the shard key index to clean up the chunk that the balancer had donated to another shard, and another thread dropping the shard key index. I'm reassigning this ticket to the sharding team to work on a fix now that the cause is understood. Thanks, |
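
The race described in the comment above can be illustrated with a small, deterministic Python model. This is only a sketch under stated assumptions: `IndexCatalog`, `InvariantFailure`, `remove_range_unchecked`, and `remove_range_checked` are hypothetical names invented for illustration, not MongoDB server code, and the "checked" variant shows one possible way to tolerate a missing index, not necessarily what the committed fix does.

```python
from typing import Dict, Optional


class InvariantFailure(Exception):
    """Hypothetical stand-in for a failed server invariant."""


class IndexCatalog:
    """Toy model of a collection's index catalog (hypothetical)."""

    def __init__(self) -> None:
        self._indexes: Dict[str, dict] = {}

    def create_index(self, name: str, key_pattern: dict) -> None:
        self._indexes[name] = key_pattern

    def drop_all(self) -> None:
        # Models the effect of "dropDatabase" on the shard key index.
        self._indexes.clear()

    def find_by_key_pattern(self, key_pattern: dict) -> Optional[str]:
        # Returns the index name, or None if no index matches.
        for name, pattern in self._indexes.items():
            if pattern == key_pattern:
                return name
        return None


def remove_range_unchecked(catalog: IndexCatalog, shard_key: dict) -> str:
    # Models the failing path: the lookup result is assumed to be non-null.
    descriptor = catalog.find_by_key_pattern(shard_key)
    if descriptor is None:
        raise InvariantFailure("invariant(descriptor) failed")
    return f"scanned {descriptor}"


def remove_range_checked(catalog: IndexCatalog, shard_key: dict) -> str:
    # Assumed alternative: treat a missing index as "nothing left to clean up".
    descriptor = catalog.find_by_key_pattern(shard_key)
    if descriptor is None:
        return "skipped: shard key index not found"
    return f"scanned {descriptor}"
```

In this model, calling `drop_all()` between scheduling a range deletion and running it reproduces the ordering in the report: the unchecked variant raises, while the checked variant returns without scanning.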