[SERVER-32783] CollectionCloner::shutdown() should not block on resetting _verifyCollectionDroppedScheduler
Created: 18/Jan/18 | Updated: 30/Oct/23 | Resolved: 26/Jan/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.3, 3.7.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Benety Goh | Assignee: | Benety Goh |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | initialSync |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 2018-01-29 |
| Linked BF Score: | 0 |
| Description |
|
CollectionCloner::shutdown() may block on the implicit join inside RemoteCommandRetryScheduler's destructor, which runs when reset() is called on _verifyCollectionDroppedScheduler.
|
| Comments |
| Comment by Githook User [ 11/Feb/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: (cherry picked from commit 785f56934fcb09f121980ccf6c51d97c3af80fa2) |
| Comment by Githook User [ 10/Feb/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: (cherry picked from commit 73f0e0047afb1f0c0965e4c5e540decdf92c9a72) |
| Comment by Githook User [ 10/Feb/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: (cherry picked from commit b3e32fea3fb27391fce4b170b4dcec1f25b780e4) |
| Comment by Githook User [ 10/Feb/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: (cherry picked from commit acf7bec77edde339ed6fb1bb89f7f03888144476) |
| Comment by Githook User [ 09/Feb/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: (cherry picked from commit 35b9b4287581fdc9f37d3afeebfb2c9895b2428b) |
| Comment by Githook User [ 09/Feb/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: (cherry picked from commit a046f953101dc64af42da2c72e79a11098f76a7e) |
| Comment by Githook User [ 09/Feb/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: (cherry picked from commit 8b44a736464e31e2a38e40171cb34063f180171c) |
| Comment by Githook User [ 26/Jan/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: |
| Comment by Githook User [ 25/Jan/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: |
| Comment by Githook User [ 25/Jan/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: |
| Comment by Githook User [ 25/Jan/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: |
| Comment by Githook User [ 25/Jan/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: |
| Comment by Githook User [ 25/Jan/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: |
| Comment by Githook User [ 25/Jan/18 ] |
|
Author: Benety Goh <benety@mongodb.com> (username: benety)
Message: |
| Comment by Matthew Russotto [ 19/Jan/18 ] |
|
I think just removing the .reset() should be sufficient; we never re-use the CollectionCloner after cancelling remaining work, right? So the only effect of removing the reset will be that any outstanding calls to _verifyCollectionWasDropped will return immediately, which is correct behavior. I believe this will also allow the extra scheduleWork at line 870 (in 3.6) to be removed.

The deadlock is that the callback to _verifyCollectionDroppedScheduler requires the CollectionCloner lock, but _verifyCollectionDroppedScheduler.join() won't complete until all callbacks are finished running, and _cancelRemainingWork_inlock() holds the CollectionCloner lock. |
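The lock-versus-join cycle described in the comment above can be reproduced in isolation. What follows is a minimal sketch, not the MongoDB code: `ToyScheduler` and `ToyCloner` are hypothetical stand-ins for RemoteCommandRetryScheduler and CollectionCloner, and the promise-based gate is an assumption added only to force the problematic interleaving (the callback starts while shutdown() already holds the lock). It shows the shape of the proposed fix: shutdown() cancels the scheduler under the lock but does not destroy it, so the joining destructor never runs while the lock is held.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for RemoteCommandRetryScheduler: runs one callback on
// its own thread and joins that thread in its destructor (the implicit join).
class ToyScheduler {
public:
    explicit ToyScheduler(std::function<void()> callback) : _thread(std::move(callback)) {}
    ~ToyScheduler() { join(); }
    void shutdown() { _cancelled.store(true); }  // only flips a flag, never blocks
    bool isCancelled() const { return _cancelled.load(); }
    void join() { if (_thread.joinable()) _thread.join(); }

private:
    std::atomic<bool> _cancelled{false};
    std::thread _thread;
};

// Hypothetical stand-in for CollectionCloner.
class ToyCloner {
public:
    ~ToyCloner() { shutdown(); join(); }

    void start() {
        std::shared_future<void> gate = _shutdownHoldsLock.get_future().share();
        std::lock_guard<std::mutex> lk(_mutex);
        _scheduler = std::make_unique<ToyScheduler>([this, gate] {
            gate.wait();  // demo-only gate: proceed once shutdown() owns _mutex
            std::lock_guard<std::mutex> callbackLock(_mutex);  // callback needs the cloner lock
            if (_shuttingDown) return;  // outstanding callback returns immediately
            ++_callbacksCompleted;
        });
    }

    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_shuttingDown) return;
        _shuttingDown = true;
        _shutdownHoldsLock.set_value();  // let the callback run while the lock is held
        if (_scheduler) _scheduler->shutdown();
        // Deliberately no _scheduler.reset() here: the destructor's join would
        // wait for the callback, and the callback is waiting for _mutex.
    }

    // Call after start(), without holding _mutex.
    void join() { if (_scheduler) _scheduler->join(); }

    int callbacksCompleted() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _callbacksCompleted;
    }

private:
    std::mutex _mutex;
    bool _shuttingDown = false;     // guarded by _mutex
    int _callbacksCompleted = 0;    // guarded by _mutex
    std::promise<void> _shutdownHoldsLock;
    std::unique_ptr<ToyScheduler> _scheduler;  // declared last so it is destroyed first
};

// Returns true if shutdown() came back and the callback saw the shutdown flag.
bool shutdownDoesNotBlock() {
    ToyCloner cloner;
    cloner.start();
    cloner.shutdown();  // returns promptly because nothing is joined under the lock
    cloner.join();      // joining is safe here, the lock has been released
    return cloner.callbacksCompleted() == 0;
}
```

In this toy, adding `_scheduler.reset()` inside `shutdown()` while the lock is held would hang: the destructor joins the callback thread, and that thread is blocked acquiring `_mutex`. Calling `shutdownDoesNotBlock()` from a test or `main` exercises the non-blocking path.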
| Comment by Spencer Brody (Inactive) [ 18/Jan/18 ] |
|
benety.goh, do you have a sense of how tricky this would be to fix?