[SERVER-26898] _migrateClone may hold WT snapshot for a long time Created: 03/Nov/16 Updated: 25/Jan/18 Resolved: 04/Nov/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.1.8, 3.2.0, 3.2.10, 3.4.0-rc2 |
| Fix Version/s: | 3.2.11, 3.4.0-rc3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | Kaloian Manassiev |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Description |
|
This _migrateClone ran for about half an hour, but numYields is 0.
|
| Comments |
| Comment by Kaloian Manassiev [ 10/Nov/16 ] |
|
There are no specific JS tests or unit tests for the WT snapshot release, since in 3.2 we have no way to test for it apart from writing a heavy stress scenario, which is not automatable. Instead, this particular fix was tested manually by stepping through it in the debugger. The referenced commit restores the usage of ScopedTransaction inside the cloner and other loops. The destructor of ScopedTransaction resets the WT range of pinned data so it can be cleared from cache. In this 3.2 commit we removed the usage of AutoGetCollectionForRead without realizing that this also removed the underlying ScopedTransaction. This means that we hold that range pinned for the duration of the loop. In 3.4 we have made migrations more unit-testable by introducing abstractions and isolating the chunk cloning from the rest of migration, so we will be able to programmatically assert that no WT data is being pinned. Hope this helps. -Kal. |
| Comment by Roy Reznik [ 10/Nov/16 ] |
|
It looks like this commit doesn't contain any tests. Or have I missed the validation that this fix works? |
| Comment by Kaloian Manassiev [ 09/Nov/16 ] |
|
This is not isolated to the cloning part of sharding migration and that's why there were other places where ScopedTransaction had to be re-introduced. However, since cloning typically transfers the most data, it is most prominent there. |
| Comment by Asya Kamsky [ 09/Nov/16 ] |
|
Is this specific to _migrateClone? I.e., there's no relationship to, say, initial sync replication cloning, is there? |
| Comment by Githook User [ 04/Nov/16 ] |
|
Author: Kaloian Manassiev (kaloianm, kaloian.manassiev@mongodb.com) Message: |
| Comment by Githook User [ 04/Nov/16 ] |
|
Author: Kaloian Manassiev (kaloianm, kaloian.manassiev@mongodb.com) Message: Adds explicit ScopedTransactions which were removed as part of commit |
| Comment by Kaloian Manassiev [ 03/Nov/16 ] |
|
The problem occurs because, even though we yield the collection lock, we never reset the WT snapshot. In 3.0 we use AutoGetCollectionForRead, which internally resets the WT snapshot. This is a bug in 3.2 and 3.4.0-rc2, but does not exist in 3.0. |
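
For readers unfamiliar with the pattern, here is a minimal sketch of why yielding the lock alone does not help while a scoped guard does. It is a toy model, not the server code: FakeRecoveryUnit, FakeLock, ScopedSnapshot, cloneWithoutGuard and cloneWithGuard are all hypothetical names invented for illustration, and the only behavior borrowed from the ticket is that the guard's destructor resets the snapshot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a storage-engine recovery unit (not MongoDB's class).
class FakeRecoveryUnit {
public:
    // A read lazily opens a snapshot if none is open, then pins one more batch.
    void readBatch() {
        if (!_snapshotOpen) {
            _snapshotOpen = true;
            _batchesPinned = 0;
        }
        ++_batchesPinned;
        _maxBatchesPinned = std::max(_maxBatchesPinned, _batchesPinned);
    }
    // Drops the snapshot so the data it pinned becomes evictable.
    void abandonSnapshot() {
        _snapshotOpen = false;
        _batchesPinned = 0;
    }
    bool snapshotOpen() const { return _snapshotOpen; }
    // Largest number of batches ever pinned by a single snapshot.
    std::size_t maxBatchesPinned() const { return _maxBatchesPinned; }

private:
    bool _snapshotOpen = false;
    std::size_t _batchesPinned = 0;
    std::size_t _maxBatchesPinned = 0;
};

// Hypothetical collection lock; yielding it does not touch the snapshot.
struct FakeLock {
    int yields = 0;
    void yield() { ++yields; }
};

// Hypothetical RAII guard playing the role the ticket ascribes to
// ScopedTransaction: its destructor resets the snapshot.
class ScopedSnapshot {
public:
    explicit ScopedSnapshot(FakeRecoveryUnit& ru) : _ru(ru) {}
    ~ScopedSnapshot() { _ru.abandonSnapshot(); }
    ScopedSnapshot(const ScopedSnapshot&) = delete;
    ScopedSnapshot& operator=(const ScopedSnapshot&) = delete;

private:
    FakeRecoveryUnit& _ru;
};

// Models the bug: the lock is yielded per batch, the snapshot is never reset.
std::size_t cloneWithoutGuard(FakeRecoveryUnit& ru, FakeLock& lock, std::size_t batches) {
    for (std::size_t i = 0; i < batches; ++i) {
        ru.readBatch();
        lock.yield();
    }
    return ru.maxBatchesPinned();
}

// Models the fix: a guard scoped to each batch resets the snapshot on exit.
std::size_t cloneWithGuard(FakeRecoveryUnit& ru, FakeLock& lock, std::size_t batches) {
    for (std::size_t i = 0; i < batches; ++i) {
        {
            ScopedSnapshot guard(ru);
            ru.readBatch();
        }  // guard destructor runs here
        assert(!ru.snapshotOpen());
        lock.yield();
    }
    return ru.maxBatchesPinned();
}
```

In this model, cloneWithoutGuard over 5 batches reports that one snapshot pinned all 5 batches even though the lock was yielded 5 times, while cloneWithGuard never pins more than 1 batch per snapshot.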