[SERVER-26898] _migrateClone may hold WT snapshot for a long time
Created: 03/Nov/16  Updated: 25/Jan/18  Resolved: 04/Nov/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.1.8, 3.2.0, 3.2.10, 3.4.0-rc2
Fix Version/s: 3.2.11, 3.4.0-rc3

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   

This _migrateClone ran for about half an hour, but numYields is 0.

2016-11-02T14:22:17.136+0000 I COMMAND [conn4545837] command admin.$cmd command: _migrateClone { _migrateClone: 1, sessionId: "699910aecbf021343ba5984d0c6cd5c5" } keyUpdates:0 writeConflicts:0 numYields:0 reslen:16773886 locks:{ Global: { acquireCount: { r: 8300 } }, Database: { acquireCount: { r: 4150 } }, Collection: { acquireCount: { r: 4150 } } } protocol:op_command 1709840ms
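For reference, 1709840ms is roughly 28.5 minutes. The check described above (a long-running operation that never yielded) can be sketched as a small log-line parser. This is only an illustration: `SlowOpStats`, `parseSlowOp` and `isLongOpWithoutYields` are hypothetical names, and the field layout (`numYields:<n>` somewhere in the line, `<n>ms` at the end) is assumed from the single log line shown here rather than from any documented format.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical holder for the two fields of interest in a slow-op log line.
struct SlowOpStats {
    long long numYields;
    long long durationMillis;
};

// Extracts numYields and the trailing duration from one log line.
// Assumes the layout seen in the line above; returns nullopt if either
// field is missing.
std::optional<SlowOpStats> parseSlowOp(const std::string& line) {
    static const std::regex yieldsRe(R"(numYields:(\d+))");
    static const std::regex durationRe(R"((\d+)ms\s*$)");
    std::smatch yields;
    std::smatch duration;
    if (!std::regex_search(line, yields, yieldsRe) ||
        !std::regex_search(line, duration, durationRe)) {
        return std::nullopt;
    }
    return SlowOpStats{std::stoll(yields[1].str()), std::stoll(duration[1].str())};
}

// True when the operation ran at least thresholdMillis without yielding once.
bool isLongOpWithoutYields(const SlowOpStats& stats, long long thresholdMillis) {
    return stats.numYields == 0 && stats.durationMillis >= thresholdMillis;
}
```

With a threshold of, say, 60000ms, the log line above would be flagged: it reports numYields:0 and a duration of 1709840ms.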



 Comments   
Comment by Kaloian Manassiev [ 10/Nov/16 ]

Hi royrez@microsoft.com

There are no specific JS or unit tests for the WT snapshot release, since in 3.2 we have no way to test for it apart from writing a heavy stress scenario, which is not automatable. Instead, this particular fix was tested manually by stepping through it in the debugger.

The referenced commit restores the usage of ScopedTransaction inside the cloner and other loops. The destructor of ScopedTransaction resets the WT range of pinned data so it can be cleared from cache. In an earlier 3.2 commit we removed the usage of AutoGetCollectionForRead, without realizing that doing so also removed the underlying ScopedTransaction. As a result, that range stayed pinned for the entire duration of the loop.
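The scope-guard behaviour described above can be sketched in isolation. This is a minimal model, not the server code: `FakeRecoveryUnit`, `ScopedSnapshotGuard` and `cloneBatches` are hypothetical stand-ins, and the only thing taken from the description is the idea that a destructor releases the pinned snapshot at the end of each loop iteration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a storage-engine recovery unit (not the real class).
class FakeRecoveryUnit {
public:
    // A read opens a snapshot if none is open, which pins the data it sees.
    void read() {
        if (!_snapshotOpen) {
            _snapshotOpen = true;
            ++_snapshotsOpened;
        }
    }
    // Releases the snapshot so the pinned range can be evicted from cache.
    void abandonSnapshot() { _snapshotOpen = false; }
    bool snapshotOpen() const { return _snapshotOpen; }
    std::size_t snapshotsOpened() const { return _snapshotsOpened; }

private:
    bool _snapshotOpen = false;
    std::size_t _snapshotsOpened = 0;
};

// Hypothetical guard modelled on the description: its destructor releases
// the snapshot when the enclosing scope ends.
class ScopedSnapshotGuard {
public:
    explicit ScopedSnapshotGuard(FakeRecoveryUnit& ru) : _ru(ru) {}
    ~ScopedSnapshotGuard() { _ru.abandonSnapshot(); }
    ScopedSnapshotGuard(const ScopedSnapshotGuard&) = delete;
    ScopedSnapshotGuard& operator=(const ScopedSnapshotGuard&) = delete;

private:
    FakeRecoveryUnit& _ru;
};

// Reads `batches` batches and returns how many distinct snapshots were opened.
// Without the guard, one snapshot stays pinned across the whole loop.
std::size_t cloneBatches(FakeRecoveryUnit& ru, std::size_t batches, bool useGuard) {
    for (std::size_t i = 0; i < batches; ++i) {
        if (useGuard) {
            ScopedSnapshotGuard guard(ru);  // snapshot released at end of this iteration
            ru.read();
            assert(ru.snapshotOpen());
        } else {
            ru.read();  // snapshot from the first iteration is never released
        }
    }
    return ru.snapshotsOpened();
}
```

In this model, five batches without the guard open a single snapshot that is still held when the loop exits, while five batches with the guard open five short-lived snapshots and leave none held.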

In 3.4 we have made migrations more unit-testable through introducing abstractions and isolating the chunk cloning from the rest of migration, so we will be able to programmatically assert that no WT data is being pinned.

Hope this helps.

-Kal.

Comment by Roy Reznik [ 10/Nov/16 ]

It looks like this commit doesn't contain any tests. Or have I missed the validation that this fix works?

Comment by Kaloian Manassiev [ 09/Nov/16 ]

This is not isolated to the cloning part of sharding migration and that's why there were other places where ScopedTransaction had to be re-introduced. However, since cloning typically transfers the most data, it is most prominent there.

Comment by Asya Kamsky [ 09/Nov/16 ]

Is this specific to _migrateClone? I.e. there's no relationship to say initial sync replication cloning, is there?

Comment by Githook User [ 04/Nov/16 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-26898 Use ScopedTransaction at the migration donor
Branch: master
https://github.com/mongodb/mongo/commit/f94a71efa246401defa7fd1c48d9c20e1e652dd9

Comment by Githook User [ 04/Nov/16 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-26898 Use ScopedTransaction in the MigrationSourceManager

Adds explicit ScopedTransactions which were removed as part of commit
31716d2ae526d82d7d36464f6c9fae8b9f38542f. This ensures that the WT
snapshot will be reset at the manual yield points done by the
MigrationSourceManager.
Branch: v3.2
https://github.com/mongodb/mongo/commit/7f75a49639cdc910dcdbc3f8757c3474156d3ead

Comment by Kaloian Manassiev [ 03/Nov/16 ]

The problem occurs because, even though we yield the collection lock, we never reset the WT snapshot. In 3.0 we use AutoGetCollectionForRead, which internally resets the WT snapshot.

This is a bug in 3.2 and 3.4.0-rc2, but does not exist in 3.0.
