[SERVER-38284] Remove donor collection X-lock acquisition for starting the clone phase Created: 28/Nov/18 Updated: 29/Oct/23 Resolved: 22/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.9 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Blake Oler |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11, Sharding 2019-02-25 |
| Participants: | |
| Description |
|
The collection X-lock acquisition when entering the migration clone phase is a necessary synchronization which serves two purposes:
Synchronization (1) should be implemented by introducing a lock manager ResourceMutex object on the MigrationSourceManager decoration and adding a MigrationSourceManager::getCloner method. This method returns a scoped object which holds the mutex in MODE_IX and has bool and MigrationChunkClonerSource* conversion operators; these yield the active cloner, or false/nullptr if there is no active migration. That way, all write code paths will acquire this mutex in mode IX, whereas migration start will acquire it in mode X when it installs the clone driver.
Synchronization (2) can be implemented by waiting for the last written timestamp to become journaled (or even majority committed) before starting to clone the chunk. Because of this, the collection X-lock acquisition can be replaced with a call to the replication coordinator’s waitUntilOpTimeForRead after the writes tracking for the chunk has been activated. That way it is guaranteed that all changes to the chunk will be captured either in the cloned snapshot or in xferMods.
Xfermods for committed changes only |
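The scoped-object idea proposed for synchronization (1) can be sketched roughly as follows. This is a minimal standalone illustration, not the server implementation: std::shared_mutex stands in for the lock manager ResourceMutex (shared mode playing the role of MODE_IX, exclusive mode the role of MODE_X), and the names ClonerSketch, SourceManagerSketch, ScopedCloner, startMigration and finishMigration are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for MigrationChunkClonerSource.
struct ClonerSketch {
    std::atomic<int> opsObserved{0};
    void onWrite() { ++opsObserved; }  // a write path reporting one operation
};

// Hypothetical stand-in for the MigrationSourceManager decoration.
class SourceManagerSketch {
public:
    // Scoped handle returned by getCloner(). It holds the mutex in shared
    // mode (the analogue of MODE_IX) for as long as it is alive, so the
    // cloner cannot be installed or removed underneath the caller.
    class ScopedCloner {
    public:
        explicit ScopedCloner(SourceManagerSketch& mgr)
            : _lock(mgr._mutex),              // acquire first...
              _cloner(mgr._cloner.get()) {}   // ...then read the pointer

        // False when there is no active migration.
        explicit operator bool() const { return _cloner != nullptr; }
        // Yields the active cloner, or nullptr if there is none.
        operator ClonerSketch*() const { return _cloner; }
        ClonerSketch* operator->() const { return _cloner; }

    private:
        std::shared_lock<std::shared_mutex> _lock;
        ClonerSketch* _cloner;
    };

    // Called by write code paths.
    ScopedCloner getCloner() { return ScopedCloner(*this); }

    // Migration start: exclusive mode (the analogue of MODE_X) while the
    // clone driver is installed, so no writer can be mid-flight.
    void startMigration() {
        std::unique_lock<std::shared_mutex> lk(_mutex);
        assert(!_cloner);
        _cloner = std::make_unique<ClonerSketch>();
    }

    void finishMigration() {
        std::unique_lock<std::shared_mutex> lk(_mutex);
        _cloner.reset();
    }

private:
    std::shared_mutex _mutex;
    std::unique_ptr<ClonerSketch> _cloner;
};
```

A write path would then use the handle as `if (auto cloner = mgr.getCloner()) cloner->onWrite();`, releasing the shared lock when the handle goes out of scope. The handle reads the cloner pointer only after the lock is held, which is what keeps a writer from seeing a half-installed cloner.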
| Comments |
| Comment by Githook User [ 22/Feb/19 ] |
|
Author: {'name': 'Blake Oler', 'username': 'BlakeIsBlake', 'email': 'blake.oler@mongodb.com'} Message: |
| Comment by Blake Oler [ 15/Jan/19 ] |
|
| Comment by Judah Schvimer [ 14/Jan/19 ] |
|
Three comments:
1) Majority committed doesn't always mean journaled. It depends on the value of writeConcernMajorityJournalDefault. I'm not sure if that matters, but thought I'd mention it.
2) Can you please clarify if these are storage-committed or replication-(majority)-committed writes? Can you please clarify if this is a storage-transaction or a mongodb-transaction?
3) I think the proposed participant interface is a bit over-complicated. Can we just replace endTransactionAndRetrieveOperations() with retrieveOperationsForMigrate() and clearOperationsInMemory() rather than adding them and keeping it with a boolean? It's unclear to me what endTransactionAndRetrieveOperations() would do that the combination of the two new methods does not do. |
| Comment by Blake Oler [ 09/Jan/19 ] |
|
judah.schvimer I outlined proposed changes to the transaction participant in the third section of the ticket so that we may observe statements of a prepared transaction on commit. Let me know if the proposed changes seem sane to you. |
| Comment by Kaloian Manassiev [ 08/Jan/19 ] |
We should just do this before attempting to disallow standalones as shard servers. |
| Comment by Esha Maharishi (Inactive) [ 08/Jan/19 ] |
|
I think |
| Comment by Blake Oler [ 08/Jan/19 ] |
|
kaloian.manassiev after doing some digging, I found out that there was no consolidated ticket to track tests that still run as standalone shards and also use chunk migrations. I've compiled them into this one ticket (
Additionally, there are fourteen suites using ShardedClusterFixture that are running shards as standalones. These need to be changed and evaluated to make sure they will pass with replica set shards. This is tracked in
There is also a ticket that max.hirschhorn pointed out to me: we need to make sure that these changes won't affect Queryable Backup, something that is outside of my current knowledge base. Maybe esha.maharishi as the assignee of
We are going to take on a non-negligible amount of work to make sure our testing infrastructure is up to par with these proposed restrictions. Your idea for using the X-lock only if we're not in a replica set may work. What would you propose as the path forward based on this knowledge? |
| Comment by Kaloian Manassiev [ 04/Jan/19 ] |
|
Since transactions are not used in standalone shards, you could use a collection X lock there to establish a visibility barrier with all writes that happened before that point. Standalone shards are also no longer supported, but we still keep the functionality because we couldn't get rid of the last few remaining tests. Check with janna.golden for the reason we couldn't switch them to a 1-node RS; she might know better. |
| Comment by Blake Oler [ 02/Jan/19 ] |
|
kaloian.manassiev regarding synchronization (2), if we wait for read concern as a replica set, we will lose test coverage for any test that uses a standalone shard and also attempts to migrate for any reason. Is this an acceptable gap to introduce, or should we instead wait for all writes to be journaled on the current node? |
| Comment by Githook User [ 28/Dec/18 ] |
|
Author: {'username': 'BlakeIsBlake', 'email': 'blake.oler@mongodb.com', 'name': 'Blake Oler'} Message: |