[SERVER-38284] Remove donor collection X-lock acquisition for starting the clone phase Created: 28/Nov/18  Updated: 29/Oct/23  Resolved: 22/Feb/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.9

Type: Task Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Blake Oler
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-39017 Allow prepared transaction statements... Closed
depends on SERVER-39021 Switch migrations to observe multi-st... Closed
Related
related to SERVER-71219 Migration can miss writes from prepar... Closed
related to SERVER-80236 Race in migration source registration... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2018-12-17, Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11, Sharding 2019-02-25
Participants:

 Description   

The collection X-lock acquisition when entering the migration clone phase is a necessary synchronization which serves two purposes:

  1. Removes the need for a mutex to guard reads and writes of the MSM* decoration of the CollectionShardingRuntime, where a nullptr value means that writes are not tracked and a non-nullptr value means that the current migration is tracking writes.
  2. Ensures that the chunk migration starts tracking writes to the chunk after all documents that the clone phase will see have been journaled.

Synchronization (1) should be implemented by introducing a lock manager ResourceMutex object on the MigrationSourceManager decoration and adding a MigrationSourceManager::getCloner method. This method returns a scoped object that holds the mutex in MODE_IX and provides bool and MigrationChunkClonerSource* conversion operators, which yield the active cloner, or nullptr if there is no active migration. That way, all write code paths will acquire this mutex in mode IX, whereas migration start will acquire it in mode X when it installs the clone driver.
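
The scoped-accessor idea can be sketched roughly as follows. This is a minimal illustration, not the server code: std::shared_mutex stands in for the lock manager ResourceMutex (shared mode playing the role of MODE_IX, exclusive mode the role of MODE_X), and MigrationSourceManagerSketch, ScopedCloner, startMigration, endMigration, observeInsert and observedWrites are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the real cloner; it only counts the writes it observes.
struct MigrationChunkClonerSource {
    int observedWrites = 0;
    void onInsertOp() { ++observedWrites; }
};

// Hypothetical stand-in for the MigrationSourceManager decoration.
class MigrationSourceManagerSketch {
public:
    // Scoped accessor: holds the mutex in shared mode (the role MODE_IX plays
    // in the ticket) for as long as the object is alive.
    class ScopedCloner {
    public:
        explicit ScopedCloner(MigrationSourceManagerSketch& msm)
            : _lock(msm._mutex), _cloner(msm._cloner.get()) {}

        explicit operator bool() const { return _cloner != nullptr; }
        operator MigrationChunkClonerSource*() const { return _cloner; }
        MigrationChunkClonerSource* operator->() const { return _cloner; }

    private:
        // Declared before _cloner so the lock is held when the pointer is read.
        std::shared_lock<std::shared_mutex> _lock;
        MigrationChunkClonerSource* _cloner;
    };

    ScopedCloner getCloner() { return ScopedCloner(*this); }

    // Migration start: exclusive mode (the role of MODE_X) while installing the cloner.
    void startMigration() {
        std::unique_lock<std::shared_mutex> lk(_mutex);
        _cloner = std::make_unique<MigrationChunkClonerSource>();
    }

    // Migration end: exclusive mode again while removing the cloner.
    void endMigration() {
        std::unique_lock<std::shared_mutex> lk(_mutex);
        _cloner.reset();
    }

private:
    std::shared_mutex _mutex;
    std::unique_ptr<MigrationChunkClonerSource> _cloner;
};

// A write path: report the write only if a migration is currently tracking writes.
bool observeInsert(MigrationSourceManagerSketch& msm) {
    if (auto cloner = msm.getCloner()) {
        cloner->onInsertOp();
        return true;
    }
    return false;
}

// Number of writes the active cloner has observed, or -1 if no migration is active.
int observedWrites(MigrationSourceManagerSketch& msm) {
    auto cloner = msm.getCloner();
    return cloner ? cloner->observedWrites : -1;
}
```

Because writers only ever take the shared mode, they do not block each other; only installing or removing the cloner briefly excludes them.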

Synchronization (2) can be implemented by waiting for the last written timestamp to become journaled (or even majority committed) before starting to clone the chunk. Because of this, the collection X-lock acquisition can easily be replaced with a call to the replication coordinator’s waitUntilOpTimeForRead after write tracking for the chunk has been activated. That way it is guaranteed that all changes to the chunk will be captured either in the cloned snapshot or in xferMods.
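
The ordering argument can be illustrated with a toy model. This is only a sketch under strong simplifying assumptions: timestamps are plain counters, everything runs on one thread, and DonorSketch, waitUntilReadable, startClone and allCaptured are hypothetical names (waitUntilReadable stands in for the wait on the replication coordinator, it is not its real signature).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy donor shard. All names are hypothetical; timestamps are plain counters.
struct DonorSketch {
    std::uint64_t lastWritten = 0;  // timestamp of the newest write
    std::uint64_t visibleUpTo = 0;  // newest timestamp a clone read can see
    bool trackingWrites = false;
    std::vector<std::uint64_t> allWrites;
    std::vector<std::uint64_t> xferMods;

    void write() {
        allWrites.push_back(++lastWritten);
        if (trackingWrites) xferMods.push_back(lastWritten);
    }

    // Stand-in for the wait: returns once everything up to ts is readable.
    void waitUntilReadable(std::uint64_t ts) {
        if (visibleUpTo < ts) visibleUpTo = ts;
    }

    // What a clone read would return at this moment.
    std::vector<std::uint64_t> readSnapshot() const {
        std::vector<std::uint64_t> out;
        for (auto ts : allWrites)
            if (ts <= visibleUpTo) out.push_back(ts);
        return out;
    }
};

// Order matters: activate tracking first, then wait, then clone.
std::vector<std::uint64_t> startClone(DonorSketch& donor) {
    donor.trackingWrites = true;                      // 1. start tracking writes
    const std::uint64_t barrier = donor.lastWritten;  // 2. last written timestamp
    donor.waitUntilReadable(barrier);                 // 3. wait for it to be readable
    return donor.readSnapshot();                      // 4. clone
}

// True if every write appears either in the snapshot or in xferMods.
bool allCaptured(const DonorSketch& donor, const std::vector<std::uint64_t>& snapshot) {
    for (auto ts : donor.allWrites) {
        const bool inSnapshot =
            std::find(snapshot.begin(), snapshot.end(), ts) != snapshot.end();
        const bool inXferMods =
            std::find(donor.xferMods.begin(), donor.xferMods.end(), ts) != donor.xferMods.end();
        if (!inSnapshot && !inXferMods) return false;
    }
    return true;
}
```

In this model, skipping the wait would leave the earlier writes out of the snapshot, and since they happened before tracking began they would not be in xferMods either.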

Xfermods for committed changes only
Since we are removing a collection X-lock acquisition, which creates a barrier after which all active transactions on the collection have committed, we need to ensure that the migration chunk cloner source doesn't miss writes that started before the migration (and would never have called the LogOpForShardingHandler of the migration manager). This will be achieved by ensuring that shardObserveInsertOp is only called for committed writes and that on transaction commit we call it for each document written for the migrated collection.
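
A rough sketch of the "observe on commit" idea follows. It assumes a transaction simply buffers its inserts and reports them to whichever cloner is active at commit time; ClonerSketch, TransactionSketch, observeInsert, commit and rollback are hypothetical names for illustration and do not mirror the real transaction participant interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical cloner: records the ids of documents it is told about.
struct ClonerSketch {
    std::vector<std::string> xferMods;
    void observeInsert(const std::string& docId) { xferMods.push_back(docId); }
};

// Hypothetical transaction: buffers inserts and reports them only on commit.
struct TransactionSketch {
    std::vector<std::string> pendingInserts;

    void insert(const std::string& docId) { pendingInserts.push_back(docId); }

    // On commit, notify the cloner that is active now (nullptr if none),
    // once per document written by the transaction.
    void commit(ClonerSketch* activeCloner) {
        if (activeCloner) {
            for (const auto& docId : pendingInserts) activeCloner->observeInsert(docId);
        }
        pendingInserts.clear();
    }

    // Rolled-back writes are never reported.
    void rollback() { pendingInserts.clear(); }
};
```

With this shape, a transaction that wrote its documents before the migration started is still seen by the cloner, because the notification happens at commit rather than at write time.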



 Comments   
Comment by Githook User [ 22/Feb/19 ]

Author: Blake Oler (username: BlakeIsBlake, email: blake.oler@mongodb.com)

Message: SERVER-38284 Remove donor collection X-lock acquisition for starting the clone phase in migrations
Branch: master
https://github.com/mongodb/mongo/commit/7fce23cd2642bb3ff8d972e32e8ea2c82d951f35

Comment by Blake Oler [ 15/Jan/19 ]

judah.schvimer

  1. We currently rely on majority commit writes to confirm completion of cloning (here). kaloian.manassiev and I decided that waiting for local read concern from replication is valid at the beginning of the clone process because, in all cases, we wait for majority commit to complete.
  2. We are referring to replication committed writes. However, as said in the previous point, we switched to using locally visible writes via local read concern.
  3. Conversation on this has been sent to SERVER-39017.
Comment by Judah Schvimer [ 14/Jan/19 ]

Three comments:
1)

waiting for the last written timestamp to become journaled (or even majority committed)

Majority committed doesn't always mean journaled. It depends on the value of writeConcernMajorityJournalDefault. I'm not sure if that matters, but thought I'd mention it.

2)

committed writes and that on transaction commit

Can you please clarify if these are storage-committed or replication-(majority)-committed writes? Can you please clarify if this is a storage-transaction or a mongodb-transaction?

3) I think the proposed participant interface is a bit over-complicated. Can we just replace endTransactionAndRetrieveOperations() with retrieveOperationsForMigrate() and clearOperationsInMemory() rather than adding them and keeping it with a boolean? It's unclear to me what endTransactionAndRetrieveOperations() would do that the combination of the two new methods does not do.

Comment by Blake Oler [ 09/Jan/19 ]

judah.schvimer I outlined proposed changes to the transaction participant in the third section of the ticket so that we may observe statements of a prepared transaction on commit. Let me know if the proposed changes seem sane to you.

Comment by Kaloian Manassiev [ 08/Jan/19 ]

What would you propose as the path forward based on this knowledge?

Your idea for using the X-lock only if we're not in a replica set may work.

We should just do this before attempting to disallow standalones as shard servers.

Comment by Esha Maharishi (Inactive) [ 08/Jan/19 ]

I think SERVER-32531 was filed for the work queryable backup needed to disallow standalones as shard servers.

Comment by Blake Oler [ 08/Jan/19 ]

kaloian.manassiev after doing some digging, I found out that there was no consolidated ticket to track tests that still run as standalone shards and also use chunk migrations. I've compiled them into this one ticket (SERVER-38894). There are twenty-two conflicting tests scattered across seven tickets. Investigation needs to be done for a portion of them – for the rest that can simply be converted, time still needs to be taken to verify they will pass Evergreen. I estimate that to be a sprint and a half's worth of work.

Additionally, there are fourteen suites using ShardedClusterFixture that are running shards as standalones. These need to be changed and evaluated to make sure they will pass with replica set shards. This is tracked in SERVER-38898. I estimate this to be at least a sprint's worth of work.

There is also a ticket that max.hirschhorn pointed out to me – we need to make sure that these changes won't affect Queryable Backup, something that is outside of my current knowledge base. Maybe esha.maharishi as the assignee of SERVER-32529 might know more?

We are going to take on a non-negligible amount of work to make sure our testing infrastructure is up-to-par with these proposed restrictions.

Your idea for using the X-lock only if we're not in a replica set may work.

What would you propose as the path forward based on this knowledge?

Comment by Kaloian Manassiev [ 04/Jan/19 ]

Since transactions are not used in standalone shards, you could use a collection X lock there to establish a visibility barrier with all writes that happened before that point.

Standalone shards are also no longer supported, but we still keep the functionality because we couldn't get rid of the last few remaining tests. Check with janna.golden about why we couldn't switch them to 1-node RS - she might know better.

Comment by Blake Oler [ 02/Jan/19 ]

kaloian.manassiev regarding synchronization (2), if we wait for read concern as a replica set, then that will cause us to lose test coverage for any test that uses a standalone shard and also attempts to migrate for any reason. Is this an alright gap to induce, or should we instead wait for all writes to be journaled on the current node?

Comment by Githook User [ 28/Dec/18 ]

Author: Blake Oler (username: BlakeIsBlake, email: blake.oler@mongodb.com)

Message: SERVER-38284 Create concurrency lock for CollectionShardingRuntime
Branch: master
https://github.com/mongodb/mongo/commit/84a0dd98f9bedec0d104b912f23b3a1221ae456e
