[SERVER-37988] recover locks on step up at the beginning of the state transition rather than at the end Created: 07/Nov/18  Updated: 29/Oct/23  Resolved: 15/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.12

Type: Task Priority: Minor - P4
Reporter: Judah Schvimer Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: txn_storage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-37987 Use finer grained locks for on step u... Closed
depends on SERVER-38139 Ban temporary collections in prepared... Closed
depends on SERVER-38140 Only acquire a Database IX lock for r... Closed
depends on SERVER-39520 Only use collection MODE_X locks for ... Closed
Related
related to SERVER-40700 Deadlock between read prepare conflic... Closed
related to SERVER-40723 Deadlock between S lock acquisition o... Closed
related to SERVER-37381 Allow prepared transactions to surviv... Closed
is related to SERVER-35365 MapReduce temporary inc collections s... Closed
is related to SERVER-37199 Yield locks of transactions in second... Closed
is related to SERVER-41037 Stepup should kill all user operation... Closed
Backwards Compatibility: Fully Compatible
Sprint: Storage NYC 2019-05-06, Storage NYC 2019-05-20
Participants:

 Description   

Prepared transactions hold collection and database IX locks across state transitions, and can only release those locks by processing a 'commitTransaction' or 'abortTransaction' command. It is very possible that in order to receive a 'commitTransaction' or 'abortTransaction' command, a node must go through a state transition to step up or down. Thus we cannot just wait for the prepared transaction to release its locks to acquire locks for the state transition.

This means that state transitions cannot acquire locks that would conflict with locks held by a prepared transaction.

The only state transition that I know of that acquires locks is step-up.

Shutdown does not count as a state transition. It aborts all prepared storage transactions without making that abort durable in any way before acquiring any locks. This occurs after shutting down the replication system (though before killing all operations), so I think it should be safe.



 Comments   
Comment by Githook User [ 15/May/19 ]

Author:

{'name': 'Gregory Wlodarek', 'username': 'GWlodarek', 'email': 'gregory.wlodarek@mongodb.com'}

Message: SERVER-37988 Recover locks on step up at the beginning of the state transition rather than at the end
Branch: master
https://github.com/mongodb/mongo/commit/bae6f255cb2c99d557f02d3861c4abb654fe3071

Comment by Githook User [ 15/May/19 ]

Author:

{'email': 'gregory.wlodarek@mongodb.com', 'name': 'Gregory Wlodarek', 'username': 'GWlodarek'}

Message: SERVER-37988 Add an optional predicate argument in forEachCollectionFromDb() to allow skipping unsatisfied collections without grabbing them in the desired lock mode
Branch: master
https://github.com/mongodb/mongo/commit/9af1defd2b1b831322e716baa8a4e54b27bd534a

Comment by Githook User [ 15/May/19 ]

Author:

{'email': 'gregory.wlodarek@mongodb.com', 'name': 'Gregory Wlodarek', 'username': 'GWlodarek'}

Message: SERVER-37988 Change Collection::getIndexSize() to be const
Branch: master
https://github.com/mongodb/mongo/commit/9054590882e5fb7b205593e100e83b17f85b9108

Comment by Judah Schvimer [ 01/May/19 ]

We will continue yielding locks for the time being, so this ticket is still required.

Comment by Judah Schvimer [ 23/Apr/19 ]

SERVER-40723 may stop yielding locks for prepared transactions which will make this ticket go away.

Comment by Judah Schvimer [ 18/Apr/19 ]

Sorry for the confusing question. I was unclear on why, today, we kill global S locks on step down but not step up. I was asking if this ticket was related to why we didn't have to kill global S locks on step up today. I've asked this on SERVER-40700 in response to Suganthi's comment and will move the discussion there.

Comment by Tess Avitabile (Inactive) [ 17/Apr/19 ]

I'm not sure I understand the question. I think the deadlock described in SERVER-40700 can also happen on step-up (prepared transaction waits for state transition; afterClusterTime read waits for prepared transaction; stepup waits for afterClusterTime read), but it will be fixed in the same way as stepdown, by making sure that reads don't conflict with stepup/stepdown. However, the original problem of dropping temp collections preventing us from recovering locks at the beginning of state transition still exists.

Comment by Judah Schvimer [ 17/Apr/19 ]

tess.avitabile and suganthi.mani, will this cause us to have to kill operations that acquire global S locks on step-up, similar to what we do on step-down? Or does this go away with locking changes in SERVER-40700?

Comment by Eric Milkie [ 03/Apr/19 ]

By moving the lock recovery from the end of the stepup to the beginning, we hope to discover any further lock violations; these will manifest as deadlocks.

Comment by Judah Schvimer [ 20/Feb/19 ]

I wanted to signal (mostly to myself) that this was lower priority in the prepare epic, but still required. This seemed like the best way. I can change it back if you'd prefer not to (somewhat) overload the meaning of the field.

Comment by Tess Avitabile (Inactive) [ 20/Feb/19 ]

judah.schvimer, why was the priority reduced to minor?

Comment by Judah Schvimer [ 14/Nov/18 ]

Based on this discussion and talking to tess.avitabile, we will fix this in SERVER-37199 (option 3) for now. This ticket will then become "recover locks on step up at the beginning of the state transition rather than at the end" when the Replication or Storage teams have time to fix collection drops to only take collection X locks.

Comment by Geert Bosch [ 14/Nov/18 ]

While we're very close to being able to do option 2), we are not completely there yet. On the positive side, all (temporary) collections that need dropping on step up will be 2-phase dropped, so in essence they only get a name change. After the patch to no longer destroy Collection objects on rename, we don't need the MODE_X database lock to prevent Collection pointers from becoming invalidated. However, the one remaining thing is that we need to update the name in the Database object, and that is protected by the database lock.

So maybe we should go with option 3 for now.

Comment by Judah Schvimer [ 14/Nov/18 ]

geert.bosch, do you know how big of a task 2) is?

After talking with milkie, there appears to be an option 3.
3) Release prepared transaction locks on secondaries as soon as they successfully prepare (described here). Only restore the prepared locks after we do all of the transition work we must do. This means that if we do any writes during the transition that should have conflicted with the prepared transaction, even taking as fine-grained locks as possible, then instead of deadlocking we will potentially corrupt data.

Comment by Geert Bosch [ 14/Nov/18 ]

I much prefer 2). AFAIK we already ban temp collections in transactions. We want to do the second part regardless for many reasons. Given that wouldn't 2) be both easier and safer?

Comment by Judah Schvimer [ 14/Nov/18 ]

geert.bosch and milkie, I currently have two plans for fixing the temp collection locking. Please let me know your thoughts or ideas.
1) Asynchronously drop temp collections after step up. Change the temp collection names to use UUIDs so they are not used and do not cause collisions after step up and simply schedule a task to drop all of the temp collections that will succeed as soon as it can acquire the required locks.
2) Ban temp collections in transactions and make collection drops (or temp collection drops specifically) only acquire a collection lock in mode X and a database lock in mode IX. These drops are two phase drops, so they're really renames within a database. I don't know how we'd change the locking requirements here.

1) seems easier but 2) feels safer and easier to know it's correct.

Comment by Judah Schvimer [ 14/Nov/18 ]

After discussing with charlie.swanson, agg $out and map-reduce both use temporary collections on arbitrary databases. Those collections do not need to be dropped synchronously on step up. If they're done asynchronously we must make sure that we only drop the temp collections from before the step up and not any new ones, and that no reads and writes go to the temp collections after step up. Changing temporary collection names (which now use an atomic counter but could use a UUID for uniqueness) to not collide would solve this problem. This ticket will need to make dropping those temp collections not conflict with locks held by prepared transactions.

Comment by Judah Schvimer [ 13/Nov/18 ]

Here is a list of the S and X locks taken on step up:
1) Sharding initializes collections in the 'config' database, which is not allowed in transactions currently.
2) ShardingStateRecovery acquires a collection X lock on 'admin.system.version' which cannot be used in a transaction.
3) Sharding updates the shard identity config string and acquires a collection X lock on 'admin.system.version' which cannot be used in a transaction.
4) We initialize the session catalog, by creating the 'config.transactions' collection. This is in the 'config' database, which is not allowed in transactions currently.
5) We drop all temp collections. This acquires a database X lock for all databases that have a temp collection.

The only work for this ticket is to fix temp collections or decide they are fine as is. I do not think we need to support temporary collections in transactions, though work in this ticket would need to ban them from transactions if we needed to take a collection lock in mode X on them.

Generated at Thu Feb 08 04:47:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.