[SERVER-39424] Test that DDL operations can't succeed after prepared transactions are aborted during shutdown Created: 07/Feb/19  Updated: 08/Jan/24  Resolved: 30/Apr/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.11

Type: Task Priority: Major - P3
Reporter: Samyukta Lanka Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: prepare_testing, txn_storage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-38994 Step down on SIGTERM Closed
is related to SERVER-35817 Allow shutdowns while prepared transa... Closed
is related to SERVER-37199 Yield locks of transactions in second... Closed
is related to SERVER-38162 Acquire ReplicationStateTransitionLoc... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-03-25, Repl 2019-04-08, Storage NYC 2019-05-06
Participants:

 Description   

Once the UninterruptibleLockGuard work is completed and all UninterruptibleLockGuards that conflict with prepared transactions are removed, we should verify that DDL operations are successfully killed and cannot succeed after prepared transactions have been aborted during shutdown.



 Comments   
Comment by Githook User [ 01/May/19 ]

Author:

{'email': 'justin.seyster@mongodb.com', 'name': 'Justin Seyster', 'username': 'jseyster'}

Message: SERVER-39424 Lint fixup
Branch: master
https://github.com/mongodb/mongo/commit/55790d8e6b46a8d337dbc069d04fa12fda5e9583

Comment by Githook User [ 30/Apr/19 ]

Author:

{'email': 'dianna.hohensee@10gen.com', 'name': 'Dianna', 'username': 'DiannaHohensee'}

Message: SERVER-39424 Test that a DDL op started after a prepared transaction fails during shutdown
Branch: master
https://github.com/mongodb/mongo/commit/11681d3507d05bf2c5b7873d96239b3bbfbca0dd

Comment by Dianna Hohensee (Inactive) [ 25/Apr/19 ]

I've verified my reasoning is correct by running the create collection command concurrently with shutdown, such that its lock request is enqueued behind the RSTL lock. The command failed in the lock acquisition code, as anticipated:

13.679-0400 d43061| 2019-04-25T12:57:13.678-0400 I  STORAGE  [conn2] sleeping before closing transportation layer
13.817-0400 d43061| 2019-04-25T12:57:13.817-0400 I  COMMAND  [conn1] create collection failpoint, sleeping
16.679-0400 d43061| 2019-04-25T12:57:16.679-0400 I  STORAGE  [conn2] done sleeping
16.679-0400 d43061| 2019-04-25T12:57:16.679-0400 I  NETWORK  [conn2] shutdown: going to close listening sockets...
16.680-0400 d43061| 2019-04-25T12:57:16.679-0400 I  NETWORK  [conn2] removing socket file: /tmp/mongodb-43061.sock
16.681-0400 d43061| 2019-04-25T12:57:16.680-0400 I  STORAGE  [conn2] enqueued RSTL lock, sleeping before killing ops
17.817-0400 d43061| 2019-04-25T12:57:17.817-0400 I  COMMAND  [conn1] done sleeping in create collection
17.817-0400 d43061| 2019-04-25T12:57:17.817-0400 I  COMMAND  [conn1] starting collection creation, after index creation?
17.817-0400 d43061| 2019-04-25T12:57:17.817-0400 I  -        [conn1] ~~~waiting for lock
18.318-0400 d43061| 2019-04-25T12:57:18.318-0400 I  -        [conn1] ~~~waiting for lock
18.818-0400 d43061| 2019-04-25T12:57:18.818-0400 I  -        [conn1] ~~~waiting for lock
19.319-0400 d43061| 2019-04-25T12:57:19.319-0400 I  -        [conn1] ~~~waiting for lock
19.820-0400 d43061| 2019-04-25T12:57:19.820-0400 I  -        [conn1] ~~~waiting for lock
20.321-0400 d43061| 2019-04-25T12:57:20.321-0400 I  -        [conn1] ~~~waiting for lock
20.822-0400 d43061| 2019-04-25T12:57:20.821-0400 I  -        [conn1] ~~~waiting for lock
21.323-0400 d43061| 2019-04-25T12:57:21.322-0400 I  -        [conn1] ~~~waiting for lock
21.681-0400 d43061| 2019-04-25T12:57:21.681-0400 I  STORAGE  [conn2] done sleeping again
21.682-0400 d43061| 2019-04-25T12:57:21.681-0400 I  -        [conn1] ~~~lock acquisition threw: InterruptedAtShutdown: interrupted at shutdown
21.682-0400 d43061| 2019-04-25T12:57:21.682-0400 I  STORAGE  [conn2] Hanging server in shutdown after clearing transactions due to failpoint
21.683-0400 d43061| 2019-04-25T12:57:21.683-0400 I  COMMAND  [conn1] done with createCollection, res: InterruptedAtShutdown: interrupted at shutdown
21.684-0400 d43061| 2019-04-25T12:57:21.683-0400 I  COMMAND  [conn1] Timed out obtaining lock while trying to gather storage statistics for a slow operation
21.684-0400 d43061| 2019-04-25T12:57:21.684-0400 I  COMMAND  [conn1] command admin.$cmd appName: "MongoDB Shell" command: create { create: "baz", lsid: { id: UUID("67bf8e5e-3616-4fa5-8ede-166ea3e1813c") }, $db: "admin" } numYields:0 ok:0 errMsg:"interrupted at shutdown" errName:InterruptedAtShutdown errCode:11600 reslen:120 locks:{ Global: { acquireCount: { r: 1, w: 1 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 3505005 } } } flowControl:{ acquireCount: 1 } protocol:op_msg 7866ms

However, I did so with a combination of fail points and server-code sleepmillis calls, which is not reliable enough to be committed as a test. The problem is that shutdown closes the transport layer before it does anything with locks or with interrupting operations. Also, we do not have the infrastructure to handle server shutdown from a parallel shell: my test did not exit gracefully.

Comment by Dianna Hohensee (Inactive) [ 24/Apr/19 ]

Yes, seems reasonable, I'll create such a test.

Comment by Judah Schvimer [ 24/Apr/19 ]

I'd like to at the very least add a test that the above reasoning holds for some DDL operation. Does that seem reasonable?

Comment by Dianna Hohensee (Inactive) [ 24/Apr/19 ]

From what I can see in our code, I'm concluding that no work needs to be done for this ticket.

This is the shutdown code of interest to us:

    {
        ....
        ....
 
        // Acquire the RSTL in mode X. First we enqueue the lock request, then kill all operations,
        // destroy all stashed transaction resources in order to release locks, and finally wait
        // until the lock request is granted.
        repl::ReplicationStateTransitionLockGuard rstl(
            opCtx, MODE_X, repl::ReplicationStateTransitionLockGuard::EnqueueOnly());
 
        // Kill all operations. After this point, the opCtx will have been marked as killed and will
        // not be usable other than to kill all transactions directly below.
        serviceContext->setKillAllOperations();
 
        // Destroy all stashed transaction resources, in order to release locks.
        killSessionsLocalShutdownAllTransactions(opCtx);
 
        rstl.waitForLockUntil(Date_t::max());
    }

Once the RSTL lock is enqueued, further CRUD/DDL ops may queue up behind it. We then mark all ongoing ops as killed by calling setKillAllOperations(), and any future new ops will also immediately be marked as killed. Then we wait to acquire the RSTL lock, ensuring that any ops with locks have finished. Now we only have to worry about lock requests enqueued behind/after the RSTL lock. It is not guaranteed that operations themselves will check for interrupt frequently enough to notice that they have been killed. However, the locking system does guarantee this, for lock requests that are not under UninterruptibleLockGuards.

LockerImpl::lockComplete
        ....
        if (opCtx && _uninterruptibleLocksRequested == 0) {
            // Interruptible lock acquisitions run this branch:
            result = _notify.wait(opCtx, waitTime);
        } else {
            // UninterruptibleLockGuards run this branch:
            result = _notify.wait(waitTime);
        }
        ....

_notify is a CondVarLockGrantNotification, and CondVarLockGrantNotification::wait checks for interrupt before returning a successful LockResult:

LockResult CondVarLockGrantNotification::wait(OperationContext* opCtx, Milliseconds timeout) {
    invariant(opCtx);
    stdx::unique_lock<stdx::mutex> lock(_mutex);
    if (opCtx->waitForConditionOrInterruptFor(
            _cond, lock, timeout, [this] { return _result != LOCK_INVALID; })) {
        // Because waitForConditionOrInterruptFor evaluates the predicate before checking for
        // interrupt, it is possible that a killed operation can acquire a lock if the request is
        // granted quickly. For that reason, it is necessary to check if the operation has been
        // killed at least once before accepting the lock grant.
        opCtx->checkForInterrupt();
        return _result;
    }
    return LOCK_TIMEOUT;
}

So any CRUD/DDL ops with lock requests enqueued after the RSTL will not successfully acquire their locks.

I do not think ReplicationCoordinator::canAcceptWritesFor* is an appropriate invariant to make during shutdown to ensure no further writes occur because 1) that check is only intended to stop external clients 2) it is in a high level replication concept and should not be used to control lower level activity 3) ReplicationCoordinator::stepDown called during shutdown seems too racy to depend upon forcing (waiting forever might actually be forever, though I have not looked in depth).

CC: milkie

Comment by Judah Schvimer [ 24/Apr/19 ]

It seems that it can only throw if it times out waiting for the lock or if another stepdown is in progress. Can we wait indefinitely for the lock (since we'll need it eventually to shut down anyway) and continuously try to step down until it either succeeds or gets a NotMaster error?

Alternatively, we could "do something to prevent writes from succeeding" after we release prepared locks. Stepdown was that thing, but if that doesn't work we could try setting canAcceptWritesFor directly, or add a check of canAcceptWritesFor in the shutdown path.

Comment by Dianna Hohensee (Inactive) [ 24/Apr/19 ]

judah.schvimer, it appears that ReplicationCoordinator::stepDown can still throw errors even when force=true is passed in, so it can still fail. I don't see how we could safely invariant afterwards that canAcceptWritesFor is false, unless ReplicationCoordinator::stepDown can be changed, which is outside of my current expertise; it would take me a long while to figure out the replication code.

Comment by Judah Schvimer [ 23/Apr/19 ]

Yes, the above comment as you described is what we discussed with milkie would be the work on this ticket. It's not directly testing it, but it gives us confidence with the invariant and the existing test coverage around shutting down secondaries.

Comment by Dianna Hohensee (Inactive) [ 23/Apr/19 ]

judah.schvimer, is what you delineate in the last comment the work you want done for this ticket? If I understand correctly, you'd like

  • a forced stepdown call following a failed unforced stepdown call
  • an invariant that writes are no longer accepted after stepdown
  • and "something" to prevent reads on prepared documents during shutdown – I'm guessing that means that after the locks are dropped from interrupting/aborting prepared transactions we need to block regular readers across the board (not just on docs that were/are part of transactions).

This ticket is no longer about testing that DDL ops fail after prepared transactions are interrupted/aborted during shutdown?

Comment by Judah Schvimer [ 05/Apr/19 ]

Now that we step down before shutdown, this is now identical to stepdown. The work on this ticket should make the stepdown in shutdownTask call force stepdown on failure here.

We will then add an invariant that canAcceptWritesFor is set to false here after the stepdown completes.

We will also do something to prevent reads on prepared documents from succeeding after the prepared transactions are aborted in shutdown.

Comment by Judah Schvimer [ 20/Mar/19 ]

siyuan.zhou, good point. Using the stepdown procedure wouldn't completely protect us from improper uses of UninterruptibleLockGuards. If a DDL op with an UninterruptibleLockGuard queued behind the RSTL, the RSTL could still be acquired by the shutdown. We would then yield prepared locks, and after shutdown released the RSTL, the DDL op could acquire a lock that should have conflicted with the prepared transaction and erroneously finish its operation.

By this logic, improper UninterruptibleLockGuards would also be a problem during stepdown when we yield locks from prepared transactions after acquiring the RSTL. In that case though, we do change "canAcceptWritesFor" to false, so that will likely protect the DDL op from succeeding.

Comment by Siyuan Zhou [ 19/Mar/19 ]

If the DDL were not interruptible, then we would not be able to acquire the RSTL at all.

Do you mean the uninterruptible DDL are waiting for RSTL while the shutdown is running? We don't hold RSTL for the whole shutdown process though. After shutdown releases the RSTL, can these uninterruptible DDL ops continue?

Are you saying that if they were uninterruptible then "mark any new operations interrupted on their creation" would not suffice for preventing new operations?

Speaking of new operations, according to mira.carey@mongodb.com, ops that come in after setKillAllOperations arrive after we've shut down the transport layer, so there won't be any new user ops. Internal operations after setKillAllOperations are still possible, though.

Comment by Judah Schvimer [ 19/Mar/19 ]

If the DDL were not interruptible, then we would not be able to acquire the RSTL at all. Are you saying that if they were uninterruptible then "mark any new operations interrupted on their creation" would not suffice for preventing new operations?

Comment by Siyuan Zhou [ 19/Mar/19 ]

I feel I'm missing why RSTL is a barrier. If the DDL is not interruptible, can they continue after RSTL is released by shutdown?

Comment by Judah Schvimer [ 19/Mar/19 ]

I agree we don't need the loop or the reader/writer differentiation.

If we want to separate the killing of unprepared and the yielding of prepared transactions and move the latter to under RSTL, we also need to extend the scope of RSTL on shutdown, but up to where is still a question. Up to where we acquire the global lock?

This is the main change I'm proposing. siyuan.zhou, why would we need to extend the scope of the RSTL? This would be using the RSTL as a barrier since we cannot acquire it while operations hold locks and new operations cannot acquire locks as you pointed out.

Comment by Siyuan Zhou [ 18/Mar/19 ]

I don't see why we need to kill ops in a loop or treat readers and writers differently. Killing operations on shutdown will also mark any new operations interrupted on their creation. If we want to separate the killing of unprepared and the yielding of prepared transactions and move the latter to under RSTL, we also need to extend the scope of RSTL on shutdown, but up to where is still a question. Up to where we acquire the global lock?

For testing, I think adding one test case for the above scenario is a good idea to verify the mechanism is right. Then the concurrency tests sound like a good solution to cover this kind of bug, given their past coverage of subtle concurrency bugs.

Comment by Judah Schvimer [ 15/Mar/19 ]

This may get done anyways by SERVER-38994.

Comment by Judah Schvimer [ 15/Mar/19 ]

Per conversation with milkie, the race seems to come from the fact that we yield prepared transaction locks before acquiring the RSTL. Stepdown, in contrast, aborts unprepared transactions, then acquires the RSTL, and then yields prepared transaction locks. suganthi.mani points out that stepdown is a bit overly complex for shutdown: it loops killing operations to avoid killing readers, but shutdown wants to kill readers and writers. If we kill all operations on shutdown, then we don't need this looping behavior. I think we can do two things on this ticket.
1) We can create the test above. (Required)
2) We can change the shutdown code to use the same (or similar but simpler) mechanism as stepdown. (Optional, and we can split this out once we reach agreement)

CC siyuan.zhou for any other thoughts.

Comment by Eric Milkie [ 15/Mar/19 ]

I'm not sure the test would have value unless we included all possible operations that could be blocked by the prepare, to detect when a programmer might change any of their lock acquisitions to Uninterruptible. Even if we could, the test would not age gracefully, as we could always add new operations in the future and forget to add them to the test.

How about this: Instead of trying to test for a situation that we hope never happens in the future, we could set the lock manager into a mode that would not allow any further lock requests to be granted after prepared transactions are aborted. That is the specific situation that we are trying to avoid here.

Comment by Judah Schvimer [ 15/Mar/19 ]

milkie, do you think the following test adds value?

  1. Start a transaction that inserts a document in collection foo
  2. Attempt to create an index on collection foo that should block on the transaction
  3. Prepare the transaction
  4. Shut down the node
  5. Restart the node and see if the index was created successfully

If index creation were uninterruptible, then it would succeed here after shutdown aborts prepared transactions: the index build would acquire the RSTL before shutdown, and shutdown would block until the index build completed successfully.

Any DDL op would work for this test. One problem is that this really only tests that we kill all operations, plus the interruptibility of the specific DDL op we choose for the test. All of this will also be tested in SERVER-39994 for all DDL ops, but in a less targeted way.

Comment by Eric Milkie [ 14/Mar/19 ]

I am having trouble imagining a test that would pass with today's code but fail if a DDL operation in the future hits an uninterruptible lock and would subsequently succeed at shutdown time, as per the ticket title. In particular, writing a test for detecting that some situation doesn't happen is tricky unless we have good traction on what the code logic would be when it does happen. I don't think we can proceed with this ticket until more details are added to the description.

Comment by Samyukta Lanka [ 11/Mar/19 ]

I think tests like this might be useful to make sure that someone doesn't make one of these operations uninterruptible in the future without realizing they could conflict with prepared transactions.

Comment by Judah Schvimer [ 11/Mar/19 ]

If they are interruptible and we interrupt them, how would they sneak in? Like what are we looking for?

Comment by Samyukta Lanka [ 11/Mar/19 ]

judah.schvimer we might want to go through all the places that had a conflicting UninterruptibleLockGuard and make sure those cannot sneak in during shutdown, but I'm not sure if you find that necessary.

Comment by Eric Milkie [ 11/Mar/19 ]

The first thing shutdown does is go through the stepdown command codepath; therefore, your test proposal could be simplified as "step down the node" instead of "shut down the node". I'm unconvinced that we need to write separate shutdown tests, but if we do, we should write the stepdown test first, and then simply substitute "shutdown" for "stepdown" at the appropriate point in the test.

Comment by Samyukta Lanka [ 11/Mar/19 ]

This came out of SERVER-38162. What I had in mind was making sure something like a dropCollection cannot succeed after a prepared transaction has been aborted during shutdown. For the test, I imagined we could do something like the following: prepare a transaction, run a DDL operation like dropCollection and make sure that it is waiting for a lock held by the prepared transaction, shut down the node, hang shutdown right after the transaction has been aborted, and make sure that the DDL operation has been killed.

Comment by Judah Schvimer [ 08/Mar/19 ]

I think this came out of SERVER-35817?

Comment by Judah Schvimer [ 08/Mar/19 ]

samy.lanka, can you please clarify what prepare work this is intended to test since there is no shutdown related linked ticket and how you expect the test to be structured?

Generated at Thu Feb 08 04:52:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.