[SERVER-41766] Secondary may encounter prepare conflict when applying write that sets the multikey flag
Created: 14/Jun/19  Updated: 29/Oct/23  Resolved: 27/Jun/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.2.0-rc0
Fix Version/s: 4.2.0-rc3, 4.3.1

Type: Bug
Priority: Major - P3
Reporter: William Schultz (Inactive)
Assignee: William Schultz (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: multikey_prepare_txns.js
Issue Links:
Backports
Depends
Related
related to SERVER-34774 Converting an index to multikey is no... Closed
related to SERVER-41988 Ignore prepare conflicts during secon... Closed
related to SERVER-46229 setting multikey on index fails if in... Closed
related to SERVER-56877 insert operations may fail to set ind... Closed
related to SERVER-59842 Setting wildcard index as multikey ca... Closed
is related to SERVER-41848 minimum_visible_with_cluster_time.js ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Repl 2019-07-01
Participants:
Linked BF Score: 19

 Description   

The primary and secondary handle setting the multikey flag differently. On a primary, the code path that sets the flag registers an onCommit handler on the OperationContext to switch the in-memory multikey bit to true, so the bit is not set until the prepared transaction commits. Subsequent writes therefore see the bit as unset and try to update the multikey flag again. These writes hit a prepare conflict on the collection catalog entry document and block, which is OK. On a secondary, however, applying a prepared transaction appears to set the in-memory multikey bit immediately rather than when the prepared transaction commits, since the OperationContext used on that path doesn't appear to be tied to the actual transaction being applied and the WUOW commits right away. Because of this divergence, the following scenario is possible with two nodes, n0 and n1:

  1. A primary node (n0) prepares a transaction that did an insert that set the multikey flag. It does not update the in-memory multikey bit since the transaction has not committed yet.
  2. The secondary (n1) applies the prepare operation from the primary. It does set the in-memory multikey bit because of the behavior outlined above.
  3. Node n1 steps up and becomes primary, n0 steps down and becomes secondary.
  4. The new primary, n1, receives an insert operation that tries to set the multikey flag again. Because the in-memory bit has already been set, it does not try to write the flag again, and doesn't encounter a prepare conflict.
  5. The current secondary, n0, tries to apply the insert operation that n1 just executed. Since n0 did not previously set the in-memory multikey bit, it tries to write the flag again, but it will hit a prepare conflict, since the transaction started in step (1) is still prepared. The secondary is now stuck in oplog application, unable to proceed.
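
The five steps above can be modelled with a small, self-contained Python sketch. It is not server code: the `Node` class, its method names, and the `PrepareConflict` exception are hypothetical stand-ins, and the model raises an exception where a real secondary would block in oplog application.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


class PrepareConflict(Exception):
    """Stand-in for a write that touches a document held by a prepared transaction."""


@dataclass
class Node:
    name: str
    in_memory_multikey: bool = False
    # Id of the prepared transaction holding the catalog entry document, if any.
    catalog_prepared_by: Optional[str] = None

    def prepare_txn_as_primary(self, txn_id: str) -> None:
        # Primary path (pre-fix): the catalog write belongs to the prepared
        # transaction and the in-memory bit waits for an onCommit handler.
        self.catalog_prepared_by = txn_id

    def apply_prepare_as_secondary(self, txn_id: str) -> None:
        # Secondary path (pre-fix): the in-memory bit is set immediately.
        self.catalog_prepared_by = txn_id
        self.in_memory_multikey = True

    def multikey_insert(self) -> str:
        # An insert that needs the index to be multikey, executed or applied.
        if self.in_memory_multikey:
            return "skipped catalog write"
        if self.catalog_prepared_by is not None:
            raise PrepareConflict(f"catalog entry held by {self.catalog_prepared_by}")
        self.in_memory_multikey = True
        return "wrote multikey flag"


def run_scenario() -> Tuple[str, str]:
    n0, n1 = Node("n0"), Node("n1")
    n0.prepare_txn_as_primary("txn1")        # step 1
    n1.apply_prepare_as_secondary("txn1")    # step 2
    # Step 3: n1 steps up, n0 steps down (no state change in this model).
    on_new_primary = n1.multikey_insert()    # step 4
    try:
        on_new_secondary = n0.multikey_insert()  # step 5
    except PrepareConflict as exc:
        on_new_secondary = f"prepare conflict: {exc}"
    return on_new_primary, on_new_secondary
```

In this model the new primary skips the catalog write while the new secondary hits the conflict, which mirrors the stuck secondary in step 5.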

The attached repro (multikey_prepare_txns.js) demonstrates this scenario. One possible fix is to do the multikey update in a side transaction on the primary, so that it does not generate prepare conflicts at all. It should generally be safe to set the multikey flag on a collection earlier than necessary, and the update likely doesn't need to be atomic with respect to the original transaction that did the write.
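
As a rough illustration of the side-transaction idea, the following Python sketch compares writing the multikey flag inside the parent transaction with committing it separately and immediately. All names here (`Catalog`, `Txn`, `insert_multikey_in_txn`, and so on) are hypothetical and only model the behavior described in this ticket; they are not the server's API.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Catalog:
    multikey: bool = False                  # committed multikey flag
    held_by_prepared: Optional[str] = None  # prepared txn holding the catalog entry


@dataclass
class Txn:
    txn_id: str
    written_docs: Set[str] = field(default_factory=set)


def insert_multikey_in_txn(catalog: Catalog, txn: Txn, doc_id: str,
                           use_side_txn: bool) -> None:
    txn.written_docs.add(doc_id)
    if catalog.multikey:
        return
    if use_side_txn:
        # Side transaction: the flag is committed right away and is never
        # part of the parent transaction's write set.
        catalog.multikey = True
    else:
        # Original behavior: the catalog write rides along with the parent.
        txn.written_docs.add("catalog_entry")


def prepare(catalog: Catalog, txn: Txn) -> None:
    # Preparing the parent pins every document it wrote.
    if "catalog_entry" in txn.written_docs:
        catalog.held_by_prepared = txn.txn_id


def later_writer_blocks(use_side_txn: bool) -> bool:
    catalog, txn = Catalog(), Txn("txn1")
    insert_multikey_in_txn(catalog, txn, "doc1", use_side_txn)
    prepare(catalog, txn)
    # A later writer conflicts only if it still needs to set the flag while
    # the catalog entry is held by a prepared transaction.
    return (not catalog.multikey) and catalog.held_by_prepared is not None
```

With `use_side_txn=False` a later multikey write runs into the prepared catalog entry; with `use_side_txn=True` the flag is already committed, so the later write has nothing to conflict with.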



 Comments   
Comment by Githook User [ 12/Aug/19 ]

Author: William Schultz <william.schultz@mongodb.com> (will62794)

Message: SERVER-41766 Remove obsolete code for tracking multikey writes inside multi-document transactions

Now that we update the multikey flag within a transaction in a side transaction, the in-memory state about the multikey write will be naturally visible to subsequent writes inside the transaction, so we don't need to keep around any extra structures to enforce this anymore.
Branch: master
https://github.com/mongodb/mongo/commit/20ba91db04c0b7b3d10fe2527b6938b1a14fcaa6

Comment by Githook User [ 02/Jul/19 ]

Author: William Schultz <william.schultz@mongodb.com> (will62794)

Message: SERVER-41766 Update the on-disk multikey flag in a side transaction when running inside a multi-document transaction

When a write inside a multi-document transaction needs to set an index as multikey, we update the multikey flag in the on-disk catalog in a transaction separate from the parent transaction. We commit this side transaction immediately, so as to avoid the catalog write generating prepare conflicts if it was written as part of a parent transaction that later became prepared. In general, it is safe to set an index as multikey too early. The multikey write is timestamped at the most recent value of the LogicalClock.

(cherry picked from commit 3147f5e1c37546b817934ef892d5e353170a9935)
Branch: v4.2
https://github.com/mongodb/mongo/commit/e549b451eb4ba9aa11f79a2356061f5dab6c943b

Comment by William Schultz (Inactive) [ 01/Jul/19 ]

Additional notes from discussion on this ticket:

Within a multi-document transaction, reads should be able to see the effect of previous writes done within that transaction. If a previous write in a transaction has set the index to be multikey, then a subsequent read must know that fact in order to return correct results. This is true in general for multikey writes. After this change, we update the multikey flag in memory and on-disk as soon as we do such a write inside a transaction. We do not wait for the transaction to commit. Thus, transactions should be able to read their own multikey writes correctly, without any extra logic.
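
To illustrate why a read needs to know that an index is multikey, here is a deliberately simplified Python sketch. It assumes a toy planner that intersects range bounds only when the index is not flagged multikey, and the helper names (`index_keys`, `matches`, `range_query`) are hypothetical. The matching rule follows MongoDB's array semantics, where `{a: {$gt: lo, $lt: hi}}` matches a document if some element is greater than `lo` and some (possibly different) element is less than `hi`.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, List[int]]


def index_keys(docs: Dict[str, Value]) -> List[Tuple[int, str]]:
    # One index key per scalar, or one per array element (multikey).
    keys = []
    for doc_id, a in docs.items():
        for v in (a if isinstance(a, list) else [a]):
            keys.append((v, doc_id))
    return sorted(keys)


def matches(a: Value, lo: int, hi: int) -> bool:
    # Each predicate may be satisfied by a different array element.
    vals = a if isinstance(a, list) else [a]
    return any(v > lo for v in vals) and any(v < hi for v in vals)


def range_query(docs: Dict[str, Value], lo: int, hi: int,
                multikey_flag: bool) -> List[str]:
    keys = index_keys(docs)
    if multikey_flag:
        # Multikey: scan on one bound only, then filter candidates.
        candidates = {d for k, d in keys if k > lo}
        return sorted(d for d in candidates if matches(docs[d], lo, hi))
    # Not multikey: the toy planner intersects both bounds on the index scan.
    return sorted({d for k, d in keys if lo < k < hi})


docs = {"d1": 3, "d2": [0, 10]}
correct = range_query(docs, 1, 5, multikey_flag=True)
stale = range_query(docs, 1, 5, multikey_flag=False)
```

In this toy model, a read that still believes the index is not multikey misses `d2`, which is the kind of incorrect result the comment above is guarding against.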

Comment by Githook User [ 27/Jun/19 ]

Author: William Schultz <william.schultz@mongodb.com> (will62794)

Message: SERVER-41766 Update the on-disk multikey flag in a side transaction when running inside a multi-document transaction

When a write inside a multi-document transaction needs to set an index as multikey, we update the multikey flag in the on-disk catalog in a transaction separate from the parent transaction. We commit this side transaction immediately, so as to avoid the catalog write generating prepare conflicts if it was written as part of a parent transaction that later became prepared. In general, it is safe to set an index as multikey too early. The multikey write is timestamped at the most recent value of the LogicalClock.
Branch: master
https://github.com/mongodb/mongo/commit/3147f5e1c37546b817934ef892d5e353170a9935
