[SERVER-83537] Investigate removal of side transaction for multikey Created: 22/Nov/23  Updated: 20/Dec/23  Resolved: 20/Dec/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Jordi Olivares Provencio
Assignee: Jordi Olivares Provencio
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Catalog and Routing
Sprint: CAR Team 2023-11-27, CAR Team 2023-12-11, CAR Team 2023-12-25
Participants:
Linked BF Score: 3
Story Points: 3

 Description   

During node recovery we replay the oplog to reconstruct the node's state. As part of this, we reconstruct prepared transactions so that they can be committed later.

If at this point the transaction needs to set the multikey flag to true, we can hit a WiredTiger (WT) error in the following case:

  • The internal durable timestamp for the catalog page advances to T=2
  • The prepared transaction at time T=1 attempts to set the multikey flag using a side transaction
  • The side transaction fails to commit: WT returns an error to avoid data inconsistencies, because we are writing back in time and could invalidate every read from that point onwards that may have observed a stale value.
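
The sequence above can be illustrated with a toy model. This is only a sketch: `ToyCatalogPage`, `advance_durable`, `commit_multikey` and `WriteBehindDurableError` are hypothetical names invented for illustration, and the rule "reject a commit older than the durable timestamp" is a simplification assumed from the description, not the WiredTiger API or its exact semantics.

```python
class WriteBehindDurableError(Exception):
    """Hypothetical error: commit timestamp is older than the durable timestamp."""


class ToyCatalogPage:
    """Toy stand-in for the catalog page; NOT the WiredTiger API."""

    def __init__(self):
        self.durable_ts = 0
        self.multikey = False

    def advance_durable(self, ts):
        # The durable timestamp only ever moves forward.
        self.durable_ts = max(self.durable_ts, ts)

    def commit_multikey(self, commit_ts):
        # Assumed rule: a write "back in time" is rejected, because readers
        # at or after durable_ts may already have seen multikey == False.
        if commit_ts < self.durable_ts:
            raise WriteBehindDurableError(
                f"commit at T={commit_ts} is older than durable T={self.durable_ts}"
            )
        self.multikey = True
        self.advance_durable(commit_ts)


page = ToyCatalogPage()
page.advance_durable(2)  # durable timestamp for the catalog page reaches T=2

try:
    # Side transaction for the transaction prepared at T=1.
    page.commit_multikey(commit_ts=1)
    outcome = "committed"
except WriteBehindDurableError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```

In this model the flag stays unset after the rejected commit, which is the state recovery would be left in.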

Ideally we would avoid the side transaction and accumulate the change in the original recovery unit. This would lead to prepare conflicts until the operation commits, which is what already happens on the primary. On secondaries, however, this operation isn't symmetric, as explained in SERVER-41766.

We should investigate whether it is safe to remove the side transaction, or whether the write can be done without risking data inconsistencies.



 Comments   
Comment by Jordi Olivares Provencio [ 20/Dec/23 ]

As a side transaction is deemed necessary for secondary replication, the next best thing we could do is modify the side transaction to actually force the index to become multikey. However, in the absence of a dedicated new command or a new collMod option, the most compatible approach that does not imply an FCV change is to force the multikey setting implicitly by inserting and then deleting a special document that enables the flag. This would functionally be a no-op, but it would trigger the multikey flag setting during oplog replication/recovery.

The problem with this approach is that we can't generate such a document with confidence that it won't cause issues. If the index is specified with {unique: true} then we must synthetically produce a value that doesn't exist anywhere in the collection. Failure to do so would cause secondary replication to fail.
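
A rough sketch of why the insert-then-delete trick is fragile on a unique index follows. Everything here is hypothetical: `ToyUniqueIndex`, `force_multikey` and `DuplicateKeyError` are illustrative names, and the toy rule "an array with more than one value sets the flag" is an assumption, not the server's actual multikey logic.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for a unique index violation."""


class ToyUniqueIndex:
    """Toy unique index over a single field; not MongoDB's implementation."""

    def __init__(self, existing_values):
        self.keys = set(existing_values)
        self.multikey = False

    @staticmethod
    def _as_list(field_value):
        return field_value if isinstance(field_value, list) else [field_value]

    def insert(self, field_value):
        values = self._as_list(field_value)
        clashes = self.keys.intersection(values)
        if clashes:
            raise DuplicateKeyError(f"duplicate key(s): {sorted(clashes)}")
        self.keys.update(values)
        if len(values) > 1:
            # Toy rule: an array value makes the index multikey.
            self.multikey = True

    def delete(self, field_value):
        # Removing the keys does not clear the flag.
        self.keys.difference_update(self._as_list(field_value))


def force_multikey(index, synthetic_values):
    """Insert then delete a synthetic document: a no-op except for the flag."""
    index.insert(synthetic_values)
    index.delete(synthetic_values)


safe = ToyUniqueIndex(existing_values=[1, 2, 3])
force_multikey(safe, [-1, -2])  # synthetic values happen to be unused

unlucky = ToyUniqueIndex(existing_values=[1, 2, -1])
try:
    force_multikey(unlucky, [-1, -2])  # -1 already exists in the collection
    result = "ok"
except DuplicateKeyError:
    result = "duplicate key"

print(safe.multikey, sorted(safe.keys), result)
```

The second case is the one that matters: without scanning the whole collection there is no cheap way to know the synthetic value is unused.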

As we want to refactor multikey in the future so that it is less of a special case and is replicated explicitly via the oplog, I'm closing this ticket as Won't Fix. Fixing this would require a very large effort that would have to be scrapped altogether when we refactor multikey.

Comment by Jordi Olivares Provencio [ 13/Dec/23 ]

Not using the side transaction isn't as safe as we would wish. Suppose a multi-document transaction has prepared and changed the multikey metadata. Any later operation that does an insert and modifies the multikey metadata will hit a prepare conflict until the transaction commits. This effectively deadlocks the server.

Note that this is an issue for secondary replication, since it would stop replication from making forward progress: the secondary would prepare the transaction and then block the applier thread.
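
The stall can be sketched with a simplified, single-threaded applier. This is an assumption-laden illustration: `apply_in_order`, the entry shapes and the key name `idx_a.multikey` are all hypothetical, and the model assumes the commit entry sits behind the conflicting write in the oplog; real oplog application is batched and parallel.

```python
def apply_in_order(oplog):
    """Apply entries strictly in order.

    Returns (number_applied, stalled_entry). stalled_entry is None when
    every entry could be applied.
    """
    prepared_keys = {}  # key -> id of the prepared transaction holding it
    applied = 0
    for entry in oplog:
        op = entry["op"]
        if op == "prepare":
            for key in entry["keys"]:
                prepared_keys[key] = entry["txn"]
        elif op == "commit":
            prepared_keys = {
                k: t for k, t in prepared_keys.items() if t != entry["txn"]
            }
        elif op == "write":
            if any(k in prepared_keys for k in entry["keys"]):
                # Prepare conflict: the only entry that could clear it is
                # later in the oplog, so the applier cannot move on.
                return applied, entry
        applied += 1
    return applied, None


oplog = [
    {"op": "prepare", "txn": 1, "keys": ["idx_a.multikey"]},  # txn sets multikey
    {"op": "write", "keys": ["idx_a.multikey"]},              # insert sets multikey
    {"op": "commit", "txn": 1},                               # never reached
]

applied, stalled = apply_in_order(oplog)
print(applied, stalled)
```

With the side transaction, the prepared transaction would not hold the multikey metadata, so the write would not conflict in this model.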

Generated at Thu Feb 08 06:52:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.