[SERVER-63258] Resolve inconsistency around the write to the findAndModify image collection for prepared internal transactions Created: 03/Feb/22  Updated: 06/Dec/22  Resolved: 14/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Cheahuychou Mao Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-63633 Remove TODO listed in SERVER-63258 Closed
is related to SERVER-63071 [Retryability] Prepared internal tran... Closed
is related to SERVER-62785 Write change stream pre-images in the... Closed
Assigned Teams:
Sharding NYC
Participants:

 Description   

For prepared internal transactions for retryable findAndModify, the pre/post image is written to the image collection at prepare time. On the primary, the write is done in a side storage engine transaction, whereas on secondaries it is done in the prepared transaction's own storage engine transaction. As a result, the primary and secondaries behave inconsistently:

  • On nodes that are secondaries when the transaction enters prepare, the config.image_collection IX lock is held, along with the other locks acquired for the transaction, until the transaction commits or aborts. If a failover occurs, step-up can therefore hang (to be solved in SERVER-63071).
  • If the transaction aborts after prepare, the image collection on the primary is expected to be inconsistent with the image collection on secondaries. The reason is that when the transaction aborts, the write to the image collection is rolled back only on secondaries.
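
The divergence described in the second bullet can be illustrated with a toy model. This is a minimal sketch, not server code: ToyNode, prepare, commit, abort, and run_abort_after_prepare are hypothetical names, and the image collection is modeled as a plain dict. It assumes only what the description states: the primary commits the image write in a side transaction at prepare time, while a secondary keeps it inside the prepared transaction until commit or abort.

```python
class ToyNode:
    """Toy stand-in for a replica set member; not MongoDB code."""

    def __init__(self, is_primary):
        self.is_primary = is_primary
        self.image_collection = {}  # committed image documents, keyed by session id
        self.pending_images = {}    # image writes still inside the prepared transaction

    def prepare(self, session_id, image):
        if self.is_primary:
            # Side storage engine transaction: committed independently of
            # the prepared transaction.
            self.image_collection[session_id] = image
        else:
            # Same storage engine transaction as the prepared transaction:
            # the write only becomes durable on commit.
            self.pending_images[session_id] = image

    def commit(self, session_id):
        # Committing the prepared transaction publishes any pending image write.
        if session_id in self.pending_images:
            self.image_collection[session_id] = self.pending_images.pop(session_id)

    def abort(self, session_id):
        # Aborting discards only the writes made inside the prepared transaction.
        self.pending_images.pop(session_id, None)


def run_abort_after_prepare():
    """Prepare and then abort the same transaction on a primary and a secondary."""
    primary, secondary = ToyNode(True), ToyNode(False)
    for node in (primary, secondary):
        node.prepare("session-1", {"preImage": {"x": 1}})
        node.abort("session-1")
    return primary.image_collection, secondary.image_collection
```

In this model the primary still holds the image document after the abort while the secondary does not, which is the inconsistency the ticket describes. The model does not cover locking, so it says nothing about the step-up hang in the first bullet.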

To solve this, there are two options:

  1. Make secondaries also write to the image collection in a side storage engine transaction. One challenge here is determining which timestamp that storage engine transaction should use.
  2. Make the primary write to the image collection in the transaction’s storage engine transaction. This would require flipping the order in TransactionParticipant so that the applyOps oplog entries are written before the transaction’s storage engine transaction is put into prepare. It is unclear whether this would be safe.


 Comments   
Comment by Githook User [ 11/Mar/22 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'mao.cheahuychou@gmail.com', 'username': 'cheahuychou'}

Message: SERVER-63633 Remove TODO listed in SERVER-63258
Branch: master
https://github.com/mongodb/mongo/commit/55491b29a47c33aa6875650f4e0a23a831b9cf8a

Comment by Jack Mulrow [ 14/Feb/22 ]

For more context, we don't believe the current behavior is a bug, just hard to reason about, and we don't believe we'll have to change this code in the near future, so addressing this isn't worth the effort at this time.

Comment by Ratika Gandhi [ 14/Feb/22 ]

mindaugas.malinauskas, we are closing this ticket but if Query team finds that we need it let us know and we'll revisit. Thanks! 
