[SERVER-48002] Do not enforce DataCorruptionDetected assertion when ignoring prepare conflicts Created: 07/May/20  Updated: 29/Oct/23  Resolved: 19/Nov/20

Status: Closed
Project: Core Server
Component/s: Querying, Storage
Affects Version/s: None
Fix Version/s: 4.9.0, 4.4.4

Type: Bug Priority: Major - P3
Reporter: Cheahuychou Mao Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Execution Team 2020-11-16, Execution Team 2020-11-30
Participants:
Linked BF Score: 34

 Description   

Snapshot isolation cannot be guaranteed for operations that ignore prepare conflicts. This means that two reads of the same record in the same snapshot can return different results. In practice, the assertion added by SERVER-40620 fails if an index points to a non-existent record. This may return false positives if a record is fetched in the collection after a prepared transaction that deletes that record commits.

We cannot enforce the DataCorruptionDetected assertion if we are also ignoring prepare conflicts, or we will continue to get false positives.

Original description:

SERVER-37364 made the transaction coordinator return the decision as soon as the decision is persisted. As a result, the client is no longer guaranteed to see the effects of the transaction immediately after the commitTransaction returns. Based on this evergreen patch, a find command that is run immediately after an update shard key operation (but before the transaction finishes committing in the background) can fail with DataCorruptionDetected error (added in SERVER-40620).



 Comments   
Comment by Githook User [ 17/Dec/20 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-48002 Do not enforce DataCorruptionDetected assertion when ignoring prepare conflicts

Snapshot isolation cannot be guaranteed for operations that ignore
prepare conflicts. This means that two reads of the same record in the
same snapshot can return different results. In practice, this can lead
to false positive DataCorrutionDetected assertions.

(cherry picked from commit a01be7ed8e475775ebec57dd6291c3cf5cd33ccf)
Branch: v4.4
https://github.com/mongodb/mongo/commit/859ecb733b5e057a3a27283302a6b01ef98ad00f

Comment by Louis Williams [ 19/Nov/20 ]

Commit: https://github.com/mongodb/mongo/commit/a01be7ed8e475775ebec57dd6291c3cf5cd33ccf

Comment by Louis Williams [ 16/Nov/20 ]

I had to revert my patch because it broke the test designed to exercise this assertion. Most reads ignore prepare conflicts. By making this change, we will completely lose the ability to detect this type of inconsistency for reads, with the exception of multi-doc transactions.

This assertion has proven to be somewhat valuable in identifying snapshot isolation bugs in 4.4, but not index consistencies. 

I am considering the following:

  • Accept that this assertion will not be able to detect inconsistencies for reads in favor of reducing false positives.
  • Only enforce the assertion if no prepared transactions have committed or aborted while a snapshot was open. This would likely require modifying SnapshotId to include a generational counter for prepared transactions. For operations that ignore prepared transactions, a generational counter in SnapshotId will indicate whether any prepared transactions have committed or aborted. We have most of this infrastructure already to handle prepare conflicts, so we could potentially repurpose it to continue enforcing this assertion.
Comment by Githook User [ 11/Nov/20 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: Revert "SERVER-48002 Do not enforce DataCorruptionDetected assertion when ignoring prepare conflicts"

This reverts commit 523247d096a796c15c911370e622a3614411a25b.
Branch: master
https://github.com/mongodb/mongo/commit/f5ad04df2255518e16048e09dcf2baeff64bd7a8

Comment by Githook User [ 11/Nov/20 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-48002 Do not enforce DataCorruptionDetected assertion when ignoring prepare conflicts

Snapshot isolation cannot be guaranteed for operations that ignore
prepare conflicts. This means that two reads of the same record in the
same snapshot can return different results. In practice, this can lead
to false positive DataCorrutionDetected assertions.
Branch: master
https://github.com/mongodb/mongo/commit/523247d096a796c15c911370e622a3614411a25b

Comment by Louis Williams [ 03/Nov/20 ]

Thanks, justin.seyster for the repro. I think I can see what is happening. A read is ignoring prepare conflicts (default behavior) and is racing with a transaction deleting the document such that it sees the index entry but not the record.

Consider the following example with an index on "x":

  • Assume the following existing state:
    • Collection: RID(1) --> {x: 100}
    • Index: 100 --> RID(1)
  • The following operations take place in a transaction and are prepared:
    • Delete key 100 from index
    • Delete RID(1) from collection
  • Start an un-timestamped read that ignores prepare conflicts. The reader observes the key 100 in the index because it ignores the prepared deletion
  • The prepared transaction commits
  • The reader looks up RID(1) in the collection, but finds that it is missing

In general, if an operation ignores prepare conflicts, it cannot guarantee snapshot isolation. This example demonstrates that behavior. If we had not returned an error in this case (behavior before SERVER-40620), we would have just returned nothing, and a user would not notice any problems.

The first solution that comes to mind is to not check this assertion for operations that ignore prepare conflicts. Most reads ignore prepare conflicts, however, so most reads will end up never checking this assertion.

Comment by Justin Seyster [ 17/Jun/20 ]

connie.chen Sure, I just ran this on ToT (ec679c6c), and it failed with DataCorruptionDetected after about an hour of repeating.

Comment by Connie Chen [ 15/Jun/20 ]

justin.seyster would you be able to re-run the torture test on the current code? Storage Engines has made quite a few changes in the last 3 weeks so we want to make sure it's still a problem. 

Comment by Justin Seyster [ 27/May/20 ]

Thanks for filing this! After investigating, I don't think that the DataCorruptionDetected uassert() added by SERVER-40620 is observing expected behavior. To me, this looks like a potentially serious data inconsistency error. Unfortunately, I cannot reproduce it, even after running the test (move_chunk_update_shard_key_in_retryable_write.js) overnight. I also wrote a torture test to designed to subject a single replica set to the conditions that the failing shard would have experienced, but that also failed to reproduce the DataCorruptionDetected error.

From the perspective of the failing shard, it gets a connection from mongoS that starts a multi-document transaction, adds a new document ({x: 1}) via upsert, prepares the transaction, and then commits it. A separate connection from the same monogS queries the collection for the {x: 1} document. The commitTransaction command and the query happen at almost exactly the same time and may be executing concurrently in the mongoD.

[js_test:move_chunk_update_shard_key_in_retryable_write] 2020-05-07T05:09:24.004+0000 d20023| 2020-05-07T05:09:24.000+00:00 D3 TXN      20507   [conn55] "Received commitTransaction for transaction with txnNumber {opCtx_getTxnNumber} on session {opCtx_getLogicalSessionId}","attr":{"opCtx_getTxnNumber":35,"opCtx_getLogicalSessionId":{"id":{"$uuid":"34adc0a1-f500-4f55-907b-4e869fe4fc5e"},"uid":{"$binary":{"base64":"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=","subType":"0"}}}}
[js_test:move_chunk_update_shard_key_in_retryable_write] 2020-05-07T05:09:24.004+0000 d20023| 2020-05-07T05:09:24.003+00:00 E  QUERY    4615603 [conn57] "Erroneous index key found with reference to non-existent record id {recordId}: {indexKeyData}. Consider dropping and then re-creating the index with key pattern {indexKeyPattern} and then running the validate command on the collection.","attr":{"recordId":{"RecordId":1},"indexKeyData":[{"key":{"":0},"pattern":{"x":1}}],"indexKeyPattern":{"x":1}}

Because the commitTransaction and the query are concurrent, it is expected behavior that the query could return 0 or 1 documents, depending on how exactly the two commands are scheduled.

However, I do not believe it is expected behavior for the query to observe that the index contains the document {x: 1} but the collection does not (i.e., the collection does not have a record matching the index entry's record id), even if the query happens while while the document is part of a prepared transaction that is in the process of committing. That's what the DataCorruptionDetected uassert() shows is happening, which is why I'm a bit worried about it. This needs some more investigation, but I'm not sure what the next steps are.

Generated at Thu Feb 08 05:15:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.