[SERVER-42250] `readConcern: snapshot` causes many write conflicts Created: 16/Jul/19  Updated: 11/Dec/19  Resolved: 10/Dec/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: wei Assignee: Siyuan Zhou
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Repl 2019-08-12, Repl 2019-08-26, Repl 2019-09-09, Repl 2019-09-23, Repl 2019-10-07, Repl 2019-10-21, Repl 2019-11-04, Repl 2019-11-18, Repl 2019-12-02, Repl 2019-12-16
Participants:

 Description   

Coming from SERVER-39672.

In one of our transaction performance tests on MongoDB 4.0.9, we saw many write conflict errors, and the write conflict error does not indicate which document it is conflicting on.

The environment runs a single MongoDB 4.0.9 primary, with no secondary nodes, on one machine.

  • we tested MongoDB 4.0.10 later and everything passed
  • then, we switched to `readConcern: local` in 4.0.9 and everything passed too
  • our job mostly does append operations and should not have any real write conflicts, so seeing many write conflicts is strange to us

We are curious why changing `readConcern` levels changes write conflict behavior. In theory, they should behave the same, with any difference being only a matter of timing. Otherwise, the server should throw a different error.

As the production system will use `readConcern: snapshot`, we are worried that we might see this issue again in production.

Can we get an explanation of the relationship between readConcern and write conflict errors?

Thanks,

 



 Comments   
Comment by Siyuan Zhou [ 10/Dec/19 ]

Sorry for this late response on why both "snapshot" and the default "local" read concerns execute speculatively but still behave differently in terms of write conflicts.

The "snapshot" read concern reads from the all-committed snapshot, which ensures that all concurrent writes are either committed or aborted, so the snapshot reflects a point in time in cluster-timestamp order. In contrast, the "local" read concern does not care about concurrent writes and allows "holes" (for those concurrent writes) before its read timestamp. In fact, the "local" read concern can read the latest version of committed data regardless of the data's timestamp. The all-committed snapshot may be a little staler than the latest timestamp due to those holes. That is why the "snapshot" read concern may see more write conflicts than the "local" read concern: they give slightly different isolation guarantees and read from different snapshots.
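The gap between the two snapshots can be sketched in a few lines of pure Python. This is a toy model, not MongoDB internals: it only captures the rule that a transaction conflicts when a document it writes was committed after its read timestamp, so reading from the older all-committed point widens the conflict window. All names and timestamps here are illustrative.

```python
def has_write_conflict(read_timestamp, committed_writes, doc):
    """committed_writes: list of (doc_id, commit_timestamp) pairs.
    A transaction writing `doc` conflicts if that document was
    committed after the snapshot the transaction reads from."""
    return any(d == doc and ts > read_timestamp
               for d, ts in committed_writes)

# Committed history: doc "a" at ts 5, doc "b" at ts 8, doc "a" again at ts 10.
committed = [("a", 5), ("b", 8), ("a", 10)]

latest_ts = 10        # "local": may read the latest committed data
all_committed_ts = 7  # "snapshot": a hole before ts 8 keeps this point older

# The staler all-committed snapshot sees a conflict on "a" that the
# latest snapshot does not.
assert not has_write_conflict(latest_ts, committed, "a")
assert has_write_conflict(all_committed_ts, committed, "a")
```

Same write, same history: only the read timestamp differs, which is enough to turn a clean write into a conflict.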

As you may have noticed, there was a bug (SERVER-39672) in which the default read concern was effectively "snapshot" rather than "local" by mistake in 4.0.9. After the default was changed back to "local", write conflicts should be reduced in 4.0.10.

Since the underlying issue has been fixed, I'm closing this ticket as "Gone Away".

Comment by Wei Li [ 26/Aug/19 ]

alyson.cabral,

Thanks for the answer!

Yes, the dramatic difference seems strange to me too. I wonder whether it is related to our configuration, where the primary runs locally without any secondary nodes attached. In production, we will run with a different configuration. Even so, the behavior change from 4.0.9 to 4.0.10 is still intriguing.

As for the write conflict error, I wonder whether MongoDB can report the corresponding conflicting documents. The issue is that the client knows which conflicts are expected and which are not. When a write conflict error is caught on the client side, the client cannot determine whether it is expected behavior.

Thanks for the explanation of the inventory example. Yes, I understand this is a perfect example of snapshot consistency. In that situation, when the write conflict error is thrown, I hope the error can indicate that the inventory key is causing the issue. That would make it much easier for clients to figure out whether there are other possible issues on the client side.

Thanks,

 

Comment by Alyson Cabral (Inactive) [ 23/Aug/19 ]

Hi wei,

Sorry about the delay on this! We typically handle these sorts of questions through our support channel and do not reserve a lot of server engineering time for quickly responding to requests for more behavioral information.

A couple of quick things I'd like to clarify. In 4.0, the only difference between the readConcern values is the snapshot time selected. However, similar to 'local', both 'majority' and 'snapshot' speculatively execute on a snapshot that is not durable and wait for durability at commit time. To guarantee that everything you read was durable, you'll need to use readConcern 'snapshot' in tandem with w: majority in order to get snapshot guarantees. That being said, I'm surprised that you are seeing dramatically different behavior between the isolation levels, as I would expect them to choose very similar snapshot times given the speculative behavior. I'm confirming with the team that my surprise is warranted, and I'll keep you posted.

Another clarifying point is that we define a write conflict at the document level. Any write to a document after your transaction's snapshot time, and thus not reflected in the transaction snapshot, will conflict with your transaction's writes to the same document. It doesn't matter if the updates logically append and are conflict-free. We surface write conflicts with a 'TransientTransactionError' label, which I recommend you retry on, as this label also captures network errors and elections. We introduced an API in 4.2 that does this retrying automatically for you; it is listed under the callback API in the docs.
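The retry-on-label pattern described above can be sketched as follows. This is a plain-Python stand-in, not the pymongo callback API: `TransientTransactionError` here is a hypothetical exception class standing in for any server error carrying the 'TransientTransactionError' label, and the retry bound is an arbitrary choice for the sketch.

```python
class TransientTransactionError(Exception):
    """Stand-in for an error labeled 'TransientTransactionError'
    (write conflicts, network errors, elections)."""

def run_transaction_with_retry(txn_fn, max_retries=5):
    """Run the whole transaction body, restarting it from the
    beginning whenever a transient error is raised, much like the
    4.2 callback API retries labeled errors for you."""
    for _attempt in range(max_retries):
        try:
            return txn_fn()
        except TransientTransactionError:
            continue  # new attempt gets a fresh snapshot
    raise RuntimeError("transaction gave up after %d retries" % max_retries)

# Hypothetical transaction body that conflicts on its first two attempts.
attempts = {"n": 0}
def txn_fn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientTransactionError("WriteConflict")
    return "committed"

assert run_transaction_with_retry(txn_fn) == "committed"
assert attempts["n"] == 3
```

The key design point is that the entire body is retried, not the single failed statement, because a write conflict invalidates the transaction's snapshot and all reads taken from it.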

To illustrate why we made this choice of write conflict behavior, here is an example. 

Let's say I have a transaction with three statements:
statement 1: read the inventory value
statement 2: update the inventory value (with a $set rather than an $inc)
statement 3: insert an order document

Let's throw some values in there. First, the application reads an inventory value of 63 and then, based on the order, needs to update that value to 60, as the order contained three of the items. However, a concurrent transaction committed to that document and updated the inventory value, let's say to 62. This value of 62 is not reflected in the new transaction's snapshot. If the server were to hide the write conflict and both transactions succeeded, I'd be left with an inventory value of 60 instead of the correct value of 59. We want the reads and writes to be on the same timeline.
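The arithmetic of that lost update can be checked with a toy sketch (plain Python, not MongoDB code; the variable names are illustrative):

```python
# Both transactions effectively read inventory = 63 (one from its
# snapshot, the concurrent one before committing its own update).
inventory_at_snapshot = 63

txn_a_set = inventory_at_snapshot - 3  # order of 3 items -> $set to 60
txn_b_set = inventory_at_snapshot - 1  # concurrent order  -> $set to 62

# If the server hid the conflict, the last $set would win and one
# decrement would be silently lost:
last_writer_wins = txn_a_set             # 60, if txn A commits last
correct = inventory_at_snapshot - 3 - 1  # 59, with both orders applied

assert last_writer_wins == 60 and txn_b_set == 62
assert correct == 59
assert last_writer_wins != correct  # the lost update a conflict prevents
```

Raising a write conflict forces one transaction to retry on a fresh snapshot, so its $set is computed from 62 rather than the stale 63.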

Aly

Comment by Wei Li [ 22/Aug/19 ]

It has been a while; can we get an update on this problem?

Comment by wei [ 16/Jul/19 ]

My description of the testing environment might be confusing. The setup is:

  • one machine running a MongoDB 4.0.9 primary
  • no secondaries