[SERVER-34919] Write conflict between batched inserts within transactions incorrectly throws DuplicateKey error
Created: 09/May/18 | Updated: 29/Oct/23 | Resolved: 29/May/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.0-rc1, 4.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Eric Milkie |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.0 |
| Sprint: | Storage NYC 2018-06-04 |
| Description |
|
When two concurrent multi-document transactions try to insert document sets that have a non-empty intersection, one of the transactions should fail with a WriteConflict, since both transactions try to write to the same document. Instead, a DuplicateKey error appears to be thrown. See attached repro. This is the transaction history being tested in the repro script:
The first insert by T1 throws a DuplicateKey error. Odder still, the DuplicateKey error appears to be thrown on _id: 1, when we would expect the WriteConflict error to be thrown on _id: 2. e.g.
|
| Comments |
| Comment by Githook User [ 29/May/18 ] | ||||||||
|
Author: Eric Milkie (milkie) <milkie@10gen.com>
Message: (cherry picked from commit 3312ff09502ceb92d93f65f92d4e823df993a927) | ||||||||
| Comment by Githook User [ 29/May/18 ] | ||||||||
|
Author: Eric Milkie (milkie) <milkie@10gen.com>
Message: | ||||||||
| Comment by Eric Milkie [ 24/May/18 ] | ||||||||
|
I figured this out. It has to do with the way we handle a WriteConflictException (WCE) when doing a vectored insert. On a WCE, instead of retrying the entire vectored insert, we devolve into doing individual inserts, and retry those individual inserts if we get subsequent WCEs. | ||||||||
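The fallback described above can be illustrated with a toy model. This is only a sketch of one possible reading, not the server's code: `ToyTxn`, `insert_batch_with_fallback`, `insert_batch_propagating`, and `outcome` are hypothetical names, and the key assumption (not confirmed in this ticket) is that writes made by the failed batch attempt are still present in the transaction when the individual inserts run. Under that assumption, re-inserting the first document fails with DuplicateKey before the conflicting document is reached, which would match the error being reported on d1. The retry loop for subsequent WCEs is omitted.

```python
class WriteConflict(Exception):
    pass


class DuplicateKey(Exception):
    pass


class ToyTxn:
    """Hypothetical transaction: tracks its own writes and known conflicting keys."""

    def __init__(self, conflicting_ids):
        # Keys written by a concurrent transaction (stand-in for real conflict detection).
        self.conflicting_ids = set(conflicting_ids)
        self.writes = set()

    def insert_one(self, _id):
        if _id in self.writes:
            raise DuplicateKey(_id)   # we already inserted this key ourselves
        if _id in self.conflicting_ids:
            raise WriteConflict(_id)  # a concurrent transaction wrote this key
        self.writes.add(_id)


def insert_batch_with_fallback(txn, ids):
    """Try the whole batch; on a write conflict, fall back to one-by-one inserts."""
    try:
        for _id in ids:
            txn.insert_one(_id)
    except WriteConflict:
        # ASSUMPTION: the failed batch attempt is NOT undone, so txn.writes
        # still holds the documents inserted before the conflict.
        for _id in ids:
            txn.insert_one(_id)


def insert_batch_propagating(txn, ids):
    """Alternative for comparison: let the write conflict propagate to the caller."""
    for _id in ids:
        txn.insert_one(_id)


def outcome(insert_fn, ids, conflicting_ids):
    """Run insert_fn in a fresh ToyTxn and report (error name, offending key)."""
    txn = ToyTxn(conflicting_ids)
    try:
        insert_fn(txn, ids)
    except (WriteConflict, DuplicateKey) as exc:
        return type(exc).__name__, exc.args[0]
    return "ok", None


fallback_result = outcome(insert_batch_with_fallback, [1, 2], {2})
propagating_result = outcome(insert_batch_propagating, [1, 2], {2})
```

In this model the fallback path reports a DuplicateKey on key 1, while the propagating path reports a WriteConflict on key 2. The propagating variant is shown only for contrast and is not claimed to be the fix that was committed.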
| Comment by William Schultz (Inactive) [ 11/May/18 ] | ||||||||
|
Here are a few additional data points, including a more minimal test case. If we replace t1Op and t2Op in the repro, we get the following results:
Case 1
Result:
Case 2
Correctly produces the expected WriteConflict. | ||||||||
| Comment by William Schultz (Inactive) [ 11/May/18 ] | ||||||||
|
Ah, yes, that's definitely very odd! Sorry I sort of missed that in the diagnosis. Added it to the description. | ||||||||
| Comment by Eric Milkie [ 11/May/18 ] | ||||||||
|
What's even weirder is the key value the conflict is on (this wasn't shown in the description). When I run Will's repro, the DuplicateKey error indicates that the duplicate key is from document d1, not d2! | ||||||||
| Comment by Spencer Brody (Inactive) [ 09/May/18 ] | ||||||||
|
Ah, I see, thanks for clarifying. That does sound like a bug then, especially if the behavior is different between batch and single-inserts. This also seems like a storage (or maybe query?) bug, so I'm going to leave this assigned to the storage backlog for now. milkie, let me know if you think this should be picked up by the repl transactions team. | ||||||||
| Comment by William Schultz (Inactive) [ 09/May/18 ] | ||||||||
|
milkie No. The problem goes away (WriteConflict is thrown instead of a DuplicateKey error) when doing single document inserts, like the example you gave. | ||||||||
| Comment by Eric Milkie [ 09/May/18 ] | ||||||||
|
Does the same problem occur without using vectored inserts? That is, doing this:
I'm curious to know how vectored inserts are involved here. | ||||||||
| Comment by William Schultz (Inactive) [ 09/May/18 ] | ||||||||
|
spencer By "second transaction" I assume you mean T1. Since T1 starts before T2 commits, it should execute against a snapshot that doesn't see any of T2's writes. So why would you expect it to produce a DuplicateKey error when it writes to d2, if d2 doesn't exist at its read timestamp? T1 and T2 are concurrent transactions that both try to write to the same document (d2), which should produce a WriteConflict, as far as I understand it. For reference, this case does produce a WriteConflict when each transaction inserts only a single (conflicting) document. So I don't see why it shouldn't when they both do batch inserts. | ||||||||
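The snapshot argument above can be made concrete with a small toy model. It is a sketch under simplifying assumptions, not MongoDB's or the storage engine's implementation: `ToyStore`, `ToyTxn`, and `error_name` are hypothetical names, timestamps are a simple counter, and conflicts with uncommitted concurrent writes are not modelled. A key committed after a transaction's read timestamp is invisible to that transaction, so inserting it is treated as a write conflict; a key committed at or before the read timestamp is visible, so inserting it is a duplicate key.

```python
class WriteConflict(Exception):
    pass


class DuplicateKey(Exception):
    pass


class ToyStore:
    """Hypothetical store that remembers when each _id was committed."""

    def __init__(self):
        self.clock = 0
        self.committed = {}  # _id -> commit timestamp

    def begin(self):
        # A new transaction reads from a snapshot taken at the current clock.
        return ToyTxn(self, read_ts=self.clock)


class ToyTxn:
    def __init__(self, store, read_ts):
        self.store = store
        self.read_ts = read_ts
        self.writes = set()

    def insert(self, _id):
        if _id in self.writes:
            raise DuplicateKey(_id)
        commit_ts = self.store.committed.get(_id)
        if commit_ts is not None and commit_ts > self.read_ts:
            # Committed after our snapshot: invisible to us, so this is a conflict.
            raise WriteConflict(_id)
        if commit_ts is not None:
            # Visible in our snapshot: a genuine duplicate key.
            raise DuplicateKey(_id)
        self.writes.add(_id)

    def commit(self):
        self.store.clock += 1
        for _id in self.writes:
            self.store.committed[_id] = self.store.clock


def error_name(txn, _id):
    """Return the name of the error raised by txn.insert(_id), or "ok"."""
    try:
        txn.insert(_id)
    except (WriteConflict, DuplicateKey) as exc:
        return type(exc).__name__
    return "ok"


store = ToyStore()
t1 = store.begin()   # T1 starts before T2 commits
t2 = store.begin()
t2.insert(2)         # T2 inserts d2 (_id: 2)
t2.commit()
concurrent_result = error_name(t1, 2)  # T1 cannot see d2 in its snapshot

t3 = store.begin()   # a transaction that starts after T2 committed
later_result = error_name(t3, 2)
```

In this model the concurrent transaction T1 gets a WriteConflict, while a transaction that begins after T2's commit gets a DuplicateKey, which is the distinction being argued for here.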
| Comment by Spencer Brody (Inactive) [ 09/May/18 ] | ||||||||
|
william.schultz I don't follow why this is wrong. The second transaction gets a duplicate key error from the duplicate _id value. That seems reasonable to me. |