[SERVER-72127] Upsert behaving differently on clustered collections compared with regular
| Created: | 14/Dec/22 | Updated: | 27/Oct/23 | Resolved: | 22/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Henrik Edin | Assignee: | Jordi Olivares Provencio |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Assigned Teams: |
Storage Execution
|
| Sprint: | Execution Team 2023-02-20, Execution Team 2023-02-06, Execution Team 2023-03-06, Execution Team 2023-03-20, Execution Team 2023-04-03 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
When an upsert on a clustered collection falls back to an insert, it can throw a DuplicateKey error to the user if another concurrent insert or upsert on the same key gets there first. Should upsert retry the update when it encounters DuplicateKey? |
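A minimal sketch of the race described above, assuming a toy in-memory model: `ClusteredCollection`, `DuplicateKeyError`, and `upsert_steps` are hypothetical stand-ins, not server code. The upsert is split into a lookup step and an update-or-insert step so that two upserts on the same key can be interleaved deterministically.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the server's DuplicateKey error."""


class ClusteredCollection:
    """Toy model: records are stored keyed by the cluster key (_id), so
    inserting an existing key fails as if there were a unique _id index."""

    def __init__(self):
        self.records = {}

    def find(self, key):
        return self.records.get(key)

    def insert(self, key, doc):
        if key in self.records:
            raise DuplicateKeyError(f"duplicate key: {key!r}")
        self.records[key] = doc

    def update(self, key, fn):
        self.records[key] = fn(self.records[key])


def upsert_steps(coll, key, fn, initial):
    """Hypothetical upsert split in two: look up the key, pause so another
    operation can run, then update or fall back to inserting fn(initial)."""
    existing = coll.find(key)
    yield  # a concurrent operation may run here
    if existing is not None:
        coll.update(key, fn)
    else:
        coll.insert(key, fn(initial))


def run_interleaved(coll):
    """Run two increment upserts on _id 1 whose lookups both miss."""
    def inc(doc):
        return {"n": doc["n"] + 1}

    a = upsert_steps(coll, 1, inc, {"n": 0})
    b = upsert_steps(coll, 1, inc, {"n": 0})
    next(a)
    next(b)  # both lookups happen before either insert
    results = []
    for op in (a, b):
        try:
            next(op)  # finish the upsert
        except StopIteration:
            results.append("ok")
        except DuplicateKeyError:
            results.append("DuplicateKey")
    return results
```

Running `run_interleaved(ClusteredCollection())` has the first upsert insert `{"n": 1}` and the second surface DuplicateKey to its caller, which is the behaviour this ticket asks about.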
| Comments |
| Comment by Louis Williams [ 22/Mar/23 ] |
|
jordi.olivares-provencio@mongodb.com, thanks for the explanation here. I think it would be safe to close this as "Works as Designed". |
| Comment by Jordi Olivares Provencio [ 14/Mar/23 ] |
|
A clustered collection always returns a DuplicateKey error when there is a duplicate record conflict, emulating a unique _id index. This means the retry mechanism is already agnostic to whether the collection is clustered; from the retry logic's point of view, there is no difference between the two. Accordingly, the retry logic as described by

If we were to treat the upsert as a logical transaction that fails with a WriteConflictException, it would be equivalent to retrying the upsert by default, as this ticket suggests. This, however, contradicts the previous ticket, as it would take away the application's opportunity to decide what to do with this error. As such, it would potentially be a breaking change in behaviour for existing users. For example, on a collection whose only unique index is _id, an upsert with {{query: {_id: <key>, field1: <value1>} }} would go from returning a duplicate key error to potentially being retried, which could break an application's logic.

henrik.edin@mongodb.com The scenario you describe is already the case today if the user provides an _id query. Can you clarify the scenario in which you found the behaviour of a normal collection to differ from that of a clustered one? |
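Jordi's example depends on which upserts are retried automatically after a DuplicateKey error. A minimal sketch of that eligibility check, paraphrasing the conditions in MongoDB's documentation on upserts with duplicate values (equality-only predicates, on exactly the unique index's key fields, with an update that leaves those fields unchanged); the function name and its parameters are hypothetical, not the server's implementation.

```python
def eligible_for_dup_key_retry(query, unique_key_fields, updated_fields):
    """Hypothetical check: may an upsert that hit DuplicateKey be retried?

    query             -- dict of top-level predicates, e.g. {"_id": 5}
    unique_key_fields -- fields of the unique index that raised the error
    updated_fields    -- fields the update modifies
    """
    # Every predicate must be a plain equality, not an operator such as $gt.
    for value in query.values():
        if isinstance(value, dict) and any(k.startswith("$") for k in value):
            return False
    # The queried fields must be exactly the unique index's key fields.
    if set(query) != set(unique_key_fields):
        return False
    # The update must not modify any of the queried fields.
    return not (set(updated_fields) & set(query))
```

Under this sketch, `{"_id": 5}` qualifies for a retry on a collection whose only unique key is `_id`, while Jordi's `{_id: <key>, field1: <value1>}` shape does not, so its DuplicateKey error reaches the application.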
| Comment by Louis Williams [ 16/Dec/22 ] |
|
henrik.edin@mongodb.com, you raise a good point. We already have code that retries upserts on DuplicateKey errors for that reason, so we could look into modifying that code to handle clustered indexes. I had assumed that upsert performs the update and insert attempts in the same WUOW, but after looking at the code, it does not. If it did, the case you described would not happen, because the storage engine would just generate a write conflict. I actually think we could eliminate the need for any DuplicateKey retry handling (see |
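To illustrate the single-WUOW idea, here is a toy sketch in which the lookup and the fallback write share one unit of work, so a concurrent insert of the same key surfaces as a write conflict and the whole upsert is retried from a fresh read. `Store`, `UnitOfWork`, `WriteConflict`, and `upsert_in_one_uow` are hypothetical stand-ins; the conflict detection here is commit-time validation of read versions, a swapped-in technique rather than how WiredTiger actually detects conflicts.

```python
class WriteConflict(Exception):
    """Hypothetical stand-in for WriteConflictException."""


class Store:
    def __init__(self):
        self.data = {}
        self.version = {}  # key -> number of committed writes


class UnitOfWork:
    """Toy optimistic unit of work: writes are buffered, and commit fails
    if any key read in this unit changed since it was read."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}
        self.writes = {}

    def get(self, key):
        self.read_versions[key] = self.store.version.get(key, 0)
        return self.store.data.get(key)

    def put(self, key, doc):
        self.writes[key] = doc

    def commit(self):
        for key, seen in self.read_versions.items():
            if self.store.version.get(key, 0) != seen:
                raise WriteConflict(key)
        for key, doc in self.writes.items():
            self.store.data[key] = doc
            self.store.version[key] = self.store.version.get(key, 0) + 1


def upsert_in_one_uow(store, key, fn, initial, before_commit=None):
    """Update-or-insert inside one unit of work, retried on WriteConflict.
    before_commit is a test hook for injecting a concurrent writer.
    Returns the number of attempts taken."""
    attempt = 0
    while True:
        attempt += 1
        uow = UnitOfWork(store)
        existing = uow.get(key)  # lookup and write share this unit of work
        uow.put(key, fn(existing if existing is not None else initial))
        if before_commit is not None:
            before_commit(attempt)
        try:
            uow.commit()
            return attempt
        except WriteConflict:
            continue  # start over with a fresh read, as a WCE retry would
```

If a concurrent upsert commits the same key between our lookup and our commit, the first attempt conflicts and the second sees the committed document and updates it, so no DuplicateKey error reaches the user.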
| Comment by Henrik Edin [ 15/Dec/22 ] |
|
Is that really correct, louis.williams@mongodb.com? If you are doing increments or something similar, it doesn't seem correct to just overwrite and report both operations as successful. If we instead translate DuplicateKey in this case into a WCE, I believe the retry would do what is expected. |
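A small simulation of the concern above, assuming two `{$inc: {n: 1}}` upserts on the same `_id` whose lookups both miss. The `insert` helper and its `overwrite` flag are a hypothetical model of the record-store option being discussed, not its real signature. In `"overwrite"` mode both upserts report success but one increment is lost; in `"retry"` mode a DuplicateKey error re-runs the lookup, as a WCE retry would; any other mode surfaces the error.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for the server's DuplicateKey error."""


def insert(records, key, doc, overwrite):
    """Hypothetical insert: with overwrite=False an existing key is an error."""
    if key in records and not overwrite:
        raise DuplicateKeyError(key)
    records[key] = doc


def upsert_inc(records, key, seen, mode):
    """Finish an {$inc: {n: 1}} upsert whose lookup returned `seen`."""
    while True:
        if seen is not None:
            # Simplification: update from the value seen by the lookup.
            records[key] = {"n": seen["n"] + 1}
            return "ok"
        try:
            insert(records, key, {"n": 1}, overwrite=(mode == "overwrite"))
            return "ok"
        except DuplicateKeyError:
            if mode != "retry":
                return "DuplicateKey"
            seen = records.get(key)  # redo the lookup and try the update again


def race(mode):
    """Two racing increments on _id 1; returns (statuses, final n)."""
    records = {}
    seen_a = records.get(1)  # both lookups miss before either insert
    seen_b = records.get(1)
    statuses = [upsert_inc(records, 1, seen_a, mode),
                upsert_inc(records, 1, seen_b, mode)]
    return statuses, records[1]["n"]
```

With this model, `race("overwrite")` reports two successes but leaves `n == 1`, while `race("retry")` reports two successes and leaves `n == 2`, which is the outcome Henrik expects from translating DuplicateKey into a WCE.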
| Comment by Louis Williams [ 15/Dec/22 ] |
|
We should probably use "overwrite=true" when upserting records into clustered collections, rather than overwrite=false for all operations. |