[SERVER-38212] $out fails with duplicate _id key error Created: 20/Nov/18 Updated: 29/Jul/20 Resolved: 26/Nov/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 3.6.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Prashant Chaudhari | Assignee: | Danny Hatcher (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Operating System: | ALL | ||||
| Steps To Reproduce: | I have a collection with about a million records. The aggregation query explained above returns about 300k results which I am trying to dump into a collection with randomly generated name. This aggregation query fails with the below error: Mongo::error::operationfailure(insert for $out failed: { lastop: { ts: timestamp(1542714271, 9378), t: 39 }, connectionid: 242453, err: "e11000 duplicate key error collection: api smartquest co production.tmp.agg out.637144 index: id dup key: { : objectid('5bf2347a4b8a98775e4dbf95') }", code: 11000, codename: "duplicatekey", n: 0, ok: 1.0, operationtime: timestamp(1542714271, 9378), $clustertime: { clustertime: timestamp(1542714271, 9379), signature: { hash: bindata(0, 0000000000000000000000000000000000000000), keyid: 0 }} } (16996)) |
| Participants: | |||||
| Description |
|
| Comments |
| Comment by Talles Airan [ 29/Jul/20 ] |
|
This bug is happening in high-load environments. I have a system that handles user access; at the moment it generates an ObjectId, MongoDB has already generated another one. I have about 144 requests per second. I had to implement my own ObjectId, combining MongoDB's ObjectId with the current timestamp and some random bytes I generate. |
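A rough mongo shell sketch of the kind of application-side identifier this comment describes; the helper below is hypothetical and not a MongoDB API:

```
// Hypothetical helper: combine an ObjectId with the current timestamp and
// extra random bytes, as the commenter describes doing in his application.
function makeCustomId() {
    var rand = Math.floor(Math.random() * 0xffffffff).toString(16); // random suffix
    return new ObjectId().str + "-" + Date.now().toString(16) + "-" + rand;
}
```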
| Comment by Danny Hatcher (Inactive) [ 12/Dec/18 ] |
|
Hello Prashant, Yes, that is correct. Thank you, Danny |
| Comment by Prashant Chaudhari [ 10/Dec/18 ] |
|
I couldn't really correlate my use case with the Read Isolation mentioned in the docs. Are you suggesting that while the $out operation is in progress, other write operations affecting the same collection may interleave and affect the result of the aggregation? |
| Comment by Danny Hatcher (Inactive) [ 26/Nov/18 ] |
|
Hello Prashant, I believe that you may be encountering one of the concepts within MongoDB's read isolation. As the aggregation is searching through the large collection to return results, it is possible that some documents are being returned multiple times. Because you are then attempting to insert those documents into a new collection using their original _id values, conflicts will occur as the same document would be inserted twice. I see that you have also posted this question on Stack Overflow. You mentioned in one of your comments there that this only happens sometimes. That helps support the above theory; only occasionally are writes causing your reads to "duplicate". Would it fit your business case to insert those documents without their original _id fields, or to project that field to something else? That way a randomly-generated _id will be created for each document and you shouldn't encounter this error. As a question such as this is better suited to Stack Overflow and you have already asked the question there, I will close this ticket. Thank you, Danny |
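A minimal mongo shell sketch of the workaround suggested above, assuming a placeholder collection name and filter (not from the ticket): move the original _id into another field and exclude _id itself before $out, so the server assigns a fresh _id to each inserted document:

```
// Sketch of the suggested workaround: preserve the old identifier, drop
// _id, and let the insert generate a new unique _id so duplicated reads
// cannot collide on the output collection's _id index.
db.sourceCollection.aggregate([
    { $match: { /* original filter */ } },
    { $addFields: { originalId: "$_id" } },  // keep the old identifier
    { $project: { _id: 0 } },                // suppress the original _id
    { $out: "agg_result_" + Math.floor(Math.random() * 1e6) }
]);
```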
| Comment by Prashant Chaudhari [ 20/Nov/18 ] |
|
Here is a mongo console log for the same query:
|