[SERVER-14322] Retry on predicate unique index violations of update + upsert -> insert when possible Created: 20/Jun/14  Updated: 04/Dec/23  Resolved: 29/Jan/19

Status: Closed
Project: Core Server
Component/s: Write Ops
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Scott Hernandez (Inactive) Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 117
Labels: storch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-37124 Retry full upsert path when duplicate... Closed
is duplicated by SERVER-22607 Duplicate key error on replace with u... Closed
is duplicated by SERVER-10350 Atomicity of Upserts on multi field u... Closed
is duplicated by SERVER-16905 Concurrent upserts to the same non-ex... Closed
is duplicated by SERVER-19600 Upsert results in E11000 duplicate ke... Closed
is duplicated by SERVER-20022 Upsert errors with "duplicate key error" Closed
is duplicated by SERVER-20158 Errors during findAndModify runs on M... Closed
is duplicated by SERVER-20380 Upsert is not atomic when inserting Closed
is duplicated by SERVER-21006 duplicate key error on upsert Closed
is duplicated by SERVER-22988 MongoDB Concurrency Lock Failed for M... Closed
is duplicated by SERVER-36763 refreshLogicalSessionCacheNow can err... Closed
is duplicated by SERVER-18784 Errors during concurrent update/upser... Closed
is duplicated by SERVER-21948 DuplicateKeyError during upsert Closed
is duplicated by SERVER-16261 Make insert on update (upsert: true) ... Closed
Related
related to SERVER-44980 wrong update result with upsert:true Closed
related to PYTHON-1510 db.collection.save() cause Duplicate... Closed
related to DOCS-12234 Docs for SERVER-37124: Retry full ups... Closed
is related to SERVER-19694 Distributed lock should retry on dupl... Closed
is related to SERVER-59758 findAndModify with upsert=true genera... Closed
is related to SERVER-72028 E11000 duplicate key error collection... Closed
is related to SERVER-72127 Upsert behaving differently on cluste... Closed
is related to SERVER-47212 Retry full upsert path when duplicate... Closed
Assigned Teams:
Query
Participants:
Case:

 Description   
Issue Status as of Jan 29, 2019

ISSUE DESCRIPTION AND IMPACT

During an update with the upsert:true option, two (or more) threads may attempt an upsert operation using the same query predicate and, upon not finding a match, each thread will attempt to insert a new document. Both inserts will (and should) succeed unless the second causes a unique index constraint violation. This MongoDB behavior is documented in the MongoDB manual.

It is expected that the client will take appropriate action upon detection of such constraint violation. Appropriate action may vary depending on individual application requirements.

IMPROVEMENT DESCRIPTION

As part of SERVER-37124, fixed in versions 4.1.6 and later, the server will automatically retry performing the upsert in cases where it is detectably safe to do so. It is safe for the server to retry when the following conditions hold:

  • The operation has the upsert:true flag.
  • The operation is not a multi-update.
  • The upsert failed with the DuplicateKey error code due to a unique index with key pattern kp.
  • The match condition is either a single equality predicate, or a logical AND of equality predicates.
  • The set of fields over which there is an equality predicate equals the set of fields in the index key pattern kp.
  • The fields in the match condition are not mutated by the update modifiers.

The following table contains examples of upsert operations that the server can automatically retry on DuplicateKey error, and examples where the server cannot automatically retry:

 kp         update                                                          server will retry
 {_id:1}    db.coll.update({_id:5}, {$set:{a:5}}, {upsert:true})            yes
 {a:1}      db.coll.update({a:{$in:[5]}}, {$set:{b:5}}, {upsert:true})      yes
 {a:1,b:1}  db.coll.update({a:5,b:7}, {$set:{c:5}}, {upsert:true})          yes
 {a:1,b:1}  db.coll.update({a:5,b:7}, {$set:{a:7,c:5}}, {upsert:true})      no, since a is in kp and is mutated
 {a:1}      db.coll.update({_id:5}, {$set:{b:6}}, {upsert:true})            no, since the predicate does not contain a
 {_id:1}    db.coll.update({_id:5,v:6}, {$set:{a:5,b:6}}, {upsert:true})    no, since v is not in kp
 {a:1}      db.coll.update({a:{$lt:5}}, {$set:{b:5}}, {upsert:true})        no, since the predicate on a is not an equality
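The predicate and key-pattern conditions above can be sketched as a small client-side check. This is a hypothetical illustration only, not the server's actual implementation; the function names are assumptions, and the caller must separately ensure the operation has upsert:true and is not a multi-update.

```javascript
// Sketch: is it safe to retry an upsert that failed with DuplicateKey on the
// unique index with key pattern `keyPattern`? (Hypothetical helper, not server code.)
function isEqualityOnly(value) {
  // An equality predicate is a literal value, {$eq: v}, or {$in: [v]} with one element.
  if (value === null || typeof value !== "object") return true;
  const ops = Object.keys(value);
  if (ops.length === 1 && ops[0] === "$eq") return true;
  if (ops.length === 1 && ops[0] === "$in") return value.$in.length === 1;
  return !ops.some(k => k.startsWith("$")); // plain subdocument literal
}

function canRetryUpsert(query, update, keyPattern) {
  const kpFields = Object.keys(keyPattern);
  const queryFields = Object.keys(query);
  // The equality predicates must cover exactly the fields of the index key pattern.
  if (queryFields.length !== kpFields.length) return false;
  if (!kpFields.every(f => queryFields.includes(f))) return false;
  if (!queryFields.every(f => isEqualityOnly(query[f]))) return false;
  // The matched fields must not be mutated by the update modifiers.
  const mutated = Object.values(update).flatMap(mod => Object.keys(mod));
  return !queryFields.some(f => mutated.includes(f));
}
```

Running the table's examples through this check reproduces the yes/no column, e.g. `canRetryUpsert({a:5,b:7}, {$set:{a:7,c:5}}, {a:1,b:1})` is false because `a` is in the key pattern and is mutated.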

This ticket will now be resolved as a duplicate of SERVER-37124. We don't believe there is further work that we can do on the server; outside the cases above, the application must determine whether the operation that failed can be safely retried or not.

VERSIONS
This improvement, delivered by SERVER-37124, will be included in the 4.2 production release.

Original description

It is possible for two updates to arrive with upsert:true, neither finding a matching document, and both attempting to insert new documents that conflict on a unique index covering the query predicate. In this case it is possible for the server to retry any failed updates.

Currently, clients must retry the update themselves to get the behavior that is expected from the server.

This affects both updates and findAndModify. See SERVER-10350 for slightly more background.
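On server versions without the automatic retry, the usual client-side workaround is a bounded retry loop around the upsert. A minimal sketch, assuming a `doUpsert` callback supplied by the application and the standard DuplicateKey error code 11000:

```javascript
// Hypothetical client-side workaround: retry an upsert a bounded number of
// times when it fails with the DuplicateKey error code (11000).
// `doUpsert` is an assumed application callback performing the upsert:true update.
function upsertWithRetry(doUpsert, maxAttempts = 3) {
  let lastErr;
  for (let attempt = 1; attempt <= maxAttempts; ++attempt) {
    try {
      return doUpsert();
    } catch (err) {
      if (err.code !== 11000) throw err; // only retry duplicate key errors
      lastErr = err;
      // On the next attempt, the racing insert already exists, so the upsert
      // should match it and take the update path instead of the insert path.
    }
  }
  throw lastErr;
}
```

Note this is only safe when the duplicate key necessarily came from an index covered by the query predicate (the conditions in the status update above); otherwise the retry can fail repeatedly.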



 Comments   
Comment by James Wahlin [ 21/Oct/22 ]

vickyrddy97@gmail.com this fix is not tied to an FCV version and will be available under FCV 4.4 with the 5.0 binary.

Comment by Vishnukanth Reddy Boreddy [ 21/Oct/22 ]

@James Wahlin Do we need to have FCV also updated to 5.0 for the fix to work or just DB upgrade to 5.0 and FCV at 4.4 would also work?

Comment by Gaël Jourdan-Weil [ 28/Sep/21 ]

Thanks @James Wahlin for the notice

Comment by James Wahlin [ 27/Sep/21 ]

SERVER-47212 tracks this issue for the findAndModify command. The fix is currently available in MongoDB 5.0, but it was deemed too risky to backport to 4.4 or earlier.

Comment by Jordi Salvat i Alabart [ 27/Sep/21 ]

Same here. I'll open a separate ticket.

Comment by Kuan-Ru Fu [ 10/Sep/21 ]

I also get the same problem with `findAndModify` on MongoDB 4.2. Is this improvement already in MongoDB 4.2?

Comment by Gaël Jourdan-Weil [ 02/Sep/21 ]

Is the auto retry mechanism applied in case of `findAndModify`?

I'm using Mongo 4.2 and experiencing the error with `findAndModify` operations even though my query matches all the conditions for auto retry.

Comment by Arthur Darcet [ 14/Jun/18 ]

@Andy Schwerin Just to be sure I understand this correctly: Why isn't it enough to check if the operation tried an insert + the duplicate document that triggered the exception matches the predicate that was sent?

Comment by Andy Schwerin [ 13/Nov/17 ]

This behavior is distinct from the one that retryable writes addresses. Retryable write support requires being certain about whether a previously attempted write happened or not, in the face of communication errors. This behavior occurs even in the presence of certain knowledge about whether a write succeeded or failed. Indeed, the DuplicateKey error tells the client in no uncertain terms that the write failed.

The challenge for automatic retry on duplicate key error is that we cannot know that the violated constraint is the unique index constraint from the match criteria, rather than some other unique index that might exist on the document. Handling a situation in which there may be multiple unique indexes correctly is what makes this harder to do in MongoDB than in the application, which knows what's what.

Put another way, automatically retrying on a DuplicateKey error might fail indefinitely if the unique index constraint violated was not the one from the match expression.
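To make that failure mode concrete, here is a toy simulation (plain JavaScript, not MongoDB or server code; all names and values are invented for illustration) of a collection with a second unique index on a field not covered by the match criteria. Blindly retrying never makes progress:

```javascript
// Toy model: the collection has unique indexes on {a:1} (the match field)
// and {email:1}. A document already holds the email the upsert wants to insert.
const docs = [{ a: 2, email: "x@example.com" }];

function tryUpsert(query, newDoc) {
  // Update path: if a document matches the query, the upsert is a plain update.
  if (docs.some(d => d.a === query.a)) return "updated";
  // Insert path: enforce both unique constraints, as the server would.
  if (docs.some(d => d.a === newDoc.a || d.email === newDoc.email)) {
    throw Object.assign(new Error("E11000 duplicate key"), { code: 11000 });
  }
  docs.push(newDoc);
  return "inserted";
}

// a:5 matches nothing, so every attempt takes the insert path, and every
// insert collides on the *email* index. Retrying never makes progress,
// because the duplicate came from an index the query does not cover.
let failures = 0;
for (let attempt = 0; attempt < 3; ++attempt) {
  try { tryUpsert({ a: 5 }, { a: 5, email: "x@example.com" }); }
  catch (e) { if (e.code === 11000) failures++; }
}
```

After the loop, all three attempts have failed with a duplicate key error, which is why the server only retries when the violated index is provably the one implied by the match criteria.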

Comment by Sherry Ummen [ 13/Nov/17 ]

So, 3.6 now has retryable writes. Does that mean that with 3.6 this will no longer be a problem?

Comment by Amit Gupta [ 13/Nov/17 ]

I believe I'm experiencing this issue as well. It seems hinted in the comments above that creating a unique index on the query predicate will either reduce or eliminate this issue? Is either case true? Any further explanation of how/why a unique index affects the occurrence of the issue?

Comment by Brad Vogel [ 13/Oct/17 ]

I saw that Mongo 3.6 has retryable writes (https://groups.google.com/forum/#!msg/mongodb-announce/BwDlVtKw74I/HH86-nnhAgAJ). Does that mean this is fixed?

Comment by nagisatosanae [ 01/Sep/17 ]

Extremely critical!!
This problem can also be seen when using the client driver's connection pool.
It also appears with $set/$setOnInsert with the upsert option.

Comment by bh [ 27/Jun/17 ]

+1

This is annoying and wastes a bit of time for `updateOne`, but it's a breaking problem for bulk writes, where a single failure affects the entire group of updates.

Comment by Tudor Aursulesei [ 17/Apr/17 ]

+1
This should be at least documented somewhere.

Comment by James Perryman [X] [ 12/Jan/17 ]

This is a pretty big issue and remains in the current version, 3.4. It means MongoDB cannot be used properly in a two-phase commit scenario. Please provide an update.

Comment by Sergio Rykov [ 09/Jan/17 ]

MongoDB 3.4 is out and we are now in 2017 with this issue still open.

What's the state of this issue?

Comment by Guillaume Rousseau [ 16/Dec/16 ]

Got an index creation failure because of this bug: pymongo.errors.DuplicateKeyError: E11000 duplicate key error. It kills the running process, since the index is being built in the background.
I reduced the number of errors by first checking whether the document already exists, but it still happens.
Any workaround? (I'm using pymongo and a non-sharded collection.)
G.R.

Comment by Jan Kondratowicz [ 14/Dec/16 ]

Since we updated to WiredTiger, we're having hundreds of those errors daily. I believe it's a major issue, making write operations randomly unreliable. I've been encountering this issue on a non-sharded collection, running a single NodeJS process, which is hardly a parallel environment.

Comment by Sherry Ummen [ 17/Oct/16 ]

We are losing data quite often now because of this. We have not added retry logic yet. Wondering whether there are any updates, or whether any discussion has happened on this issue?

Comment by Ramon Fernandez Marina [ 01/Sep/16 ]

Hi sherry.ummen@napa.fi, please see Andy's comment above. A solution to this ticket requires quite a bit of discussion that, unfortunately, hasn't happened yet to the best of my knowledge.

So while we'd like to address it in the upcoming MongoDB 3.4 (as indicated by the "3.3 Desired" fixVersion), it is likely that a fix for this ticket will slip, and users will have to continue to use application code to work around this behavior.

We'll post an update on this ticket if the current plan and schedule change.

Regards,
Ramón.

Comment by Sherry Ummen [ 31/Aug/16 ]

Hello,

Could you please tell us the status of this issue? When will it be fixed, and in which release?

Thanks

Comment by bob whitehurst [ 07/Jun/16 ]

I am not sure if this may be helpful. I was having this problem and changed the shard key to be unique and it resolved the problem. That may not work for everyone. While diagnosing the problem I turned the verbosity level to 5 and was able to capture the error condition in the log.

There were two threads doing an upsert on the same entry that did not exist. The first thread (conn20) determines that the entry does not exist but instead of committing the transaction it does a rollback. A second thread (conn17) does the same action and does a commit. After a slight delay, the first thread then creates a new transaction but does not check to see if an entry already exists and thus it results in a duplicate entry.

I can't copy and paste from my log as it is on an isolated network, but this is an abbreviated version:

[conn20] Winning plan: ...
[conn20] Winning plan had zero results. ...
[conn20] WT_rollback_transaction
...
[conn17] Winning plan: ...
[conn17] Winning plan had zero results. ...
[conn17] Changed immutable fields ...
[conn17] WT_commit_transaction
[conn17] UpdateResult – upserted: ...
...
[conn20] Changed immutable fields ...
[conn20] WT begin_transaction
[conn20] WT commit_transaction
[conn20] UpdateResult – upserted
...

Comment by Jon Hyman [ 07/Apr/16 ]

We rewrote every single upsert in our application to use a global Redis lock based on the shard key, which is clearly undesirable, but I think it's a more complete solution than the retry. The retry works only if you have a unique index on your insertion/query predicate; if you don't have a unique index, you can still insert duplicate data when this scenario happens. Chances are that if you're using an upsert you can solve it via a unique index, but not always.

Comment by Daniel Barreto [ 07/Apr/16 ]

We just upgraded to WiredTiger (3.2.3) and this issue is happening a lot. We are losing data because of this and will have to change our application logic to add retries (really annoying, since "upsert" should not complain about duplicate entries (E11000)). Please let us know if there is any other workaround.

Comment by Jon Hyman [ 11/Jan/16 ]

Fantastic, thanks.

Comment by Asya Kamsky [ 11/Jan/16 ]

Confirmed: as of 2.2, your shard key can be a prefix of an index, so having an index on customer_id, event, date (whether unique or non-unique) means you are not required to have one on just customer_id, event (your shard key).

Comment by Jon Hyman [ 10/Jan/16 ]

If our shard key is { customer_id, event } (non-unique), I didn't realize we could add a unique index for { customer_id, event, date }. We definitely have that index, it's just not unique, so we could change it to that if it's possible.

Comment by Asya Kamsky [ 10/Jan/16 ]

Do you not have a unique index on customer_id, event, date? It seems like you should want to have such an index for performance (and you don't need to change the shard key here - in fact after creating the new index you can drop the index on customer_id,event)

Comment by Jon Hyman [ 09/Jan/16 ]

Thanks, Andy, it would be good to know for clarification.

Here's one example. We have a collection that tracks occurrences of an event over time, with documents that look like { customer_id: X, event: Y, date: Z, num_occurrences: W }, sharded on { customer_id: 1, event: 1 }, and we update it with an upsert that $inc's num_occurrences on a given date. We did this so queries for a given { customer_id, event } pair over a date range would land on the same shard for query isolation. We had figured that Mongo's upsert was atomic when we made the design decision years ago. Now I'm concerned that this will cause two documents for a given { customer_id, event, date } pair. While we could go back and update the shard key, the collection now has hundreds of millions of documents and is hundreds of gigabytes, so that would be unrealistic without a significant amount of downtime. It would also alter query isolation. As such, I'll be updating our code to manually handle the synchronization of the insert for this collection.

Comment by Andy Schwerin [ 09/Jan/16 ]

jonhyman, I have a minor clarification here. In a sharded collection, an update with upsert must always contain the document's shard key, since in the event that the upsert becomes an insert the shard key must be known. Any index that has the shard key as its prefix can also enforce a unique index constraint in mongodb, IIRC. I just looked, and think the documentation may be incorrect on this point. If you're interested, I can do a little digging.

Anyhow, if you combine those two facts, you have a situation where you can always have a unique index when using upsert, unless you really cannot have a unique index on the search criteria due to the application design. However, if your application cannot accept a uniqueness constraint on your upsert criteria, in what sense is it incorrect to insert a second document? That is, if it is legal for two documents to match the criteria, how should mongodb decide when to insert a new document or not?

This ticket is really more about automatically performing the upsert as an update on unique key violation, rather than requiring the client to do so. Prior to MongoDB 3.0, this situation was somewhat rare, but starting in 3.0, when using the WiredTiger storage engine, it has become more frequent, making it more of a hassle for app developers than it used to be.

Even the situation in this ticket is a little tricky, because the automatic retry on duplicate key is based on the assumption that the violated constraint is the unique index constraint from the match criteria, and not some other unique index that might exist on the document. Handling a situation in which there may be multiple unique indexes correctly is what makes this harder to do in MongoDB than in the application, which knows what's what.

Comment by Jon Hyman [ 08/Jan/16 ]

As a workaround, we're going to use Redis for locking around all Mongo upserts to add the missing atomicity. While it's easy to code for retrying when you get a primary key error, that upserts can cause multiple document insertions where there are not unique indexes is pretty awful. Sharded collections can only have one unique index, so you're pretty limited with your options here once you shard.

Comment by Tim Widmann [ 08/Jan/16 ]

I also got this with MongoDB 3.0.6 and WiredTiger

Comment by Ryan [ 08/Jan/16 ]

I just hit this on mongodb-linux-x86_64-rhel70-3.2.0 with wiredTiger.

Comment by Jon Hyman [ 08/Jan/16 ]

We just got this on 3.0.8. Had never seen it before (though I was on 2.6.11 previously).

Comment by 아나 하리 [ 04/Jan/16 ]

It's okay up to MongoDB 3.0.3, but from 3.0.4 to 3.0.6 (the last version I have tested) the duplicate key error happens.

Comment by Dnv.IytH [ 30/Dec/15 ]

In which version did this problem begin to appear? It works well with the WiredTiger engine in version 3.0.0.

Comment by 아나 하리 [ 21/Nov/15 ]

This E11000 (duplicate key) exception during upsert operations happens starting in MongoDB 3.0.4.
After a little digging through GitHub commits, I found that patch "bfdf548bc4c882ab944e0cf2f1e6015752fc05c2" looks a bit weird.

https://github.com/mongodb/mongo/commit/bfdf548bc4c882ab944e0cf2f1e6015752fc05c2

I think the MONGO_WRITE_CONFLICT_RETRY_LOOP_BEGIN/END macro around "UpdateStage::doInsert()" is useless during an "upsert" operation.
Retrying only the insert will fail with DuplicateKey (forever, until that record is removed) if another thread has already inserted the record for the same key.
For an upsert operation, MongoDB should retry both the update and the insert for every WriteConflictException, not only the insert.

Also, "UpdateStage::work(WorkingSetID* out)" already has retry code.
Does this patch aim at pure inserts (not upserts)?
Even so, I am not sure we need a retry for insert conflicts.

Comment by Andrew Doumaux [ 09/Sep/15 ]

I'm receiving this same error on v3.0.6 with the WiredTiger engine when upserting into a collection that only has an index on _id, and the upsert query is only on _id. This is using the Java driver, version 2.13. I'm currently using Apache Camel Split to handle the multithreaded processing of records that are upserted into Mongo.

Comment by Ramon Fernandez Marina [ 19/Aug/15 ]

Simple shell repro for this problem, courtesy of igor:

load('jstests/libs/parallelTester.js'); // for ScopedThread
 
var t = db.getSiblingDB("upsert_bug").getCollection("col");
 
t.drop();
 
var upsertFn = function() {
  for (var i = 0; i < 100; ++i) {
    var c = db.getSiblingDB("upsert_bug").getCollection("col");
    var x = c.update({_id:3}, {$set: {a:'123', b: Math.random()}}, true)
    assert(!x.getWriteError(), tojson(x));
    c.remove({"_id":3});
  }
}
 
var threads = [];
for (var i = 0; i < 2; ++i) {
  var thread = new ScopedThread(upsertFn, db);
  threads.push(thread);
  thread.start();
}
 
threads.forEach(function(t) {
    t.join();
});

Comment by Michael Kania [ 19/Aug/15 ]

Seems like there are lots of reports. We see this issue being pretty common on 3.0 with WiredTiger and RocksDB. What's the plan for this ticket?

Generated at Thu Feb 08 03:34:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.