[SERVER-36797] Sharded Cluster Race Condition: duplicate documents inserted with hashed shard key / unique index (rare) Created: 21/Aug/18 Updated: 27/Oct/23 Resolved: 27/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.6.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | J-EC | Assignee: | Nick Brewer |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | I wrote a test application that reproduces the issue as noted in the Description. If you need my test application in order to reproduce, I'll have to refactor it so that it is more generic and will compile for you; otherwise, I am able to reproduce very easily and can provide any data that is needed. |
| Participants: |
| Description |
|
Hi, I'm relatively new to Mongo and seem to have stumbled upon an apparent race condition / bug in a sharded Mongo deployment. Please advise if I'm missing something fundamental.

Problem Summary: Very rarely, though reproducibly, duplicate documents can be inserted into a hashed sharded collection, even when a unique index exists that should explicitly prevent this from occurring. Please see the details below.

Test Environment: The race condition was originally reproduced with a larger deployment (3-member replica sets), but it was simplified as follows for reproduction:

General environment:
Minimum sharded cluster:
Application:
Test Purpose: In a sharded collection using hashed sharding, verify that duplicate documents (those with the same shard key value) cannot be inserted but are instead rejected. Two good examples are demonstrated below, where a "duplicate key error" is returned since the shard key value already exists in the collection.

Good Example #1: using _id as the hashed shard key
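A minimal sketch of this scenario in the mongo shell (database, collection, and field names are assumed):

    sh.enableSharding("test")
    sh.shardCollection("test.repo1", { "_id" : "hashed" })
    db.repo1.insertOne({ "_id" : "key1", "fn" : "1" })   // succeeds
    db.repo1.insertOne({ "_id" : "key1", "fn" : "1" })   // rejected with an E11000 duplicate key error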
Good Example #2: using "repo1pk" field-name as the hashed shard key (with a unique index also defined)
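A minimal sketch of this scenario in the mongo shell (names are assumed, and the order of index creation versus sharding is illustrative rather than a verified recipe):

    db.repo1.createIndex({ "repo1pk" : 1 }, { "unique" : true })
    sh.shardCollection("test.repo1", { "repo1pk" : "hashed" })
    db.repo1.insertOne({ "repo1pk" : "key1", "fn" : "1" })   // succeeds
    db.repo1.insertOne({ "repo1pk" : "key1", "fn" : "1" })   // rejected with a duplicate key error on repo1pk_1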
Example #2 is really how my application behaves, even though I reproduced the problem below using _id just to see if it would behave differently (it doesn't).

Race Condition / Bug: I wrote a test application that attempts to insert & delete a single document - repeat forever - where "_id" is my hashed shard key:
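A minimal sketch of the loop, written here as mongo shell JavaScript rather than the Mongo C driver code I actually use (document shape and names are illustrative):

    while (true) {
        try {
            // Create the single well-known document.
            db.repo1.insertOne({ "_id" : "key1", "fn" : "1", "fn1" : "value1" });
        } catch (e) {
            // An E11000 duplicate key error is expected when the document already exists.
        }
        // Delete every matching document and check how many were removed.
        var res = db.repo1.deleteMany({ "fn" : "1" });
        if (res.deletedCount > 1) {
            print("ERROR: duplicate documents detected, deletedCount = " + res.deletedCount);
        }
    }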
Four instances of my test driver are started and compete to create & delete the single document above. The sharded collection should contain at most one document at any given time, and this is generally the case. However, very rarely a duplicate document is inserted; my tool detects this during document deletion when Mongo responds with { "deletedCount" : 2 }, since it deletes all documents matching { "fn" : "1" }. When my tool detects that more than one document was deleted, it prints:
For example, during a 5-minute test run there were 270,000 create (insert) attempts, of which 134,514 were successful since the document didn't yet exist, and 141 duplicate documents were detected during the delete cycle:
To confirm that duplicate documents were actually inserted, I wrote a reader that constantly tries to read the "fn1" field name; it can sometimes catch instances of duplicate documents, as shown below:
The above behavior seems unexpected and looks like a race condition, in light of the following: the Mongo documentation notes that this behavior shouldn't be possible because of the existence of a 'unique' index for _id (or, in my case, the unique index I explicitly create when using "repo1pk" as the hashed shard key):
Also, Kevin Adistambha with Mongo recently wrote that when inserting or updating a document, "The index and the collection will be updated as a single atomic operation." So it would seem that an attempt to insert a duplicate document should be caught and rejected due to the presence of a unique index?

Other data: This is the dummy document that is created, deleted, repeat forever.
The set of defined indexes for my test. I doubt this matters, but I've reproduced with & without an index on "fn1" below.
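For reference, a minimal sketch of the kind of index set involved when _id is the hashed shard key (names are assumed; the hashed shard-key index is created by sh.shardCollection if it does not already exist):

    db.repo1.createIndex({ "fn1" : 1 })   // the optional index mentioned above; reproduced with and without it
    db.repo1.getIndexes()                 // e.g. the default unique _id index, the hashed shard-key index, and fn1_1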
Here is the shard status when reproducing the problem. Curious: why is "unique: false" stated below, even though a unique index is created by default (for _id) and is otherwise (usually) enforced?
|
| Comments |
| Comment by Nick Brewer [ 27/Aug/18 ] |
|
jmmec Thanks for the detailed report - I've used a similar test to reproduce the behavior you're describing. After talking this over with our Query team, I believe the behavior you're seeing is expected, albeit somewhat difficult to trigger. I'll try to explain both of the cases you've identified where more than one document is returned:

Two documents deleted:
Each document is deleted within a separate storage-level transaction which may see a different snapshot of the data. This makes it possible to delete the document, and then pick up a new snapshot that reflects the insertion of a new document.

Two documents returned via find():
With WiredTiger, read operations see a consistent snapshot of data until they yield; at yield points (and only at yield points), a query can change the snapshot from which it is reading. As a result, the default read concern may end up using multiple snapshots for read isolation. If you were to use transactions with the snapshot read concern, it would force your query to use only one snapshot, and you would never see multiple documents returned from such a query (a minimal sketch follows at the end of this comment). What is happening here is that the find() finds the one document that exists in the snapshot used when the read is initialized, then yields its read lock to another read operation; when the other read finishes and the find() continues, the query uses a newer snapshot and finds the second document that was inserted while it was yielded.

-Nick
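A minimal sketch of a single-snapshot read using a transaction with snapshot read concern (requires MongoDB 4.0 or later on a replica set; database and collection names are assumed):

    var session = db.getMongo().startSession();
    // All reads inside the transaction use one snapshot.
    session.startTransaction({ readConcern: { level: "snapshot" } });
    var coll = session.getDatabase("test").repo1;
    printjson(coll.find({ "fn" : "1" }).toArray());
    session.commitTransaction();
    session.endSession();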
| Comment by J-EC [ 23/Aug/18 ] |
|
Hi, here is a brief update with some new logs from a system different from the one described yesterday (2018-08-22) above.

System #2: Today I reproduced the duplicate insert problem with Mongo v3.6.7 (previously v3.6.6) in a different container on a different network, and did not (over several test attempts) hit the OpTime/rollback logs that I hit on 2018-08-22. However, on one occasion I did hit new logs that I've not seen before:
System #2 Simplified Non-Sharded Configuration:
Logs: Please see the attached log file dated 2018-08-23.
| Comment by J-EC [ 22/Aug/18 ] |
|
Hi Nick, I restarted everything and began following your instructions, and had gotten pretty far reproducing the problem at each step in an increasingly simplified sharded deployment, when I realized that I had forgotten to configure my test collection for sharding. Doh... Since you wanted to simplify the deployment to the bare minimum where the problem is still reproducible, it turns out that a sharded deployment isn't needed at all. Below is the simplest configuration where the problem is reproducible, along with my attempts to answer some of your more general questions.

Simplified Non-Sharded Configuration:
Start the single mongod:
Where I'm reusing my sharded configuration file, but am not starting a mongos or config-server:
Configure the shard and also start my test clients so that the non-sharded collection is created.
Only one mongo instance exists as expected:
There are only 3 connections from dbClient to mongod, where 2 are the insert / deleter test clients, and 1 is doing SDAM in the C Client library:
Disk usage is always OK and CPU is maxed out near 100%:
The duplicate document problem is reproduced; refer to the attached file "2018-08-22 Server-36797 Logs.txt" for details. There are suspicious logs regarding OpTime and a potential rollback. See my other comments in the attached text log file about the location of the other logs you requested.

Delete and Query Commands: The delete command that detects duplicate documents would be similar to this query (but of course executed via the Mongo C driver):
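A sketch of the shell equivalent (collection name assumed):

    var res = db.repo1.deleteMany({ "fn" : "1" });
    printjson(res);   // duplicate documents are flagged whenever deletedCount is greater than 1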
The query command (my reader) that can sometimes detect duplicate documents is similar to this (but of course executed via the Mongo C driver):
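A sketch of the shell equivalent (filter and projection are assumed, based on the "fn" / "fn1" fields above):

    db.repo1.find({ "fn" : "1" }, { "fn1" : 1 }).toArray()   // occasionally returns two documents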
Other: For completeness in case you ask about this:
Please let me know if you need other data captured, especially any clarifications about where log files are written, etc.
| Comment by Nick Brewer [ 21/Aug/18 ] |
|
jmmec Thanks for your report. I'd like to confirm a few things:
Additionally, could you increase the log level via db.setLogLevel(1, "query") and then include logs from a time when this behavior occurs?

Thanks,
| Comment by J-EC [ 21/Aug/18 ] |
|
Before opening this bug, I searched for existing bugs and also did more investigation beyond what is noted above; here is a summary. I've reproduced the problem using two different approaches for creating a document, where for both approaches the hashed shard key is configured with a unique index. Below I reference the Mongo CLI commands, although note that I am actually using the Mongo C driver (a sketch of both approaches appears after the notes below):

1. db.collection.insertOne()
2. db.collection.updateOne($setOnInsert, upsert:true)

For approach #1:
For approach #2:
General:
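A minimal sketch of the two approaches in the mongo shell (document shape and names are assumed):

    // Approach #1: plain insert of the well-known document.
    db.repo1.insertOne({ "_id" : "key1", "fn" : "1", "fn1" : "value1" });

    // Approach #2: upsert that only sets fields when the document is inserted.
    db.repo1.updateOne(
        { "_id" : "key1" },
        { "$setOnInsert" : { "fn" : "1", "fn1" : "value1" } },
        { "upsert" : true }
    );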
|