[SERVER-58936] Unique index constraints may not be enforced Created: 28/Jul/21  Updated: 29/Oct/23  Resolved: 28/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.7, 5.0.0-rc2
Fix Version/s: 4.4.8, 5.0.2, 5.1.0-rc0

Type: Task Priority: Blocker - P1
Reporter: Jonathan Streets (Inactive) Assignee: Jonathan Streets (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File findUniquenessViolation.js     File findUniquenessViolations_latest.js     Text File output-example.txt    
Issue Links:
Depends
Problem/Incident
is caused by WT-7264 Creating a new configuration for sear... Closed
is caused by SERVER-56509 Wrap unique index insertion _keyExist... Closed
Related
related to SERVER-58943 Add more test coverage for unique ind... Closed
Backwards Compatibility: Fully Compatible
Participants:
Case:

 Description   
Issue Summary as of August 17, 2021

ISSUE DESCRIPTION AND IMPACT
For collections that have unique indexes in addition to the default _id index, SERVER-56509 introduced a regression that may result in documents being inserted in violation of those indexes' uniqueness constraints. These documents replicate successfully from the primary to the secondaries.

If this bug is exercised and multiple documents violating a unique index constraint exist in a collection, subsequent delete operations that use the affected unique index will remove only half (rounding up) of the targeted documents per execution. This is a result of internal optimizations that assume uniqueness. Query and update operations are not affected and return all targeted documents.
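
As a hypothetical illustration of the delete behavior (the collection c and key fields t and i are example names, not from any specific deployment), a shell loop like the following may need several passes on an affected build before all documents matching a duplicated unique-key filter are gone:

// Repeat the delete until no more documents match; on an affected version,
// each pass may remove only about half of the remaining duplicates.
let res;
do {
  res = db.c.deleteMany({ t: 99, i: 99 }); // filter on the unique index key
  print("deleted " + res.deletedCount + " document(s)");
} while (res.deletedCount > 0);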

DIAGNOSIS AND REMEDIATION
This issue affects MongoDB 4.4.7, 5.0.0, and 5.0.1. It is resolved in 4.4.8 and 5.0.2.

Deployments on the affected versions that rely on unique indexes apart from the _id index should be upgraded to MongoDB 4.4.8 or 5.0.2 as soon as possible.

After upgrading to a version that is not impacted by this bug, users can determine whether they have been impacted by using the validate command on all collections or by running the attached script, findUniquenessViolation.js.
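
For example, here is a minimal mongosh sketch (assuming a user with sufficient privileges on every database) that runs validate against each collection in the cluster and prints any that fail:

// Run validate against every collection in every database and report failures.
db.adminCommand({ listDatabases: 1 }).databases.forEach(function(d) {
  var database = db.getSiblingDB(d.name);
  database.getCollectionNames().forEach(function(c) {
    var res = database.runCommand({ validate: c });
    if (!res.valid) {
      print(d.name + "." + c + " failed validation: " + JSON.stringify(res.errors));
    }
  });
});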

This script iterates through every database and collection in the cluster looking for unique indexes other than the _id index. For each unique index it finds, it performs an operation (sketched after this list) to list:

  • each index key value that is incorrectly duplicated
  • the _ids of each document with that key value
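
Conceptually, the per-index duplicate search boils down to an aggregation of the following shape. This is only a sketch, assuming a hypothetical unique index on {"t":1,"i":1} as in the example output below; the attached script is the authoritative implementation:

// Group on the unique index's key fields and keep only key values that occur
// more than once, collecting the _ids of the offending documents.
db.c.aggregate([
  { $group: { _id: { t: "$t", i: "$i" }, ids: { $push: "$_id" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
], { allowDiskUse: true });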

As this script may perform multiple index scans, we recommend running it against a secondary to minimize production impact.

Running the script

Here is an example invocation of the script (the legacy mongo shell may also be used), which writes results to results.txt in the current directory:

mongosh <mongouri>/?readPreference=secondary --username admin --authenticationDatabase admin findUniquenessViolation.js | tee results.txt
 
2021-08-16T17:16:09.593Z Searching node for any documents that violate a unique index...
 
2021-08-16T17:16:09.622Z Searching for duplicates in test.c that has 1 unique index(es).
2021-08-16T17:16:09.718Z Found 4 document(s) in 'test.c' (index: {"t":1,"i":1}) with duplicate values with key: {"t":99,"i":99}
 1  {"_id":{"$oid":"610425674cd0a4f976c591c3"}}
 2  {"_id":{"$oid":"6104256a4cd0a4f976c5dc49"}}
 3  {"_id":{"$oid":"6104256b4cd0a4f976c60294"}}
 4  {"_id":{"$oid":"6104256c4cd0a4f976c62aae"}}
 
2021-08-16T17:16:09.722Z Found 4 documents that violate a unique index, affecting 1 collection(s) in 1 database(s).
 
Result JSON:
 
[{"database":"test","collection":"c","key":{"t":99,"i":99},"docs":[{"_id":{"$oid":"611a9d43eb3440c0b1efa28d"}},{"_id":{"$oid":"611a9d46eb3440c0b1eff424"}},{"_id":{"$oid":"611a9d47eb3440c0b1f018ed"}},{"_id":{"$oid":"611a9d48eb3440c0b1f0413e"}}]}]

You can inspect the affected documents by querying on the provided _ids. Depending on the results and your application logic, it may be safe to remove the duplicated documents; otherwise, more involved reconciliation may be required.

For example, in this case, reviewing the affected documents shows that they all match:

> db.c.find({$or:[ { "_id" : ObjectId("610425674cd0a4f976c591c3") }, { "_id" : ObjectId("6104256a4cd0a4f976c5dc49") }, { "_id" : ObjectId("6104256b4cd0a4f976c60294") }, { "_id" : ObjectId("6104256c4cd0a4f976c62aae") } ]})
{ "_id" : ObjectId("610425674cd0a4f976c591c3"), "t" : 99, "i" : 99, "x" : "x" }
{ "_id" : ObjectId("6104256a4cd0a4f976c5dc49"), "t" : 99, "i" : 99, "x" : "x" }
{ "_id" : ObjectId("6104256b4cd0a4f976c60294"), "t" : 99, "i" : 99, "x" : "x" }
{ "_id" : ObjectId("6104256c4cd0a4f976c62aae"), "t" : 99, "i" : 99, "x" : "x" }

Therefore, based on our knowledge of the application, we can safely remove all but one of them by _id:

> db.c.remove({$or:[ { "_id" : ObjectId("610425674cd0a4f976c591c3") }, { "_id" : ObjectId("6104256a4cd0a4f976c5dc49") }, { "_id" : ObjectId("6104256b4cd0a4f976c60294") } ]})
WriteResult({ "nRemoved" : 3 })
> db.c.find({$or:[ { "_id" : ObjectId("610425674cd0a4f976c591c3") }, { "_id" : ObjectId("6104256a4cd0a4f976c5dc49") }, { "_id" : ObjectId("6104256b4cd0a4f976c60294") }, { "_id" : ObjectId("6104256c4cd0a4f976c62aae") } ]})
{ "_id" : ObjectId("6104256c4cd0a4f976c62aae"), "t" : 99, "i" : 99, "x" : "x" }

Additional Option: Specifying namespaces to query
After running the script, you may notice that some databases and namespaces were skipped, for example because the user was not authorized to read a collection.

We were unable to access these locations:
Databases
["admin","config","local",”db_1”]
 
Namespaces
["admin.system.keys","db_0.coll_0","db_0.coll_1"]

You may want to rerun the script against only the namespaces that were skipped. To do so, modify the script and provide an array of namespaces in the format 'database_name.collection_name' in the namespaces variable. Namespaces in the admin, local, and config databases are unlikely to contain duplicate documents and may be ignored. For all other namespaces, verify that the user running the script has sufficient permissions to read them.

findUniquenessViolation.js snippet

// ------------------------------------------------------------------------------------
// Populate this array with specific namespaces to scan only them for duplicates,
// using the format 'database_name.collection_name'
// ------------------------------------------------------------------------------------
 
namespaces = ["db_0.coll_0","db_0.coll_1"];

Additional Option: Automatic cleanup

If you are absolutely certain that the duplicated documents are materially similar for your use case, this script can be used to delete all but either the newest or the oldest of each set of duplicates. This option is disabled by default.

Warning: Use this facility only with extreme care. Documents targeted by the script are permanently deleted, and the script does not back up or output their contents.

To use the script to clean up duplicate documents without regard for application-specific logic, uncomment the declaration of cleanupType in the script and set the variable to either delete_oldest or delete_newest.
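
For reference, the resulting setting in the script would look something like this (the exact line and its placement may differ between script versions):

// Uncomment and set to either "delete_oldest" or "delete_newest":
cleanupType = "delete_oldest";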

This ticket tracks the reverts of SERVER-56509, which was released in 5.0.0-rc2 and 4.4.7.



 Comments   
Comment by Edwin Zhou [ 23/Aug/21 ]

Hi deyan@nuxni.com,

Thanks for reporting an issue with this script.

This appears to be an issue with the legacy mongo shell, where EJSON is not supported. Could you please attempt to run the script using mongosh? You may download the new shell using the instructions listed here.

Please download the latest version of the script, findUniquenessViolations_latest.js. Before rerunning it, set verboseOutput (line #22 of the script) to true. Please note that when using mongosh and tee'ing the output of the script, you may need to set the password explicitly in the mongosh command-line options. For example:

mongosh -u admin -p <password> findUniquenessViolations_latest.js | tee out.txt

Failing to do so will result in the output being suppressed by the password prompt.

Best,
Edwin

Comment by Deyan Petrov [ 23/Aug/21 ]

Hi,

Running the latest script now gives me errors like this:

2021-08-23T06:27:57.441Z Searching for duplicates in xxxx.yyyy that has 1 unique index(es).
2021-08-23T06:27:57.487Z We are unable to access yyyy. Printing error and skipping collection...
2021-08-23T06:27:57.487Z [unknown type]

The 30089_findUniquenessViolation-dotReplacement.js was actually working ...

Br,

Deyan

Comment by Kelsey Schubert [ 13/Aug/21 ]

Thanks for reporting this issue with the script.

If you are seeing the following error message,

We are unauthorized to access ${collectionname}. Skipping collection...

when running the original script as a user with sufficient privileges, you may need to either enable allowDiskUse or escape dots and dollars in your index specification.
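
As a hypothetical illustration of the dot problem (collection and field names are examples): a dotted index key such as child.prop1 cannot be used verbatim as an output field name in a $group stage, which is the kind of failure the escaping works around:

// Fails: field names in a computed object may not contain '.'
db.c.aggregate([{ $group: { _id: { "child.prop1": "$child.prop1" } } }]);
// Works: the dot is replaced in the output field name, while the
// "$child.prop1" field path on the right-hand side keeps its dotted form.
db.c.aggregate([{ $group: { _id: { "child_prop1": "$child.prop1" } } }], { allowDiskUse: true });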

I've uploaded a new script, findUniquenessViolations_latest.js, that makes both of these modifications. Please use it if you are having trouble.

Thank you,
Kelsey

Comment by Deyan Petrov [ 13/Aug/21 ]

It also does not work for indexes on child object fields, e.g. child.prop1:

{
  child: {
    "prop1": "bla"
  }
}

I guess nothing with a dot in the index specification works.

Comment by Deyan Petrov [ 13/Aug/21 ]

It seems that the script does not support indexes on array fields, e.g. if you have an array trxs in your document and you want to filter on trxs.extRef:

{
  trxs: [
    {
      "extRef": "bla"
    }
  ]
}

Comment by Deyan Petrov [ 13/Aug/21 ]

Getting "We are unauthorized to access xxxxxxxxxxx. Skipping collection.." where xxxxxxxxxxx is the collection name. My user is atlasAdmin@atlas ... what could be the issue?

 

The underlying error is:

Error: command failed: { "ok" : 0, "errmsg" : "FieldPath field names may not contain '.'. Consider using $getField or $setField.", "code" : 16412, "codeName" : "Location16412", "$clusterTime" : { "clusterTime" : Timestamp(1628834220, 50), "signature" : { "hash" : ... } }, "operationTime" : Timestamp(1628834220, 50) } : aggregate failed

Comment by Jonathan Streets (Inactive) [ 28/Jul/21 ]

Author:

{'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}

Message: Revert "SERVER-56509 Wrap unique index insertion _keyExists call in a WT cursor reconfigure"

This reverts commit c5ac2eb1ea145693e1c6b974e88a2cfc18780134.
Branch: master
https://github.com/mongodb/mongo/commit/ecba23449a26f3d266ffacda7eb98d8386267606

Comment by Jonathan Streets (Inactive) [ 28/Jul/21 ]

Author:

{'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}

Message: Revert "SERVER-56509 Wrap unique index insertion _keyExists call in a WT cursor reconfigure"

This reverts commit ae2da27652e552f101559466d165b82a3c122d71.
Branch: v4.4
https://github.com/mongodb/mongo/commit/83b8bb8b6b325d8d8d3dfd2ad9f744bdad7d6ca0

Comment by Jonathan Streets (Inactive) [ 28/Jul/21 ]

Author:

{'name': 'Henrik Edin', 'email': 'henrik.edin@mongodb.com', 'username': 'henrikedin'}

Message: Revert "SERVER-56509 Wrap unique index insertion _keyExists call in a WT cursor reconfigure"

This reverts commit 297e2977ef3e394e02d61aedc954c9aaadc37e73.
Branch: v5.0
https://github.com/mongodb/mongo/commit/6d9ec525e78465dcecadcff99cce953d380fedc8

 
