Type: Task
Resolution: Fixed
Priority: Blocker - P1
Fix Version/s: 4.4.8, 5.0.2, 5.1.0-rc0
Affects Version/s: 4.4.7, 5.0.0-rc2
Component/s: None
Labels:
None

Backwards Compatibility:
Fully Compatible
Case:
Confidence Status:
None
Work Order:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Issue Summary as of August 17, 2021

ISSUE DESCRIPTION AND IMPACT
For collections that have additional unique indexes apart from the default _id index, ~~SERVER-56509~~ introduced a regression which may result in documents being inserted that violate those unique indexes’ uniqueness constraint. These documents will be replicated successfully from the primary to the secondaries.

If this bug is exercised and multiple documents exist in a collection violating a unique index constraint, subsequent delete operations using the affected unique index will only modify half (rounding up) of the affected documents being targeted per execution. This is a result of internal optimizations that rely on uniqueness. Query and update operations are not affected and will return all targeted documents.
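As a rough illustration of the "half (rounding up)" behavior, the sketch below models repeated deletes against a single duplicated key value. It is only an arithmetic model of the description above, not server code, and the function name deletePassesNeeded is hypothetical.

```javascript
// Arithmetic model only: each delete is assumed to remove ceil(n / 2) of the
// n documents that still share the duplicated key value.
function deletePassesNeeded(duplicateCount) {
  const removedPerPass = [];
  let remaining = duplicateCount;
  while (remaining > 0) {
    const removed = Math.ceil(remaining / 2);
    removedPerPass.push(removed);
    remaining -= removed;
  }
  return removedPerPass;
}

// Four duplicates would need three delete executions under this model.
console.log(deletePassesNeeded(4)); // [ 2, 1, 1 ]
```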

DIAGNOSIS AND REMEDIATION
This issue affects MongoDB 4.4.7, 5.0.0 and 5.0.1. This issue is resolved in 4.4.8 and 5.0.2.

Deployments on the affected versions that rely on unique indexes apart from the _id index should be upgraded to MongoDB 4.4.8 or 5.0.2 as soon as possible.

After upgrading to a version that is not impacted by this bug, users can determine whether they have been impacted by using the validate() command to validate all collections or by running the attached script, findUniquenessViolations.js.
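For the validate() route, a minimal sketch of looping over every collection might look like the following. The helper name validateAllCollections is hypothetical, and `conn` is assumed to behave like the object returned by db.getMongo() in the shell. Validation can be expensive on large collections, so treat this as a starting point rather than a recommended procedure.

```javascript
// Sketch: run validate() on every collection reachable from a connection and
// collect the namespaces that report problems.
// Assumption: `conn` exposes getDBNames() and getDB(), as db.getMongo() does.
function validateAllCollections(conn) {
  const problems = [];
  for (const dbName of conn.getDBNames()) {
    const database = conn.getDB(dbName);
    for (const collName of database.getCollectionNames()) {
      const ns = `${dbName}.${collName}`;
      try {
        const res = database.getCollection(collName).validate();
        if (!res.valid) {
          problems.push({ ns, errors: res.errors || [] });
        }
      } catch (e) {
        // For example: views, or collections the user is not authorized to read.
        problems.push({ ns, errors: [String(e.message || e)] });
      }
    }
  }
  return problems;
}

// In the shell: printjson(validateAllCollections(db.getMongo()));
```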

This script iterates through every database and collection in the cluster looking for unique indexes that are not the _id index. For each unique index that it finds, it will perform an operation to list:

each index key value that is incorrectly duplicated
the _ids of each document with that key value
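The kind of lookup described above can be approximated with an aggregation that groups documents by the index key. The sketch below is not the attached script's implementation; the function names are hypothetical, and it ignores sparse, partial, collated, and multikey indexes, as well as the difference between a missing field and null.

```javascript
// Keep only unique indexes other than the default _id index.
// `indexes` is assumed to have the shape returned by db.collection.getIndexes().
function nonIdUniqueIndexes(indexes) {
  return indexes.filter((ix) => ix.unique === true && ix.name !== "_id_");
}

// Build a pipeline that groups documents by the index key and keeps only
// key values shared by more than one document, with the _ids of each group.
function duplicateKeyPipeline(keyPattern) {
  const groupId = {};
  for (const field of Object.keys(keyPattern)) {
    // $group sub-field names may not contain dots, so "a.b" becomes "a_b".
    groupId[field.replace(/\./g, "_")] = "$" + field;
  }
  return [
    { $group: { _id: groupId, ids: { $push: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } },
  ];
}

// In the shell, for the index {"t":1,"i":1} on test.c:
//   db.c.aggregate(duplicateKeyPipeline({ t: 1, i: 1 }), { allowDiskUse: true })
```

Unlike the index scans the script is described as using, this grouping reads every document in the collection, so running it against a secondary is worth considering.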

Because this script may perform multiple index scans, we recommend running it against a secondary to minimize production impact.

Running the script

Here is an example invocation of the script (you may also use the legacy mongo shell), which will output results to results.txt in the current directory:

mongosh <mongouri>/?readPreference=secondary --username admin --authenticationDatabase admin findUniquenessViolation.js | tee results.txt

2021-08-16T17:16:09.593Z Searching node for any documents that violate a unique index...

2021-08-16T17:16:09.622Z Searching for duplicates in test.c that has 1 unique index(es).
2021-08-16T17:16:09.718Z Found 4 document(s) in 'test.c' (index: {"t":1,"i":1}) with duplicate values with key: {"t":99,"i":99}
 1  {"_id":{"$oid":"610425674cd0a4f976c591c3"}}
 2  {"_id":{"$oid":"6104256a4cd0a4f976c5dc49"}}
 3  {"_id":{"$oid":"6104256b4cd0a4f976c60294"}}
 4  {"_id":{"$oid":"6104256c4cd0a4f976c62aae"}}

2021-08-16T17:16:09.722Z Found 4 documents that violate a unique index, affecting 1 collection(s) in 1 database(s).

Result JSON:

[{"database":"test","collection":"c","key":{"t":99,"i":99},"docs":[{"_id":{"$oid":"611a9d43eb3440c0b1efa28d"}},{"_id":{"$oid":"611a9d46eb3440c0b1eff424"}},{"_id":{"$oid":"611a9d47eb3440c0b1f018ed"}},{"_id":{"$oid":"611a9d48eb3440c0b1f0413e"}}]}]
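If the result JSON line is saved on its own, it can be post-processed with ordinary JavaScript. The sketch below assumes the array shape shown above (database, collection, key, docs) and uses a sample shortened to two documents; summarizeViolations is a hypothetical helper name.

```javascript
// Sample taken from the result line above, shortened to two documents.
const resultJson =
  '[{"database":"test","collection":"c","key":{"t":99,"i":99},' +
  '"docs":[{"_id":{"$oid":"611a9d43eb3440c0b1efa28d"}},' +
  '{"_id":{"$oid":"611a9d46eb3440c0b1eff424"}}]}]';

// Turn each violation into a namespace, the duplicated key, and a flat _id list.
function summarizeViolations(json) {
  return JSON.parse(json).map((v) => ({
    ns: `${v.database}.${v.collection}`,
    key: v.key,
    // Unwrap extended-JSON ObjectIds ({"$oid": "..."}); keep other _id types as-is.
    ids: v.docs.map((d) =>
      d._id !== null && typeof d._id === "object" && "$oid" in d._id ? d._id.$oid : d._id
    ),
  }));
}

console.log(summarizeViolations(resultJson));
```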

You can inspect the affected documents by querying on the provided _ids. Depending on the results and application logic, it may be safe to remove the duplicated documents; otherwise, more involved reconciliation may be required.

For example, in this case, reviewing the affected documents we can see that they all match:

> db.c.find({$or:[ { "_id" : ObjectId("610425674cd0a4f976c591c3") }, { "_id" : ObjectId("6104256a4cd0a4f976c5dc49") }, { "_id" : ObjectId("6104256b4cd0a4f976c60294") }, { "_id" : ObjectId("6104256c4cd0a4f976c62aae") } ]})
{ "_id" : ObjectId("610425674cd0a4f976c591c3"), "t" : 99, "i" : 99, "x" : "x" }
{ "_id" : ObjectId("6104256a4cd0a4f976c5dc49"), "t" : 99, "i" : 99, "x" : "x" }
{ "_id" : ObjectId("6104256b4cd0a4f976c60294"), "t" : 99, "i" : 99, "x" : "x" }
{ "_id" : ObjectId("6104256c4cd0a4f976c62aae"), "t" : 99, "i" : 99, "x" : "x" }

Therefore, based on our knowledge of the application, we can safely remove all but one using the _id:

> db.c.remove({$or:[ { "_id" : ObjectId("610425674cd0a4f976c591c3") }, { "_id" : ObjectId("6104256a4cd0a4f976c5dc49") }, { "_id" : ObjectId("6104256b4cd0a4f976c60294") } ]})
WriteResult({ "nRemoved" : 3 })
> db.c.find({$or:[ { "_id" : ObjectId("610425674cd0a4f976c591c3") }, { "_id" : ObjectId("6104256a4cd0a4f976c5dc49") }, { "_id" : ObjectId("6104256b4cd0a4f976c60294") }, { "_id" : ObjectId("6104256c4cd0a4f976c62aae") } ]})
{ "_id" : ObjectId("6104256c4cd0a4f976c62aae"), "t" : 99, "i" : 99, "x" : "x" }
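The manual cleanup above can also be expressed by building the filter programmatically, which avoids typing each _id by hand. This is a sketch under the assumption that the _ids are handled as hex strings and converted to ObjectId only when the filter is built; buildCleanupFilter and its toId parameter are hypothetical names. It uses $in rather than the $or form shown above.

```javascript
// Build a filter matching every duplicate except the one to keep.
// `toId` converts a hex string to the _id type the collection uses;
// it defaults to the identity function so the helper also works outside the shell.
function buildCleanupFilter(hexIds, keepHex, toId = (h) => h) {
  if (!hexIds.includes(keepHex)) {
    throw new Error("keepHex must be one of the duplicate _ids");
  }
  return { _id: { $in: hexIds.filter((h) => h !== keepHex).map((h) => toId(h)) } };
}

const duplicates = [
  "610425674cd0a4f976c591c3",
  "6104256a4cd0a4f976c5dc49",
  "6104256b4cd0a4f976c60294",
  "6104256c4cd0a4f976c62aae",
];
console.log(JSON.stringify(buildCleanupFilter(duplicates, "6104256c4cd0a4f976c62aae")));

// In the shell:
//   db.c.deleteMany(buildCleanupFilter(duplicates, "6104256c4cd0a4f976c62aae", (h) => ObjectId(h)))
```

Refusing to build a filter when the kept _id is not in the list guards against accidentally deleting every copy.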

Additional Option: Specifying namespaces to query
After running the script, you may notice that databases and namespaces have been skipped for reasons such as not being authorized to read a collection.

We were unable to access these locations:
Databases
["admin","config","local","db_1"]

Namespaces
["admin.system.keys","db_0.coll_0","db_0.coll_1"]

You may want to run the script only against namespaces that have been skipped. You can do this by modifying the script and providing an array of namespaces with the format 'database_name.collection_name' in the namespaces variable. Namespaces in the admin, local, and config databases are unlikely to contain duplicate documents and may be ignored. For all other namespaces, verify that the user running the script has sufficient permissions to read the namespace.

findUniquenessViolation.js snippet

// ------------------------------------------------------------------------------------
// Populate this array with specific namespaces to scan only them for duplicates,
// using the format 'database_name.collection_name'
// ------------------------------------------------------------------------------------

namespaces = ["db_0.coll_0","db_0.coll_1"];
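To derive that array from the list of skipped namespaces, a small filter along these lines may help. It is a sketch, not part of the attached script: parseNamespace and userNamespaces are hypothetical helpers, and the split happens at the first dot because collection names can themselves contain dots.

```javascript
// Split 'database_name.collection_name' at the first dot.
function parseNamespace(ns) {
  const dot = ns.indexOf(".");
  if (dot <= 0 || dot === ns.length - 1) {
    throw new Error(`invalid namespace: ${ns}`);
  }
  return { db: ns.slice(0, dot), coll: ns.slice(dot + 1) };
}

// Drop namespaces in the admin, local, and config databases.
function userNamespaces(skipped) {
  const internal = new Set(["admin", "local", "config"]);
  return skipped.filter((ns) => !internal.has(parseNamespace(ns).db));
}

// Using the skipped namespaces from the example output above:
console.log(userNamespaces(["admin.system.keys", "db_0.coll_0", "db_0.coll_1"]));
// [ 'db_0.coll_0', 'db_0.coll_1' ]
```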

Additional Option: Automatic cleanup

If you are absolutely certain that the duplicate documents are materially similar, this script can be used to delete all but either the newest or the oldest of each set of duplicates.

This option is disabled by default and is only suitable if the contents of the duplicate documents are materially similar for your use case.

Warning: Use this facility only with extreme care. Documents targeted by the script will be permanently deleted, and the script does not back up or output the contents of those documents.

To use this script to clean up duplicate documents without regard for application-specific logic, uncomment the declaration of cleanupType in the script and set that variable to either delete_oldest or delete_newest.
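To make the two settings concrete, the sketch below shows one way a set of duplicates could be split into documents to delete and a document to keep. It is not the attached script's logic. It assumes that delete_oldest removes the older documents and keeps the newest (and the reverse for delete_newest), and that every _id is an ObjectId, whose leading bytes encode the creation time. The name idsToDelete is hypothetical.

```javascript
// Choose which duplicates to delete, given ObjectId hex strings.
// Assumption: "delete_oldest" keeps the newest document and "delete_newest"
// keeps the oldest. Ordering relies on the ObjectId timestamp prefix, so ties
// within the same second fall back to the remaining bytes.
function idsToDelete(oidHexes, cleanupType) {
  const sorted = oidHexes.map((h) => h.toLowerCase()).sort(); // oldest first
  if (cleanupType === "delete_oldest") return sorted.slice(0, -1);
  if (cleanupType === "delete_newest") return sorted.slice(1);
  throw new Error(`unknown cleanupType: ${cleanupType}`);
}

const dupIds = [
  "6104256b4cd0a4f976c60294",
  "610425674cd0a4f976c591c3",
  "6104256c4cd0a4f976c62aae",
  "6104256a4cd0a4f976c5dc49",
];
console.log(idsToDelete(dupIds, "delete_oldest"));
```

With the four example _ids from this ticket, delete_oldest under these assumptions selects the same three documents that were removed by hand in the manual cleanup example.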

This ticket tracks the reverts of ~~SERVER-56509~~, which was released in 5.0.0-rc2 and 4.4.7.


findUniquenessViolation.js (4 kB, Aug 02 2021 02:54:33 PM UTC)
findUniquenessViolations_latest.js (9 kB, Aug 23 2021 02:32:35 PM UTC)
output-example.txt (2 kB, Aug 02 2021 02:54:59 PM UTC)

is caused by

WT-7264 Creating a new configuration for search near that allows it to exit quickly when searching for prefixes

Closed

SERVER-56509 Wrap unique index insertion _keyExists call in a WT cursor reconfigure.

Closed

related to

SERVER-58943 Add more test coverage for unique indexes

Closed

Assignee: Jonathan Streets (Inactive)
Reporter: Jonathan Streets (Inactive)
Participants: Deyan Petrov, Edwin Zhou, Jonathan Streets, Kelsey Schubert
Votes: 0
Watchers: 33

Created: Jul 28 2021 09:11:06 PM UTC
Updated: Jul 02 2025 04:19:26 PM UTC
Resolved: Jul 28 2021 09:31:27 PM UTC
Confidence Status Last Update: 28/Jul/21 9:12 PM
