[SERVER-46810] Broken E11000 duplicate key error when unique index contains collation Created: 11/Mar/20  Updated: 29/Oct/23  Resolved: 15/Apr/20

Status: Closed
Project: Core Server
Component/s: Querying, Usability
Affects Version/s: None
Fix Version/s: 4.2.7, 4.4.0-rc4, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Dmitry Lukyanov (Inactive) Assignee: David Storch
Resolution: Fixed Votes: 2
Labels: collation, qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-60298 Explain can include ICU collation key... Closed
related to SERVER-50454 Avoiding sending the "keyValue" field... Closed
is related to SERVER-26050 Unique key violation for index with a... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2
Sprint: Query 2020-04-20
Participants:

 Description   

The E11000 duplicate key error message contains incorrect details when the unique index has a collation. In some cases the driver cannot parse the message, which causes a second exception on the driver side.

Consider the following queries:

// prepare collection
db.coll.createIndex( { "testValue" : 1 }, { unique : true, collation: { locale : 'sv', strength : 1, numericOrdering : true } } )
{
        "createdCollectionAutomatically" : true,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "commitQuorum" : 1,
        "ok" : 1
}

Then try to insert the same document twice:

  • Simpler case:

    // first attempt
     db.coll.insertOne({ "testValue": "def"})
    {
            "acknowledged" : true,
            "insertedId" : ObjectId("5e695401da7e6a4f1b712884")
    }
    // second attempt
     db.coll.insertOne({ "testValue": "def"})
    {"t":{"$date":"2020-03-11T21:11:30.702Z"},"s":"E", "c":"QUERY",   "id":0,"ctx":"js","msg":"{}","attr":{"message":"WriteError({\n\t\"index\" : 0,\n\t\"code\" : 11000,\n\t\"errmsg\" : \"E11000 duplicate key error collection: test.coll index: testValue_1 dup key: { testValue: \\\"/13\\\" }\",\n\t\"op\" : {\n\t\t\"_id\" : ObjectId(\"5e695402da7e6a4f1b712885\"),\n\t\t\"testValue\" : \"def\"\n\t}\n}) :\nWriteError({\n\t\"index\" : 0,\n\t\"code\" : 11000,\n\t\"errmsg\" : \"E11000 duplicate key error collection: test.coll index: testValue_1 dup key: { testValue: \\\"/13\\\" }\",\n\t\"op\" : {\n\t\t\"_id\" : ObjectId(\"5e695402da7e6a4f1b712885\"),\n\t\t\"testValue\" : \"def\"\n\t}\n})\nWriteError@src/mongo/shell/bulk_api.js:458:48\nmergeBatchResults@src/mongo/shell/bulk_api.js:855:49\nexecuteBatch@src/mongo/shell/bulk_api.js:919:13\nBulk/this.execute@src/mongo/shell/bulk_api.js:1163:21\nDBCollection.prototype.insertOne@src/mongo/shell/crud_api.js:264:9\n@(shell):1:1"}}
    

    In this case the only problem is the incorrect key value in the error details:

    testValue_1 dup key: { testValue: \\\"/13\\\" }
    

  • Worse case:

    // first attempt
     db.coll.insertOne({ "testValue": "abcdefghijkl123456789"})
    {
            "acknowledged" : true,
            "insertedId" : ObjectId("5e695418da7e6a4f1b712886")
    }
    // second attempt
    db.coll.insertOne({ "testValue": "abcdefghijkl123456789"})
    {"t":{"$date":"2020-03-11T21:11:53.807Z"},"s":"E", "c":"QUERY",   "id":0,"ctx":"js","msg":"{}","attr":{"message":"WriteError({\n\t\"index\" : 0,\n\t\"code\" : 11000,\n\t\"errmsg\" : \"E11000 duplicate key error collection: test.coll index: testValue_1 dup key: { testValue: \\\")+-/13579;=?\\u000f�\\r9\\u000fe��\\\" }\",\n\t\"op\" : {\n\t\t\"_id\" : ObjectId(\"5e695419da7e6a4f1b712887\"),\n\t\t\"testValue\" : \"abcdefghijkl123456789\"\n\t}\n}) :\nWriteError({\n\t\"index\" : 0,\n\t\"code\" : 11000,\n\t\"errmsg\" : \"E11000 duplicate key error collection: test.coll index: testValue_1 dup key: { testValue: \\\")+-/13579;=?\\u000f�\\r9\\u000fe��\\\" }\",\n\t\"op\" : {\n\t\t\"_id\" : ObjectId(\"5e695419da7e6a4f1b712887\"),\n\t\t\"testValue\" : \"abcdefghijkl123456789\"\n\t}\n})\nWriteError@src/mongo/shell/bulk_api.js:458:48\nmergeBatchResults@src/mongo/shell/bulk_api.js:855:49\nexecuteBatch@src/mongo/shell/bulk_api.js:919:13\nBulk/this.execute@src/mongo/shell/bulk_api.js:1163:21\nDBCollection.prototype.insertOne@src/mongo/shell/crud_api.js:264:9\n@(shell):1:1"}}
    

    In this case the error details again contain an incorrect key value:

    testValue_1 dup key: { testValue: \\\")+-/13579;=?\\u000f�\\r9\\u000fe��\\\" }
    

    Worse, the driver cannot parse this error. As a result, it triggers a new deserialization exception that completely hides the original one.
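
To illustrate the driver-side failure mode, here is a minimal sketch of how a strict UTF-8 decode of the error message fails while a lenient decode preserves the rest of the message. The helper name decodeErrmsgLeniently and the sample bytes are hypothetical and are not taken from any driver; real drivers decode BSON through their own code paths.

```javascript
// Hypothetical helper (not part of any driver): decode raw errmsg bytes
// without letting invalid UTF-8 hide the original server error.
function decodeErrmsgLeniently(bytes) {
  try {
    // Strict mode throws a TypeError on any invalid UTF-8 sequence.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    // Lenient mode replaces each invalid sequence with U+FFFD instead.
    return new TextDecoder("utf-8", { fatal: false }).decode(bytes);
  }
}

// Made-up sample: a readable prefix followed by a byte (0xff) that can
// never appear in valid UTF-8, standing in for raw collation key bytes.
const prefix = new TextEncoder().encode("dup key: { testValue: \"");
const sampleBytes = Uint8Array.from([...prefix, 0x29, 0xff, 0x22, 0x20, 0x7d]);

const decoded = decodeErrmsgLeniently(sampleBytes);
console.log(decoded);
```

With a lenient decode the application would still see the E11000 text, only with replacement characters where the collation key bytes were.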

It looks like this ticket is related to https://jira.mongodb.org/browse/SERVER-26050



 Comments   
Comment by Githook User [ 07/May/20 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-46810 Hex encode collation keys and invalid UTF-8 in dup key error messages

Also changes the error message to include the index's
collation, in order to help users interpret the collation
keys.

(cherry picked from commit 9dbaf78c605a576db8b15895c32aedc3e07d7ec8)
Branch: v4.2
https://github.com/mongodb/mongo/commit/0e8998afb31300f39bc63af4a745a437b07ff977

Comment by Githook User [ 06/May/20 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-46810 Hex encode collation keys and invalid UTF-8 in dup key error messages

Also changes the error message to include the index's
collation, in order to help users interpret the collation
keys.

(cherry picked from commit 9ca0c11865bba19999932a57069daed84a4577ca)
Branch: v4.4
https://github.com/mongodb/mongo/commit/9dbaf78c605a576db8b15895c32aedc3e07d7ec8

Comment by David Storch [ 15/Apr/20 ]

In order to ensure that the error message returned to drivers is always valid UTF-8, the fix merged to the master branch is to hex encode the value of the collation key (or any other invalid UTF-8) before appending it to the error message.
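
The idea can be sketched roughly as follows. This is not the server's C++ implementation: the function names, the isCollationKey flag, and the "0x" prefix are assumptions made for illustration, and the exact format the server emits is not shown in this ticket.

```javascript
// Returns true if the bytes form a valid UTF-8 sequence.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// Lowercase hex encoding, two digits per byte.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Hypothetical rendering rule: hex encode collation keys and any other
// invalid UTF-8, so the resulting message is always a valid UTF-8 string.
// The "0x" prefix is an assumption, not the confirmed server format.
function renderKeyForErrmsg(bytes, isCollationKey) {
  if (isCollationKey || !isValidUtf8(bytes)) {
    return "0x" + toHex(bytes);
  }
  return new TextDecoder("utf-8").decode(bytes);
}

// The collation key "/13" from the simpler case in the description.
console.log(renderKeyForErrmsg(Uint8Array.from([0x2f, 0x31, 0x33]), true));
```

Under these assumptions, a key that happens to be valid UTF-8 but is a collation key (such as "/13") is still hex encoded, which avoids presenting it as if it were the user's original string.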

Note that the keyValue field returned to drivers may still contain strings with invalid UTF-8. However, my understanding is that drivers currently ignore keyValue, so this should not result in errors being surfaced to the application.

Since indexes with a non-simple collation store collation keys rather than raw strings, it is still the case that the value of the key exposed in the error message may not be meaningful to the user. In the future, we could consider an additional improvement which would expose the raw string whose collation key resulted in a duplicate key error. This may be more meaningful from the user's perspective. This would be a bit more complex to implement, since duplicate key errors are produced by the storage layer, which has no knowledge of the collation implementation or of the raw string that resulted in the duplicated collation key.
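
As a rough illustration of why the raw string could be more meaningful than the stored key, the sketch below uses JavaScript's Intl.Collator to approximate the index collation from the description. This is only an approximation: the server uses ICU collation keys directly, and mapping strength 1 to sensitivity "base" is an assumption of this sketch. The function name collidesUnderCollation is hypothetical.

```javascript
// Approximation of { locale: 'sv', strength: 1, numericOrdering: true }.
// Sensitivity "base" ignores case and accent differences, which is
// assumed here to be close to ICU primary strength.
const collator = new Intl.Collator("sv", { sensitivity: "base", numeric: true });

// Two raw strings collide in a unique index with this collation when the
// collator treats them as equal, even if their bytes differ.
function collidesUnderCollation(a, b) {
  return collator.compare(a, b) === 0;
}

console.log(collidesUnderCollation("def", "DEF"));
console.log(collidesUnderCollation("def", "deg"));
```

Because distinct raw strings such as "def" and "DEF" can share one collation key, the key alone does not tell the user which inserted value caused the conflict.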

Comment by Githook User [ 15/Apr/20 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-46810 Hex encode collation keys and invalid UTF-8 in dup key error messages

Also changes the error message to include the index's
collation, in order to help users interpret the collation
keys.
Branch: master
https://github.com/mongodb/mongo/commit/9ca0c11865bba19999932a57069daed84a4577ca

Generated at Thu Feb 08 05:12:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.