[SERVER-31943] Two documents can share the same documentKey when using a non-simple collation. Created: 13/Nov/17  Updated: 20/Nov/17  Resolved: 20/Nov/17

Status: Closed
Project: Core Server
Component/s: Querying, Sharding, Write Ops
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: David Storch
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repro.js    
Issue Links:
Duplicate
duplicates SERVER-30970 Don't allow single-updates that aren'... Backlog
Operating System: ALL
Steps To Reproduce:

Download the attached 'repro.js', and run

python buildscripts/resmoke.py repro.js

Sprint: Query 2017-12-04
Participants:

 Description   

When a collection with a non-default collation is sharded, the chunks are arranged, migrated, and targeted using the simple collation. As a result, two documents whose shard keys are identical according to the collection's collation can live on different shards. For example, with a case-insensitive collation, the chunk containing the shard key "abc" can live on a different shard than the chunk containing the shard key "ABC", even though the two values are considered equal under the case-insensitive collation. This means that the combination of shard key and _id is not unique, since nothing enforces that a document with shard key "ABC" has a different _id than the document with shard key "abc".
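The mismatch between the two comparisons can be sketched in plain JavaScript (Node.js), without a running cluster. This is only an illustration, not the server's routing code: the split point "a", the shard names, and the helpers simpleCompare and targetShard are hypothetical, and Intl.Collator with sensitivity "accent" is used as a rough stand-in for a case-insensitive collation.

```javascript
// Simple collation modeled as a byte-wise comparison of the UTF-8 encodings.
function simpleCompare(a, b) {
  return Buffer.compare(Buffer.from(a, "utf8"), Buffer.from(b, "utf8"));
}

// Stand-in for a case-insensitive collation (assumption: roughly comparable
// to a collation that ignores case but not base letters or accents).
const caseInsensitive = new Intl.Collator("en", { sensitivity: "accent" });

// Hypothetical chunk layout: everything below "a" lives on shard0,
// everything from "a" upward lives on shard1.
const SPLIT_POINT = "a";
function targetShard(shardKeyValue) {
  return simpleCompare(shardKeyValue, SPLIT_POINT) < 0 ? "shard0" : "shard1";
}

const collationEqual = caseInsensitive.compare("abc", "ABC") === 0;
const simpleEqual = simpleCompare("abc", "ABC") === 0;
const shardForLower = targetShard("abc");
const shardForUpper = targetShard("ABC");

console.log({ collationEqual, simpleEqual, shardForLower, shardForUpper });
```

In this sketch the two keys compare equal under the case-insensitive collator but unequal byte-wise, so range-based routing places them in different chunks.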

See the attached repro.js for an example of the impact: a non-multi write updates two documents.



 Comments   
Comment by David Storch [ 20/Nov/17 ]

This is a duplicate of SERVER-30970. While it is true that the attached repro.js script has a non-multi write which updates two documents, this is only possible due to the targeting behavior described in SERVER-30970. Namely, the update targeting code is willing to scatter a non-multi update to all shards so long as it is update-by-_id. This is done under the assumption that the _id field is globally unique—an assumption that is simply unsound. The repro script creates a document with _id:0 on two different shards, which goes unchecked by the system.
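The effect of scattering a non-multi update-by-_id can be modeled with a small in-memory simulation. This is a sketch under stated assumptions, not the actual targeting code: the shards object, the document contents, and the helpers updateOneOnShard and scatterUpdateById are all hypothetical.

```javascript
// Two hypothetical shards, each holding a document with _id 0.
const shards = {
  shard0: [{ _id: 0, key: "ABC", updated: false }],
  shard1: [{ _id: 0, key: "abc", updated: false }],
};

// A non-multi update applied on one shard modifies at most one document.
function updateOneOnShard(docs, id) {
  const doc = docs.find((d) => d._id === id);
  if (doc === undefined) {
    return 0;
  }
  doc.updated = true;
  return 1;
}

// Scatter the update to every shard, trusting that _id is globally unique.
function scatterUpdateById(allShards, id) {
  let nModified = 0;
  for (const docs of Object.values(allShards)) {
    nModified += updateOneOnShard(docs, id);
  }
  return nModified;
}

const totalModified = scatterUpdateById(shards, 0);
console.log(totalModified);
```

Each shard honors the single-document limit locally, yet the total across shards exceeds one because nothing checks _id uniqueness between shards.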

The fact that documents with the shard key "abc" and "ABC" can live on different shards when the collection has the case-insensitive default collation is not a bug. No matter what the collection's default collation is, the shard key is required to have the simple collation. This is enforced by the shardCollection command, and is documented here:

https://docs.mongodb.com/manual/reference/command/shardCollection/#collation

Under the simple collation, the strings "abc" and "ABC" are unequal, and therefore are permitted to reside on different shards.
