[SERVER-31943] Two documents can share the same documentKey when using a non-simple collation. Created: 13/Nov/17 Updated: 20/Nov/17 Resolved: 20/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Sharding, Write Ops |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | David Storch |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Steps To Reproduce: | Download the attached 'repro.js', and run
|
||||||||
| Sprint: | Query 2017-12-04 | ||||||||
| Participants: | |||||||||
| Description |
|
When a collection with a non-simple default collation is sharded, the chunks are arranged, migrated, and targeted using the simple collation. As a result, two documents whose shard keys are identical according to the collection's collation can live on different shards. For example, with a case-insensitive collation, the chunk containing the shard key "abc" can live on a different shard than the chunk containing the shard key "ABC", even though the two keys compare equal under that collation. The combination of shard key and _id is therefore not unique, since nothing enforces that a document with shard key "ABC" has a different _id than the document with shard key "abc". See the attached repro.js for an example of the impact: a non-multi write updates two documents. |
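The routing behavior described above can be illustrated with a small standalone simulation. This is only a sketch, not the attached repro.js and not server code: the chunk boundaries, shard names, and the helper `targetShard` are hypothetical, and `Intl.Collator` with `sensitivity: "accent"` is used as a stand-in for a case-insensitive collation.

```javascript
// Hypothetical chunk layout: two chunks split at "a", owned by different shards.
const chunks = [
  { min: "", max: "a", shard: "shard0" },       // covers ["", "a")
  { min: "a", max: "\uffff", shard: "shard1" }, // covers ["a", "\uffff")
];

// Simple collation: plain binary comparison, approximated here by
// JavaScript's code-unit string ordering (identical for ASCII keys).
function simpleCompare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical router: pick the chunk whose range contains the key,
// always comparing with the simple collation.
function targetShard(key) {
  const chunk = chunks.find(
    (c) => simpleCompare(key, c.min) >= 0 && simpleCompare(key, c.max) < 0
  );
  return chunk.shard;
}

// Stand-in for a case-insensitive collection default collation.
const caseInsensitive = new Intl.Collator("en", { sensitivity: "accent" });

const shardForLower = targetShard("abc");
const shardForUpper = targetShard("ABC");
const equalUnderCollation = caseInsensitive.compare("abc", "ABC") === 0;

console.log(shardForLower);       // shard1
console.log(shardForUpper);       // shard0
console.log(equalUnderCollation); // true
```

The two keys compare equal under the case-insensitive comparison yet are routed to different shards, because only the binary comparison is consulted when choosing a chunk.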
| Comments |
| Comment by David Storch [ 20/Nov/17 ] |
|
This is a duplicate of SERVER-30970. While it is true that the attached repro.js script has a non-multi write which updates two documents, this is only possible due to the targeting behavior described in SERVER-30970. Namely, the update targeting code is willing to scatter a non-multi update to all shards so long as it is an update-by-_id. This is done under the assumption that the _id field is globally unique, an assumption that is simply unsound. The repro script creates a document with _id:0 on two different shards, which goes unchecked by the system.

The fact that documents with the shard keys "abc" and "ABC" can live on different shards when the collection has a case-insensitive default collation is not a bug. No matter what the collection's default collation is, the shard key is required to have the simple collation. This is enforced by the shardCollection command, and is documented here: https://docs.mongodb.com/manual/reference/command/shardCollection/#collation. Under the simple collation, the strings "abc" and "ABC" are unequal, and are therefore permitted to reside on different shards. |
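The scatter behavior described in this comment can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the server's targeting code: shards are plain arrays, and the names `updateOneOnShard` and `scatterUpdateById` are hypothetical.

```javascript
// Each shard is modelled as an array of documents. Both shards hold a
// document with _id 0, which nothing in the model prevents.
const shards = {
  shard0: [{ _id: 0, key: "ABC", updated: false }],
  shard1: [{ _id: 0, key: "abc", updated: false }],
};

// Non-multi update on a single shard: modify at most one matching document.
function updateOneOnShard(docs, id) {
  const doc = docs.find((d) => d._id === id);
  if (!doc) return 0;
  doc.updated = true;
  return 1;
}

// Hypothetical router: an update-by-_id is broadcast to every shard on the
// assumption that _id is globally unique, so at most one shard should match.
function scatterUpdateById(allShards, id) {
  let modified = 0;
  for (const docs of Object.values(allShards)) {
    modified += updateOneOnShard(docs, id);
  }
  return modified;
}

const nModified = scatterUpdateById(shards, 0);
console.log(nModified); // 2
```

Each shard honors the non-multi contract locally, but because the same _id exists on both shards, the broadcast as a whole modifies two documents.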