[SERVER-86332] Reformat shardCollection to make Unique and Collations Clearer Created: 06/Feb/24  Updated: 07/Feb/24

Status: Needs Scheduling
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Matt Panton Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Catalog and Routing
Participants:

 Description   

The shardCollection command takes two options, unique and collation, that do not refer to or modify the collection; rather, they are options for the shard key index.

For example, if I create a collection:

use test
db.runCommand(
   {
     create: "nonDefaultCollationCollection",
     collation: { "locale": "nl" }
   }
 )

I've set the default collation for the collection to nl (Dutch), and the default _id index has inherited that collation. db.nonDefaultCollationCollection.getIndexes() returns:

[
  {
    v: 2,
    key: { _id: 1 },
    name: '_id_',
    collation: {
      locale: 'nl',
      caseLevel: false,
      caseFirst: 'off',
      strength: 3,
      numericOrdering: false,
      alternate: 'non-ignorable',
      maxVariable: 'punct',
      normalization: false,
      backwards: false,
      version: '57.1'
    }
  }
] 

To shard the collection on a non-_id field, the server requires me to pass the simple collation, { locale: "simple" }, to shardCollection:

db.adminCommand(
   {
     shardCollection: "test.nonDefaultCollationCollection",
     key: { randomField: 1 },
     unique: true,
     collation: { locale: "simple" }
   }
 )

Running db.nonDefaultCollationCollection.getIndexes() now shows that the index on randomField uses the simple collation (it has no collation document) and enforces uniqueness:

[
  {
    v: 2,
    key: { _id: 1 },
    name: '_id_',
    collation: {
      locale: 'nl',
      caseLevel: false,
      caseFirst: 'off',
      strength: 3,
      numericOrdering: false,
      alternate: 'non-ignorable',
      maxVariable: 'punct',
      normalization: false,
      backwards: false,
      version: '57.1'
    }
  },
  {
    v: 2,
    key: { randomField: 1 },
    name: 'randomField_1',
    unique: true
  }
] 

To improve consistency with the create command, and to better reflect what the server actually does when executing shardCollection, the unique and collation options should be moved into a single field that encapsulates both, much like the existing timeseries field groups multiple options.
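A minimal sketch of what the proposed reshaping might look like, written as plain JavaScript objects (the language of the shell examples above). The grouping field name shardKeyIndexOptions and the helper toProposedShape are hypothetical: the ticket does not name the new field, and this is not an existing server API.

```javascript
// Hypothetical sketch: regroup the top-level `unique` and `collation`
// options of a shardCollection command document into one sub-document.
// The field name `shardKeyIndexOptions` is an assumption for illustration only.
function toProposedShape(cmd) {
  const { unique, collation, ...rest } = cmd;
  const indexOptions = {};
  if (unique !== undefined) indexOptions.unique = unique;
  if (collation !== undefined) indexOptions.collation = collation;
  // Only add the grouping field when at least one index option was given.
  return Object.keys(indexOptions).length > 0
    ? { ...rest, shardKeyIndexOptions: indexOptions }
    : rest;
}

// Current shape, as used in the shardCollection example above.
const current = {
  shardCollection: "test.nonDefaultCollationCollection",
  key: { randomField: 1 },
  unique: true,
  collation: { locale: "simple" },
};

console.log(JSON.stringify(toProposedShape(current), null, 2));
```

Keeping shardCollection and key at the top level and nesting only the two index-related options mirrors the way the timeseries field groups its own options.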



 Comments   
Comment by Max Hirschhorn [ 07/Feb/24 ]

The shardCollection command does more than create an index to support the shard key pattern (when such an index doesn't already exist and the collection is empty). It also records the partitioning scheme on the config server. The partitioning must be collation-aware because routing decisions involve comparing values that may be or contain strings, and those comparisons must use the collator to be correct.
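To illustrate why routing must be collation-aware, here is a simplified sketch (plain JavaScript, not server code) that routes a string shard key value to one of two chunks split at "M". JavaScript's default string comparison orders by UTF-16 code units, which for ASCII strings matches the byte-wise order of the simple collation; Intl.Collator("nl") stands in for a Dutch ICU collator. The chunk layout and the routeToChunk helper are hypothetical.

```javascript
// Hypothetical chunk layout: chunk 0 owns [MinKey, "M"), chunk 1 owns ["M", MaxKey).
const SPLIT_POINT = "M";

// Simple (binary) comparison: for ASCII strings, comparing UTF-16 code units
// gives the same order as comparing UTF-8 bytes.
function simpleCompare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Locale-aware comparison using an ICU-backed collator for Dutch.
const nlCollator = new Intl.Collator("nl");
function nlCompare(a, b) {
  return nlCollator.compare(a, b);
}

// Return the index of the chunk that owns `value` under the given comparator.
function routeToChunk(value, compare) {
  return compare(value, SPLIT_POINT) < 0 ? 0 : 1;
}

for (const value of ["apple", "Zebra"]) {
  console.log(value, "simple ->", routeToChunk(value, simpleCompare),
              "nl ->", routeToChunk(value, nlCompare));
}
```

Under the simple comparison, "apple" sorts after "M" (lowercase letters have higher code points than uppercase ones), so it routes to chunk 1; under the Dutch collator it sorts before "M" and routes to chunk 0. The same document would land on different shards depending on the collation, which is why the recorded partitioning scheme has to carry it.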

The requirement to run the shardCollection command with {collation: {locale: "simple"}} follows from the convention that commands accepting a collation (e.g. find, update, delete) use the collection's default collation when the parameter is omitted. However, because PM-1930 is not yet complete, the simple collation is the only option for partitioning a collection.
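The convention above can be sketched as follows. The helpers effectiveCollation, isSimple, and checkShardCollectionCollation are hypothetical and only model the rule stated in this comment (an omitted collation means the collection default, and shardCollection currently accepts only the simple collation); the error text is illustrative, not the server's actual message.

```javascript
// Hypothetical model of the convention: an omitted `collation` parameter
// means "use the collection's default collation".
function effectiveCollation(cmd, collectionDefault) {
  return cmd.collation !== undefined ? cmd.collation : collectionDefault;
}

// A collection with no default collation uses the simple collation.
function isSimple(collation) {
  return collation === undefined || collation.locale === "simple";
}

// Until non-simple partitioning is supported (PM-1930), only simple is accepted.
function checkShardCollectionCollation(cmd, collectionDefault) {
  const collation = effectiveCollation(cmd, collectionDefault);
  if (!isSimple(collation)) {
    // Illustrative message only, not the server's real error text.
    throw new Error("shardCollection requires the simple collation");
  }
  return collation;
}

const collectionDefault = { locale: "nl" };

// Omitting `collation` inherits { locale: "nl" } and is rejected.
try {
  checkShardCollectionCollation({ key: { randomField: 1 } }, collectionDefault);
} catch (e) {
  console.log("rejected:", e.message);
}

// Passing { locale: "simple" } explicitly is accepted.
console.log(checkShardCollectionCollation(
  { key: { randomField: 1 }, collation: { locale: "simple" } },
  collectionDefault,
));
```

This is why the nl collection in the description needs an explicit { locale: "simple" }, while a collection without a default collation would be accepted with the parameter omitted.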

Can more be said here on what is inconsistent with the create command?

Generated at Thu Feb 08 07:00:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.