Sharded collections permit unique indexes with non-"simple" collations, leading to uniqueness violations


    • Catalog and Routing
    • Fully Compatible
    • ALL
    • v8.2, v8.0, v7.0
    • CAR Team 2026-01-05, CAR Team 2026-01-19, CAR Team 2026-02-02, CAR Team 2026-02-16, CAR Team 2026-03-02, CAR Team 2026-03-16
    • 0
    • 3
    • 🟥 DDL

      Issue Status as of May 14, 2026

      ISSUE DESCRIPTION AND IMPACT

      A sharded collection that spans more than one shard can silently violate a unique index when that index uses a non-simple collation and the shard key is equal to, or a strict prefix of, the index key pattern. Different shards can each admit documents whose indexed values are distinct at the raw string level (i.e. under the simple collation) but equal under the index collation, so multiple logically “duplicate” keys can coexist under a unique index. This breaks invariants like “one account per email” or “one active subscription per user”, but does not cause data loss or sharding metadata corruption.
      For example, a collection sharded on { a: 1 } with a unique index { a: 1 } and a case-insensitive collation (e.g. { locale: "en_US", strength: 2 }) can accept both { a: "YES" } and { a: "yes" } on different shards, even though these compare equal under the index collation.
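
The two comparisons in this example can be reproduced outside the server. The snippet below is a minimal sketch that assumes JavaScript's `Intl.Collator` with `sensitivity: "accent"` is an acceptable stand-in for `{ locale: "en_US", strength: 2 }`; the helper names are hypothetical, and this is not the server's collation code.

```javascript
// Simple collation: the strings are compared exactly as stored.
function equalSimple(x, y) {
  return x === y;
}

// Assumed stand-in for { locale: "en_US", strength: 2 }:
// sensitivity "accent" distinguishes base letters and accents but ignores case.
const caseInsensitive = new Intl.Collator("en-US", { sensitivity: "accent" });

function equalCollated(x, y) {
  return caseInsensitive.compare(x, y) === 0;
}

console.log(equalSimple("YES", "yes"));   // false: distinct values for routing
console.log(equalCollated("YES", "yes")); // true: duplicates for the unique index
```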

      DIAGNOSIS AND AFFECTED VERSIONS

      This issue is present in all MongoDB versions that support sharding, unique indexes, and non-simple collations (3.4+). It can occur only if all of the following are true:

      • Topology: Sharded cluster with two or more shards.
      • Collection: Sharded.
      • Unique Index: There is at least one unique index where the shard key is equal to or a strict prefix of the index key pattern and whose effective collation is non-simple (e.g. locale-specific or case-insensitive).
      • Workload: Inserts or updates can produce values that are distinct byte-wise but equivalent under the index collation (e.g. "YES" vs "yes"), and those values are routed to and stored on different shards.

      A deployment is not impacted if any of the following hold:

      • It does not use sharded clusters, or its sharded clusters each have only a single shard.
      • It does not use unique indexes on sharded collections.
      • For every sharded collection, all unique indexes where the shard key is equal to or a strict prefix of the key pattern use the simple collation (i.e., there is no such unique index with a non-simple collation).
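
The index-level condition above can be expressed as a small predicate. This is a sketch under stated assumptions, not the server's check: the function names are hypothetical, a missing `collation` field is assumed to mean the simple collation, and the prefix test compares field names only (ignoring sort direction), which may flag more indexes than the server would.

```javascript
// Assumption: an index spec with no collation field uses the simple collation.
function isSimpleCollation(collation) {
  return !collation || collation.locale === "simple";
}

// True when the shard key fields equal, or are a leading prefix of,
// the index key fields (field names only; sort direction is ignored).
function shardKeyIsPrefixOf(shardKey, indexKey) {
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(indexKey);
  if (shardFields.length === 0 || shardFields.length > indexFields.length) {
    return false;
  }
  return shardFields.every((field, i) => field === indexFields[i]);
}

// True when the index matches the exposed shape described above:
// unique, non-simple collation, and shard key equal to or prefixing the key.
function isExposedIndex(shardKey, indexSpec) {
  return indexSpec.unique === true &&
    !isSimpleCollation(indexSpec.collation) &&
    shardKeyIsPrefixOf(shardKey, indexSpec.key);
}

const shardKey = { a: 1 };
const ci = { locale: "en_US", strength: 2 };
console.log(isExposedIndex(shardKey, { key: { a: 1 }, unique: true, collation: ci }));       // true
console.log(isExposedIndex(shardKey, { key: { a: 1, b: 1 }, unique: true }));                // false
console.log(isExposedIndex(shardKey, { key: { b: 1, a: 1 }, unique: true, collation: ci })); // false
```

In mongosh, the index specs could plausibly come from `getIndexes()` on each sharded collection and the shard key from the collection's entry in `config.collections`; the exposure still depends on the topology and workload conditions listed above.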

      REMEDIATION AND WORKAROUNDS

      Prevent new exposure going forward

      • Until upgraded, avoid creating new unique indexes with non-simple collations where the shard key equals or is a strict prefix of the index key, and avoid schema changes (e.g. via `collMod`) that would introduce such indexes on sharded collections. Where possible, keep global uniqueness constraints aligned with the simple collation, or enforce uniqueness under a non-simple collation only in unsharded collections.
      • Upgrade to fixed versions: The server-side fix is included in 7.0.32, 8.0.21, 8.2.7 and 8.3.0. Deployments should upgrade to at least the fixed patch version in their release series so that new incompatible combinations are rejected.
      • On fixed versions, the following entry points now fail when they would create or preserve an incompatible shape:
        • shardCollection fails if the requested shard key conflicts with an existing unique, non-simple-collation index where the shard key equals or is a prefix of the index key pattern.
        • createIndex fails when asked to create a unique index with a non-simple collation whose key starts with the shard key.
        • collMod, reshardCollection and refineCollectionShardKey fail when a modification would introduce or preserve such a combination.
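
To check whether a given server version already rejects these combinations, the fixed versions listed above can be encoded in a small helper. This is a sketch, not an official compatibility table: the function names are hypothetical, pre-release suffixes such as `-rc0` are ignored, and it assumes that every release series after 8.3 carries the fix while unlisted older series do not.

```javascript
// First fixed patch release per release series, as listed in this ticket.
const FIRST_FIXED = {
  "7.0": [7, 0, 32],
  "8.0": [8, 0, 21],
  "8.2": [8, 2, 7],
  "8.3": [8, 3, 0],
};

// Parse "major.minor.patch"; anything after the patch number is ignored.
function parseVersion(versionString) {
  const m = /^(\d+)\.(\d+)\.(\d+)/.exec(versionString);
  if (!m) throw new Error("Unrecognized version: " + versionString);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative, zero, or positive, like a standard comparator.
function compareVersions(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

function hasCollationUniquenessFix(versionString) {
  const v = parseVersion(versionString);
  const series = v[0] + "." + v[1];
  if (Object.prototype.hasOwnProperty.call(FIRST_FIXED, series)) {
    return compareVersions(v, FIRST_FIXED[series]) >= 0;
  }
  // Assumption: series newer than 8.3 include the fix; other unlisted series do not.
  return compareVersions(v, FIRST_FIXED["8.3"]) > 0;
}

console.log(hasCollationUniquenessFix("7.0.31")); // false
console.log(hasCollationUniquenessFix("8.0.21")); // true
```

In mongosh, `db.version()` returns a version string that could be passed to this helper.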

      Detect and remediate existing exposure

      For clusters that may already have affected schemas, operators should:

      • Identify sharded collections that have unique indexes with non-simple collations where the shard key equals or prefixes the index key pattern.
      • For those collections, run targeted scans that group by the collated key value (under the same collation as the index) to detect duplicates; this can be implemented via ad-hoc scripts or helper tooling.
      • If duplicates are found, customers must de-duplicate at the application layer (deciding which document(s) to keep) and may need to gate writes or schedule maintenance windows for large or high-traffic collections; there is no cheap, server-level remediation that automatically restores or ensures uniqueness for sharded collections with non-simple collations. Only unsharded collections can reliably enforce unique indexes with non-simple collations after cleanup.
      • Rebuild affected indexes.

      Helper scripts that can serve as a starting point have been provided in this repo.
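
The targeted scan described above can be sketched as an aggregation that groups by the indexed fields under the index's collation. The builder below is an illustration and not one of the provided helper scripts: `buildDuplicateScan` and the collection name in the usage comment are hypothetical, and it assumes that `$group` compares strings using the collation passed to `aggregate`. Documents that omit an indexed field may group differently from how the unique index treats them, so results for sparse data should be reviewed by hand.

```javascript
// Build a pipeline that reports index key values occurring more than once
// when compared under the given collation.
function buildDuplicateScan(indexKey, collation) {
  const groupId = {};
  // Use positional names (f0, f1, ...) so dotted field paths stay valid in _id.
  Object.keys(indexKey).forEach((field, i) => {
    groupId["f" + i] = "$" + field;
  });
  const pipeline = [
    { $group: { _id: groupId, count: { $sum: 1 }, ids: { $push: "$_id" } } },
    { $match: { count: { $gt: 1 } } },
  ];
  // allowDiskUse lets large groupings spill to disk instead of failing.
  const options = { collation: collation, allowDiskUse: true };
  return { pipeline, options };
}

const scan = buildDuplicateScan({ a: 1 }, { locale: "en_US", strength: 2 });
console.log(JSON.stringify(scan.pipeline));

// In mongosh (hypothetical collection name):
//   db.accounts.aggregate(scan.pipeline, scan.options);
```

Each result document lists the `_id` values that share one collated key, which gives the application layer the candidates it has to choose between.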

      ------------------------------------------------------

      Original description

      This is a beginning-of-time bug: it has existed since MongoDB 3.4, when collation was first introduced.

      A unique index { key: { a: 1 }, unique: true, collation: { locale: "en_US", strength: 2 } } does not permit both of the documents { a: "YES" } and { a: "yes" } to be inserted. Yet sharded collections are always partitioned according to the "simple" collation, independent of the collection's default collation. This means a partitioning scheme of { key: { a: 1 }, collation: { locale: "simple" } } would treat these two shard key values as distinct and possibly place them on separate shards. In such a scenario, the unique indexes { key: { a: 1 }, unique: true, collation: { locale: "en_US", strength: 2 } } on the two shards would each contain only one of these two shard key values, even though both documents exist simultaneously in the sharded cluster as a whole. Therefore global uniqueness enforcement cannot be inferred from local uniqueness enforcement.
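
The scenario above can be simulated in a few lines. This is a toy model, not server code: the two shards, the split point "m", and the `insert` function are hypothetical, routing uses plain binary string order to mimic the simple collation, and `Intl.Collator` with `sensitivity: "accent"` is assumed as a stand-in for `{ locale: "en_US", strength: 2 }`.

```javascript
// Assumed stand-in for the unique index collation (case-insensitive).
const indexCollator = new Intl.Collator("en-US", { sensitivity: "accent" });

// Two hypothetical shards, split at "m" using binary (simple) string order.
const shards = [
  { name: "shard0", owns: (value) => value < "m", docs: [] },
  { name: "shard1", owns: (value) => value >= "m", docs: [] },
];

// Route by the simple collation, then enforce uniqueness only on the owning shard.
function insert(doc) {
  const shard = shards.find((s) => s.owns(doc.a));
  const clash = shard.docs.some((d) => indexCollator.compare(d.a, doc.a) === 0);
  if (clash) return { ok: false, shard: shard.name };
  shard.docs.push(doc);
  return { ok: true, shard: shard.name };
}

console.log(insert({ a: "YES" })); // { ok: true, shard: 'shard0' }
console.log(insert({ a: "yes" })); // { ok: true, shard: 'shard1' }
console.log(insert({ a: "Yes" })); // { ok: false, shard: 'shard0' }
```

Only the third insert is rejected, because it is the only one that lands on a shard already holding a value that compares equal under the index collation.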

      The core problem is that neither createIndexes nor shardCollection accounts for the collation of unique indexes. In particular, ShardKeyPattern::isIndexUniquenessCompatible() accepts only the index key pattern as an input.

            Assignee:
            Marcos José Grillo Ramirez
            Reporter:
            Max Hirschhorn