[SERVER-80203] Normalization of time-series meta field can break insert targeting Created: 17/Aug/23  Updated: 04/Dec/23  Resolved: 28/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.1, 7.2.0-rc0, 5.0.22, 7.0.3, 6.0.12

Type: Bug Priority: Major - P3
Reporter: Arun Banala Assignee: Gregory Noma
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Related
is related to SERVER-81523 Normalization of time-series meta fie... Backlog
is related to SERVER-83855 Expand test coverage for time-series ... Open
is related to SERVER-52967 support metadata field for time-serie... Closed
is related to SERVER-55484 Improve efficiency of BucketCatalog::... Closed
is related to SERVER-54647 Utilize KeyString in BucketCatalog in... Closed
is related to SERVER-66685 Fetch archived buckets from disk and ... Closed
Assigned Teams:
Storage Execution NAMER
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.1, v7.0, v6.0, v5.0
Sprint: Execution NAMR Team 2023-09-18, Execution NAMR Team 2023-10-02
Participants:
Case:

 Description   
Issue Summary
This is an operation routing issue in sharded clusters with time series collections that can result in metadata inconsistencies. Documents affected by this issue may be written to the wrong shard, where they may not be returned by queries and may be subject to later deletion.

This affects sharded time series collections starting in MongoDB version 5.0.6, up to and including versions 5.0.21, 6.0.11, and 7.0.2, and Rapid Release version 7.1.0 (the fix is included in 7.1.1).

Issue Description and Impact
Documents inserted into a sharded time series collection may be routed to an incorrect shard and end up owned by no shard if all of the following are true:

  • The document's time series metaField contains an embedded document/object composed of multiple fields and the shard key of the collection includes that object. Examples include:
    • A metaField value of { "a" : 1, "b" : 1 } when the shard key is the metaField.
    • A metaField value of { "a" : 1, "b" : {"c": 1, "d": 1} } when the shard key includes metaField.b.
  • At insert time, the fields in the embedded document or object are not provided in alphabetic (lexicographic) order. Importantly, not all drivers guarantee app-supplied key order within documents.
  • The same shard does not own both of the chunks that contain:
    • The alphabetically (lexicographically) ordered version of the embedded document.
    • The provided version of the embedded document.

This occurs because:

  • mongos routers incorrectly route documents to shards using the provided metaField value. For example, { "b" : 1, "a" : 1 } is routed to the shard that owns the chunk range for { "b" : 1, "a" : 1 }.
  • At insert time, mongod nodes normalize metaField values that are embedded documents into alphabetic (lexicographic) field order. For example, { "b" : 1, "a" : 1 } becomes { "a" : 1, "b" : 1 }.
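
The normalization step described above can be illustrated with a minimal Python sketch. It is not the server's implementation: the function name normalize_meta is hypothetical, Python dicts stand in for BSON documents, field names are sorted by code point, and arrays are passed through unchanged as a simplification.

```python
def normalize_meta(value):
    """Return a copy of value with embedded-document keys sorted recursively.

    Hypothetical helper for illustration only; non-dict values (including
    lists) are returned unchanged, which is a simplification.
    """
    if isinstance(value, dict):
        return {key: normalize_meta(value[key]) for key in sorted(value)}
    return value


# Key order as supplied by the application (not alphabetical).
provided = {"b": {"d": 1, "c": 1}, "a": 1}
normalized = normalize_meta(provided)

print(list(provided))         # ['b', 'a']
print(list(normalized))       # ['a', 'b']
print(list(normalized["b"]))  # ['c', 'd']
```

Python compares dicts without regard to key order, so provided == normalized is True here. Only the field order differs, and that field order is what the router and the shard disagree on.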

When the shard that receives a vulnerable document does not own the chunk range for the normalized form of the shard key / metaField value, the document is orphaned and effectively lost. For example, a shard that owns the chunk range from { "b" : 1 } to { "$maxKey" : 1 } could receive the document with metaField { "b" : 1, "a" : 1 }, even though the document is persisted with the metaField { "a" : 1, "b" : 1 }, which falls outside that range.
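
To make the mismatch concrete, here is a small sketch of range-based targeting under stated assumptions: a hypothetical two-chunk layout split at { "b" : 1 }, made-up shard names shardA and shardB, and documents modeled as ordered lists of (field, value) pairs compared element by element. This only approximates BSON ordering for the integer-valued examples used here and is not how mongos is implemented.

```python
from bisect import bisect_right


def as_pairs(doc):
    """Model a document as an ordered list of (field, value) pairs."""
    return list(doc.items())


# Hypothetical layout: one split point at {"b": 1}.
# shardA owns everything below the split; shardB owns the split point and above.
SPLIT_POINTS = [as_pairs({"b": 1})]
CHUNK_OWNERS = ["shardA", "shardB"]


def owning_shard(meta):
    """Return the (made-up) shard whose chunk range contains meta."""
    return CHUNK_OWNERS[bisect_right(SPLIT_POINTS, as_pairs(meta))]


provided = {"b": 1, "a": 1}    # field order as sent by the application
normalized = {"a": 1, "b": 1}  # field order as persisted after normalization

print(owning_shard(provided))    # shardB: where the insert is routed
print(owning_shard(normalized))  # shardA: where the stored value belongs
```

In this layout the insert is sent to shardB, but the persisted value sorts into shardA's range, so shardB holds a document it does not own. If both ranges belonged to the same shard, the two lookups would agree and no orphan would result.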

Note that when the correct chunk range and "incorrect" chunk range are owned by the same shard, this issue is self-healing.

Documents orphaned in this way:

  • Will not be returned by queries issued through a mongos.
  • May be deleted under the following circumstances:
    • Orphaned documents are normalized in such a way that they fall into a chunk range with a pending chunk deletion task.
    • A chunk migration to a destination shard containing orphans in the given range is aborted, resulting in the creation of a chunk deletion task over the chunk range in which the orphaned documents exist.

Workaround

Upgrading to MongoDB versions 5.0.22, 6.0.12, or 7.0.3 prevents the issue from occurring, but remediation is still required. See the Remediation section below for further guidance.

Please reach out to MongoDB Support if you are unable to upgrade to a version containing a fix for this issue.

Remediation

We recommend taking the following actions, in order, to identify and preserve orphaned documents for later recovery. Please review all steps carefully before proceeding.

  1. Disable the balancer using sh.stopBalancer().
  2. Upgrade the cluster to the latest maintenance release (see the Release Notes for information on the latest versions).
  3. Follow the guidance at MongoDB Support Tools - Sharded Time Series Orphan Check to identify and recover orphaned time series documents.

Please reach out to MongoDB Support if you have any questions or issues with performing the steps above.


 Comments   
Comment by Githook User [ 16/Oct/23 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-80203 Normalize metadata for time-series insert targeting

(cherry picked from commit 1a77b9c2182a01870262fac6b9e4c6df2dc05fa5)
Branch: v7.1
https://github.com/mongodb/mongo/commit/a24f8791f900c6d2dad5425930f45e3341f067a9

Comment by Githook User [ 04/Oct/23 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-80203 Normalize metadata for time-series insert targeting

(cherry picked from commit 48ce9d9b4cc7c623cc1f4a67db300ca7d3c800a5)
Branch: v5.0
https://github.com/mongodb/mongo/commit/9cb9725acb96fcb5e5f1f7f42f0030bd8d654286

Comment by Githook User [ 03/Oct/23 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-80203 Normalize metadata for time-series insert targeting
Branch: v6.0
https://github.com/mongodb/mongo/commit/48ce9d9b4cc7c623cc1f4a67db300ca7d3c800a5

Comment by Maria Prinus [ 02/Oct/23 ]

The release team discussed this issue and decided not to block 7.1.0 on this ticket. This change must be backported to 7.0 (done in BACKPORT-16944), and we recommend backporting it to 7.1.1 if we have to release one.

Comment by Githook User [ 02/Oct/23 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-80203 Normalize metadata for time-series insert targeting

(cherry picked from commit 1a77b9c2182a01870262fac6b9e4c6df2dc05fa5)
Branch: v7.0
https://github.com/mongodb/mongo/commit/26a9f2f4b1ff19cc5bceeee172f8ea452bc83f00

Comment by Githook User [ 28/Sep/23 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-80203 Normalize metadata for time-series insert targeting
Branch: master
https://github.com/mongodb/mongo/commit/1a77b9c2182a01870262fac6b9e4c6df2dc05fa5

Comment by Maria Prinus [ 05/Sep/23 ]

Downgrading to P3 to mark this one as a non-blocking ticket for the releases.

Comment by Maria Prinus [ 31/Aug/23 ]

britt.snyman@mongodb.com confirmed that this is not a blocker for MongoDB 5.0.21-rc0

Generated at Thu Feb 08 06:42:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.