[SERVER-79446] `insert` ignores `collectionUUID` for time-series collections Created: 28/Jul/23  Updated: 08/Nov/23  Resolved: 28/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0, 7.0.3, 6.0.12

Type: Bug Priority: Major - P3
Reporter: Felipe Gasper Assignee: Gregory Noma
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-82924 Time-series collections should indica... Open
Assigned Teams:
Storage Execution NAMER
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.0, v6.0
Sprint: Execution NAMR Team 2023-10-02
Participants:

 Description   

Demonstration:

db = db.getSiblingDB("test");
db.createCollection( "weather", { timeseries: { timeField: "timestamp" } })
db.runCommand({insert:"weather", collectionUUID: UUID("12345678-1234-1234-1234-f776a983614f"), documents:[{timestamp: new Date()}]});

This risks data corruption in mongosync, which relies on `collectionUUID` to indicate when a collection has been renamed or dropped.

Previously I thought this was just a matter of the server deprioritizing CollectionUUIDMismatch errors (see JIRA history), but it’s worse.

Expected behavior: I would expect CollectionUUIDMismatch to be returned as with non-time-series collections.



 Comments   
Comment by Githook User [ 05/Oct/23 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-79446 Respect `collectionUUID` for time-series inserts

(cherry picked from commit 05c59ba19ef3eddb62cc856129ceb8f8a627d29d)
Branch: v6.0
https://github.com/mongodb/mongo/commit/34ca91c95ec2090976813a1f6979b7765b6fc323

Comment by Githook User [ 05/Oct/23 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-79446 Respect `collectionUUID` for time-series inserts

(cherry picked from commit 05c59ba19ef3eddb62cc856129ceb8f8a627d29d)
Branch: v7.0
https://github.com/mongodb/mongo/commit/61111e2131bbd9fd57f9ecace54381df246253d7

Comment by Githook User [ 28/Sep/23 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-79446 Respect `collectionUUID` for time-series inserts
Branch: master
https://github.com/mongodb/mongo/commit/05c59ba19ef3eddb62cc856129ceb8f8a627d29d

Comment by Felipe Gasper [ 26/Sep/23 ]

Another instance of the failure.

Comment by Felipe Gasper [ 22/Sep/23 ]

I’m changing this to a bug because I just found it in another context.

In this test run, mongosync’s DDL applier lagged behind a CRUD applier. Thus, this series of events on the source:

  • create time-series “apricot”
  • drop time-series “apricot”
  • create capped “apricot”
  • insert to capped “apricot”

… became this on the destination:

  • create time-series “apricot”
  • create temporary capped “apricot” (mongosync.tmp.ecf17d21-e25b-49f7-a460-299f6c000d82)
  • insert to capped “apricot”

… which we normally expect to yield a CollectionUUIDMismatch. Mongosync can then repeat the insert to the temporary collection name, and all is well.
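
A minimal sketch of that retry decision, written against the reply shape shown elsewhere in this ticket (`writeErrors[].code`, `actualCollection`). The function name `planAfterInsert` and the `done` / `retry` / `wait` / `fail` actions are hypothetical, chosen for illustration; this is not mongosync's actual code.

```javascript
// Hypothetical helper for illustration only; not mongosync's implementation.
const COLLECTION_UUID_MISMATCH = 361;

// Given the reply to an `insert` command that carried `collectionUUID`,
// decide what the caller should do next.
function planAfterInsert(reply) {
  const errors = reply.writeErrors ?? [];
  if (errors.length === 0) {
    return { action: "done" };
  }
  const mismatch = errors.find((e) => e.code === COLLECTION_UUID_MISMATCH);
  if (mismatch) {
    // `actualCollection` names the collection that currently holds the
    // requested UUID, or is null when no collection in the database has it.
    // Treating null as "wait" is an assumption of this sketch.
    return mismatch.actualCollection
      ? { action: "retry", collection: mismatch.actualCollection }
      : { action: "wait" };
  }
  // Any other write error (for example code 2, BadValue) is not recoverable here.
  return { action: "fail", codes: errors.map((e) => e.code) };
}
```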

Instead, though, we’re getting:

{
  n: 1,
  writeErrors: [
    {
      index: 0,
      code: 2,
      errmsg: "'time' must be present and contain a valid BSON UTC datetime value"
    }
  ],
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1695401406, i: 2 }),
    signature: {
      hash: Binary(Buffer.from("0000000000000000000000000000000000000000", "hex"), 0),
      keyId: Long("0")
    }
  },
  operationTime: Timestamp({ t: 1695401400, i: 1 })
}

Code 2, “BadValue”, also happens to be one that mongosync doesn’t normally handle. We could special-case it, but we’d be relying on the fact that non-time-series documents lack one or more required time-series fields, which needn’t be the case.

Thus, there’s a chance of data corruption.
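
To illustrate why special-casing BadValue is not enough, here is a rough sketch. `passesTimeFieldCheck` is a hypothetical, heavily simplified stand-in for the server's time-field validation (the real check lives in the server and does more), and the sample documents are made up.

```javascript
// Simplified, hypothetical stand-in for the server's time-field check on
// time-series inserts: the time field must be present and hold a date.
function passesTimeFieldCheck(doc, timeField) {
  return doc[timeField] instanceof Date;
}

// Documents meant for the capped collection, sent to a time-series
// collection whose timeField is "time".
const timeField = "time";
const cappedDocs = [
  { foo: 1 }, // fails the check, so the client at least sees BadValue
  { time: new Date("2023-09-22T00:00:00Z"), foo: 2 }, // passes the check
];

// Documents that pass would produce no BadValue error, so a client that
// only special-cases code 2 would never notice the wrong target.
const misrouted = cappedDocs.filter((d) => passesTimeFieldCheck(d, timeField));
```

Any document in `misrouted` would be written without a write error while `collectionUUID` is ignored, which is the corruption risk described above.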

To reproduce this, just create a time-series collection and try to insert into it with a document {foo:1} and a collectionUUID like UUID("5ef55d9f-a049-48a0-88b1-f776a983614a"). You’ll get an error like the above. Then try to insert

CollectionUUIDMismatch really seems like it should supersede document-specific errors. Alternatively, for time-series collections it might be reasonable to create a distinct error that says, “collectionUUID is invalid for time-series collections”.

Comment by Felipe Gasper [ 15/Aug/23 ]

connie.chen@mongodb.com It doesn’t appear to, but it leaves us in a tight spot because we have to assume that OperationNotSupportedInTransaction only happens to mongosync in the context of a time-series collection. That happens to be true right now but could easily change in the future.

Comment by Felipe Gasper [ 28/Jul/23 ]

Example disparate responses (“weather” is a time-series collection):

test> db.runCommand({insert: "normal", collectionUUID: UUID("fb360baf-65ad-4597-b843-bb7120cec349"), documents:[{foo:"bar"}]})
{
  n: 0,
  writeErrors: [
    {
      index: 0,
      code: 361,
      errmsg: 'Collection UUID does not match that specified',
      db: 'test',
      collectionUUID: new UUID("fb360baf-65ad-4597-b843-bb7120cec349"),
      expectedCollection: 'normal',
      actualCollection: null
    }
  ],
  ok: 1
}
test> db.runCommand({insert: "weather", collectionUUID: UUID("fb360baf-65ad-4597-b843-bb7120cec349"), documents:[{foo:"bar"}]})
{
  n: 1,
  writeErrors: [
    {
      index: 0,
      code: 2,
      errmsg: "'timestamp' must be present and contain a valid BSON UTC datetime value"
    }
  ],
  ok: 1
}

Generated at Thu Feb 08 06:40:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.