[SERVER-82028] DDL operations on timeseries collection during tenant migration can crash the recipient due to an invariant failure.
Created: 10/Oct/23  Updated: 08/Nov/23  Resolved: 07/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.1, 7.2.0-rc0, 7.0.4

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Wei Hu
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v7.1, v7.0
Sprint: Execution NAMR Team 2023-10-16, Execution Team 2023-10-30, Execution Team 2023-11-13
Participants:
Linked BF Score: 120

 Description   

If a timeseries collection's name doesn't start with the expected 'system.buckets' prefix, we hit an invariant. When a createCollection oplog entry is replayed on inconsistent data (i.e., in oplog application mode kInitialSync) and a collection with the same name already exists, we temporarily rename the existing collection to a temp name (e.g. "tempXXX.create") before creating the new collection. As a result, a timeseries collection db.system.buckets.coll can be temporarily renamed to db.tempXXX.create, a name that does not follow the timeseries namespace rules. If a user runs the listCollections command before oplog application catches up and makes the data consistent, they may observe a timeseries collection whose name starts with 'tempXXX' rather than 'system.buckets', potentially triggering the invariant failure.
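A minimal C++ sketch of the failure mode described above, assuming a simplified catalog-entry model. `kBucketsPrefix`, `hasBucketsPrefix`, `oldTempName`, `CollectionEntry`, and `satisfiesBucketsInvariant` are hypothetical names, not the server's actual API, and the temp-name format is an assumption based on the "tempXXX.create" pattern in the description. The real server uses an `invariant` that terminates the process; here the check just returns `false` so the broken case can be observed.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical constant: the prefix every timeseries buckets collection name carries.
constexpr std::string_view kBucketsPrefix = "system.buckets.";

// Returns true if the collection name (without the db part) starts with "system.buckets.".
bool hasBucketsPrefix(std::string_view collName) {
    return collName.substr(0, kBucketsPrefix.size()) == kBucketsPrefix;
}

// Hypothetical model of the pre-fix behavior: when createCollection replay finds a
// collection with the same name, the existing one is moved aside to
// "temp<suffix>.create", regardless of whether it is a timeseries buckets collection.
std::string oldTempName(const std::string& uniqueSuffix) {
    assert(!uniqueSuffix.empty());
    return "temp" + uniqueSuffix + ".create";
}

// Simplified catalog entry as a listCollections-style scan might see it (hypothetical).
struct CollectionEntry {
    std::string name;           // collection name without the db prefix
    bool hasTimeseriesOptions;  // the renamed collection still holds timeseries buckets data
};

// Stand-in for the server invariant: an entry with timeseries options must use the
// buckets prefix. Returns false where the server would hit the invariant and crash.
bool satisfiesBucketsInvariant(const CollectionEntry& entry) {
    return !entry.hasTimeseriesOptions || hasBucketsPrefix(entry.name);
}
```

For example, `satisfiesBucketsInvariant({oldTempName("XXX"), true})` returns `false`: the moved-aside buckets data is visible under `tempXXX.create`, which lacks the `system.buckets.` prefix.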

Although tenant migration and logical initial sync behave similarly, external user commands are not allowed while a node is in initial sync, so this issue only affects tenant migration. There, if the initial migration attempt fails after partially migrating donor data, the cloud can run `listCollections` on the recipient primary to get the list of collections to drop on the recipient before retrying the migration.

Because this bug can crash the server in Serverless, it also affects the other tenants located on the recipient. Note that tenant migration will be retired in 7.2, so this is an issue only in 7.1 and earlier.



 Comments   
Comment by Githook User [ 08/Nov/23 ]

Author: Wei Hu <wei.hu@mongodb.com> (username: wh5a)

Message: SERVER-82028 When renaming a time series collection, the temp name should start with system.buckets prefix
Branch: v7.0
https://github.com/mongodb/mongo/commit/04fd16d7a4adf5a795e69832c17fa1bb5c8bf9fd

Comment by Githook User [ 07/Nov/23 ]

Author: Wei Hu <wei.hu@mongodb.com> (username: wh5a)

Message: SERVER-82028 When renaming a time series collection, the temp name should start with system.buckets prefix
Branch: v7.1
https://github.com/mongodb/mongo/commit/9fe1ad3febe7ed0eb38fb5d7406bec02deb37496

Comment by Githook User [ 07/Nov/23 ]

Author: Wei Hu <wei.hu@mongodb.com> (username: wh5a)

Message: SERVER-82028 When renaming a time series collection, the temp name should start with system.buckets prefix
Branch: master
https://github.com/mongodb/mongo/commit/db361ce7749ee478f501cf8814d8c5f58100ff1c

Comment by Suganthi Mani [ 24/Oct/23 ]

We decided to go with the second option: the createCollection code path renames to a temp collection name that follows the timeseries namespace rules.
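A minimal C++ sketch of the chosen fix, following the commit message "the temp name should start with system.buckets prefix". `makeTempNameForRename` and the exact `system.buckets.temp<suffix>.create` format are assumptions for illustration; the actual server helper and naming pattern may differ.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Prefix that timeseries buckets collection names must carry.
constexpr std::string_view kBucketsPrefix = "system.buckets.";

bool hasBucketsPrefix(std::string_view collName) {
    return collName.substr(0, kBucketsPrefix.size()) == kBucketsPrefix;
}

// Hypothetical helper: pick the temp name a colliding collection is renamed to
// while replaying createCollection. A buckets collection keeps the
// "system.buckets." prefix so it still satisfies the timeseries namespace rules.
std::string makeTempNameForRename(std::string_view existingColl,
                                  const std::string& uniqueSuffix) {
    assert(!uniqueSuffix.empty());
    std::string temp = "temp" + uniqueSuffix + ".create";
    if (hasBucketsPrefix(existingColl)) {
        return std::string(kBucketsPrefix) + temp;
    }
    return temp;  // non-timeseries collections keep the original temp-name form
}
```

With this sketch, `makeTempNameForRename("system.buckets.coll", "XXX")` yields `"system.buckets.tempXXX.create"`, while a regular collection `"coll"` still gets `"tempXXX.create"`. Keeping the prefix means a listing that observes the moved-aside collection still sees a name consistent with its timeseries options.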

Comment by Suganthi Mani [ 10/Oct/23 ]

Fix options: this can be fixed either by making listCollections not list temp collections, or by having the createCollection code path rename to a temp collection name that follows the timeseries namespace rules. If we go with the first option, then we need SERVER-82039.
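For comparison, a minimal C++ sketch of the first option: filtering temp collections out of a listCollections-style result. It assumes each catalog entry carries a flag marking it as temporary; `ListedCollection`, `isTemp`, and `visibleCollectionNames` are hypothetical names, and the sketch does not model whatever SERVER-82039 would add.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical catalog entry as seen by a listCollections-style scan.
struct ListedCollection {
    std::string name;
    bool isTemp;  // assumed marker for collections moved aside during createCollection replay
};

// Returns only the names of non-temporary collections, preserving catalog order,
// so a caller never sees a moved-aside buckets collection under its temp name.
std::vector<std::string> visibleCollectionNames(const std::vector<ListedCollection>& catalog) {
    std::vector<std::string> names;
    names.reserve(catalog.size());
    for (const auto& entry : catalog) {
        if (!entry.isTemp) {
            names.push_back(entry.name);
        }
    }
    return names;
}
```

This approach hides the inconsistent name from callers but relies on temp collections being reliably identifiable, whereas the chosen option keeps the namespace valid in the first place.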

Generated at Thu Feb 08 06:48:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.