[SERVER-82028] DDL operations on timeseries collection during tenant migration can crash the recipient due to an invariant failure. Created: 10/Oct/23 Updated: 08/Nov/23 Resolved: 07/Nov/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.1, 7.2.0-rc0, 7.0.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Wei Hu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v7.1, v7.0 | ||||||||
| Sprint: | Execution NAMR Team 2023-10-16, Execution Team 2023-10-30, Execution Team 2023-11-13 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 120 | ||||||||
| Description |
|
When a timeseries collection's name doesn't start with the expected 'system.buckets' prefix, we invariant. During the replay of a createCollection oplog entry on inconsistent data (i.e., oplog application mode kInitialSync), if a collection already exists with the same name, we temporarily rename it to a temp name (starting with "tempXXX.create") before creating the new collection. So, a timeseries collection db.system.buckets.coll can get temporarily renamed to a non-timeseries temp collection db.tempXXX.create. If a user runs the `listCollections` command before oplog application has caught up and made the data consistent, the command may observe a timeseries collection whose name starts with 'tempXXX' rather than 'system.buckets', triggering the invariant failure. Although tenant migration and logical initial sync behave similarly, we don't allow external user commands while a node is in initial sync, so this issue only impacts tenant migration: after an initial migration attempt fails having partially migrated donor data, the cloud can run `listCollections` on the recipient primary to get the list of collections to drop on the recipient before retrying the migration. Since this bug can crash the server in Serverless, it will also affect other tenants located on the recipient. Note that tenant migration will be retired in 7.2, so this is an issue only on 7.1 and below. |
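The failure sequence described above can be modelled in a few lines. This is only an illustrative sketch in Python, not the server's C++ code: the catalog is a plain dict, and `temp_name_for_create`, `rename_existing_before_create`, and `list_collections` are hypothetical stand-ins for the corresponding server code paths.

```python
import secrets

BUCKETS_PREFIX = "system.buckets."


def temp_name_for_create() -> str:
    # Hypothetical stand-in for the "tempXXX.create" name chosen during replay.
    return f"temp{secrets.token_hex(3)}.create"


def rename_existing_before_create(catalog: dict, name: str) -> str:
    # Models createCollection replay in kInitialSync mode: an existing
    # collection with the same name is moved aside under a temp name.
    tmp = temp_name_for_create()
    catalog[tmp] = catalog.pop(name)
    return tmp


def list_collections(catalog: dict) -> list:
    # Models a listing that checks the timeseries naming rule.
    names = []
    for name, options in catalog.items():
        if options.get("timeseries"):
            # Stand-in for the server invariant on the buckets prefix.
            assert name.startswith(BUCKETS_PREFIX), f"invariant failure: {name}"
        names.append(name)
    return names
```

Calling `list_collections` on a catalog holding `system.buckets.coll` succeeds, but calling it after `rename_existing_before_create` raises `AssertionError`, because the timeseries options now sit under a name without the buckets prefix.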
| Comments |
| Comment by Githook User [ 08/Nov/23 ] |
|
Author: {'name': 'Wei Hu', 'email': 'wei.hu@mongodb.com', 'username': 'wh5a'} Message: |
| Comment by Githook User [ 07/Nov/23 ] |
|
Author: {'name': 'Wei Hu', 'email': 'wei.hu@mongodb.com', 'username': 'wh5a'} Message: |
| Comment by Githook User [ 07/Nov/23 ] |
|
Author: {'name': 'Wei Hu', 'email': 'wei.hu@mongodb.com', 'username': 'wh5a'} Message: |
| Comment by Suganthi Mani [ 24/Oct/23 ] |
|
We decided to go with the second option: "the createCollection code path renames to a temp collection such that it follows the timeseries namespace rules." |
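One way to picture the chosen option is a temp-name generator that keeps the buckets prefix when the collection being moved aside is a timeseries buckets collection. This is a minimal Python sketch under assumptions, not the actual patch: the function name and the exact shape of the temp name are hypothetical.

```python
import secrets

BUCKETS_PREFIX = "system.buckets."


def temp_name_for_create(existing_name: str) -> str:
    # Hypothetical temp-name scheme: random suffix in the "tempXXX.create" style.
    suffix = f"temp{secrets.token_hex(3)}.create"
    if existing_name.startswith(BUCKETS_PREFIX):
        # Keep the buckets prefix so the temp name still satisfies
        # the timeseries namespace rule.
        return BUCKETS_PREFIX + suffix
    return suffix
```

With this scheme, `system.buckets.coll` would be renamed to something like `system.buckets.tempXXX.create`, so a concurrent `listCollections` would still see a name with the expected prefix, while ordinary collections keep the plain temp name.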
| Comment by Suganthi Mani [ 10/Oct/23 ] |
|
Fix options: This can be fixed either by making `listCollections` not list temp collections, or by having the createCollection code path rename to a temp collection such that it follows the timeseries namespace rules. If we go with the first option, then we need