[SERVER-73599] Investigate (for WT) implications of opening a checkpoint cursor returns an error in MongoDB Created: 03/Feb/23  Updated: 29/Oct/23  Resolved: 13/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Yuhong Zhang Assignee: Yuhong Zhang
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on WT-10545 Return an error when checkpoint curso... Closed
Problem/Incident
is caused by WT-10545 Return an error when checkpoint curso... Closed
Related
related to SERVER-74529 Use checkpoint id to ensure validatin... Closed
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2023-03-20
Participants:
Linked BF Score: 105

 Description   

After WT-10545, opening a checkpoint cursor will return an error if the timestamps of a file diverge. We will need to handle this change in MongoDB, specifically for background validation.



 Comments   
Comment by Githook User [ 16/Mar/23 ]

Author:

{'name': 'Yuhong Zhang', 'email': 'yuhong.zhang@mongodb.com', 'username': 'YuhongZhang98'}

Message: SERVER-73599 Only handle ENOENT error when not able to open a checkpoint cursor
Branch: master
https://github.com/mongodb/mongo/commit/0c3c07d29e21a2ee16676b90d164060713818b4a

Comment by Yuhong Zhang [ 15/Mar/23 ]

etienne.petrel@mongodb.com pointed out that the WT_NOTFOUND error returned from the open_cursor() call is mapped to ENOENT. I will push another commit to remove the extra error code handling.
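As a rough illustration of the behaviour discussed here, the following is a minimal sketch, not the code from the commits on this ticket. It assumes a hypothetical callback `openCursor` that stands in for opening a checkpoint cursor and returns 0 or an errno-style code. The names `ValidationResult`, `OpenCheckpointCursorFn`, and `validateAgainstCheckpoint` are invented for the example. Only ENOENT is treated as "no checkpoint cursor available, skip this table"; any other error is passed back to the caller.

```cpp
#include <cerrno>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result of one background validation pass.
struct ValidationResult {
    std::vector<std::string> validated;  // idents whose checkpoint cursor opened
    std::vector<std::string> warnings;   // one message per skipped ident
};

// Hypothetical stand-in for opening a checkpoint cursor on `ident`.
// Returns 0 on success or an errno-style code on failure.
using OpenCheckpointCursorFn = std::function<int(const std::string& ident)>;

// Walks the idents and tries to open a checkpoint cursor on each one.
// ENOENT means the cursor could not be opened on the checkpoint, so the
// ident is skipped and a warning is recorded for the user. Any other
// non-zero code stops the pass and is returned unchanged.
int validateAgainstCheckpoint(const std::vector<std::string>& idents,
                              const OpenCheckpointCursorFn& openCursor,
                              ValidationResult& out) {
    for (const auto& ident : idents) {
        const int ret = openCursor(ident);
        if (ret == 0) {
            out.validated.push_back(ident);
            continue;
        }
        if (ret == ENOENT) {
            out.warnings.push_back("Skipping validation of '" + ident +
                                   "': unable to open a checkpoint cursor");
            continue;
        }
        return ret;  // unexpected error: let the caller decide
    }
    return 0;
}
```

A caller would pass the list of collection and index idents together with a function that wraps the real cursor-opening call, then report `warnings` back to the user alongside the validation results.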

Comment by Githook User [ 10/Mar/23 ]

Author:

{'name': 'Yuhong Zhang', 'email': 'yuhong.zhang@mongodb.com', 'username': 'YuhongZhang98'}

Message: SERVER-73599 Skip individually checkpointed indexes during background validation
Branch: master
https://github.com/mongodb/mongo/commit/58c32fd14692dde8c778f3c77cd530c85b54bc76

Comment by Etienne Petrel [ 07/Mar/23 ]

yuhong.zhang@mongodb.com, we changed the purpose of WT-10545; WiredTiger now returns an error if a checkpoint cursor cannot be opened in a specific scenario that involves bulk operations. We should reopen this ticket to make sure the server can deal with this new error.

The new ticket WT-10715 will implement the solution where WiredTiger should open previous checkpoints if needed.

Comment by Yuhong Zhang [ 02/Mar/23 ]

WT-10545 decided to open the cursor on the previous checkpoint if there's a mismatch between the table's time and the global time. The fix will ensure that validation for all tables runs on the same checkpoint.

Comment by Yuhong Zhang [ 17/Feb/23 ]

Yes, we can move on to unblock the checkpoint thread and return a message to the user about the table not being validated.

Comment by Etienne Petrel [ 17/Feb/23 ]

Hi yuhong.zhang@mongodb.com, if the checkpoint cursor cannot be opened due to the reason described in BF-27521/WT-10545, we may want to return EBUSY.

On your side, if my understanding is correct, the checkpoint thread is blocked when you are performing validation. This means you will need to skip validation and let the checkpoint thread do a checkpoint before being able to run validation.

This is just theoretical for now but would that work?

cc: clarisse.cheah@mongodb.com
