- Type: Bug
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: None
- 8
- Storage - Ra 2021-09-20
While working on WT-7507, haribabu.kommi and I came across a problem in test_prepare_hs03.
Any operation that requires exclusive access to an object (currently verify, salvage/rollback-to-stable, and upgrade) first attempts to close all existing open handles on the object and then opens an exclusive handle on it.
If there are dirty updates in the cache for the object, as part of closing all open handles we call __wt_txn_checkpoint(), which hits this code:
    /*
     * Don't flush data from modified trees independent of system-wide checkpoint when either there
     * is a stable timestamp set or the connection is configured to disallow such operation.
     * Flushing trees can lead to files that are inconsistent on disk after a crash.
     */
    if (btree->modified && !bulk && !__wt_btree_immediately_durable(session) &&
      (S2C(session)->txn_global.has_stable_timestamp ||
        (!F_ISSET(S2C(session), WT_CONN_FILE_CLOSE_SYNC) && !metadata)))
        return (__wt_set_return(session, EBUSY));
and returns EBUSY, and the operation fails.
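To make it easier to see which combinations of state trigger the failure, here is a minimal standalone sketch of the same condition. The struct and function names are hypothetical and are not WiredTiger types; the meanings given for `bulk` and `metadata` are assumptions based on their names, since the snippet above does not show where they are set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of the state the check reads; not a WiredTiger type. */
struct close_check_state {
    bool modified;             /* btree->modified: the tree has dirty content */
    bool bulk;                 /* assumed: the handle is a bulk-load handle */
    bool immediately_durable;  /* result of __wt_btree_immediately_durable(session) */
    bool has_stable_timestamp; /* txn_global.has_stable_timestamp */
    bool file_close_sync;      /* WT_CONN_FILE_CLOSE_SYNC is set on the connection */
    bool metadata;             /* assumed: the handle is the metadata file */
};

/* Return EBUSY exactly when the quoted condition would, otherwise 0. */
static int
close_checkpoint_check(const struct close_check_state *s)
{
    assert(s != NULL);

    if (s->modified && !s->bulk && !s->immediately_durable &&
      (s->has_stable_timestamp || (!s->file_close_sync && !s->metadata)))
        return (EBUSY);
    return (0);
}
```

Under these assumptions, a dirty, non-bulk tree that is not immediately durable always gets EBUSY once a stable timestamp has been set, regardless of the file_close_sync setting.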
This is easy to reproduce with test_prepare_hs03, and haribabu.kommi believes he has seen MongoDB report EBUSY returns from collection validation. (As MongoDB surfaces the collection validation operation through its API, it makes sense that a MongoDB application could see this failure.)
alexander.gorrod, vamsi.krishna, we could potentially:
- force a database-wide checkpoint as part of an operation requiring exclusive access to the object (if EBUSY is returned from our attempt to close all open handles, we could do a database-wide checkpoint and then try again).
- document this away, although it's messy to do that because as soon as a checkpoint completes, then the failing operation can proceed, so it's a case of repeatedly trying until the operation succeeds.
- relax the check: haribabu.kommi thinks that this code may be too pessimistic, and that the history store may mean the check is no longer required.
Anyway, can you folks weigh in on this one and give us some guidance?
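For discussion, the first option (checkpoint on EBUSY, then retry) could look roughly like the sketch below. It is a standalone simulation, not a proposed patch: the callback types, `run_exclusive_with_checkpoint_retry`, and the `fake_*` helpers are all hypothetical names. In a real caller the two callbacks would presumably wrap the exclusive operation (for example WT_SESSION::verify) and a database-wide WT_SESSION::checkpoint.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback types; neither is a WiredTiger API. */
typedef int (*exclusive_op_fn)(void *ctx);
typedef int (*checkpoint_fn)(void *ctx);

/*
 * Run an operation that needs exclusive access. If it fails with EBUSY, run a
 * database-wide checkpoint and try again, at most max_retries times.
 */
static int
run_exclusive_with_checkpoint_retry(
  exclusive_op_fn op, checkpoint_fn checkpoint, void *ctx, int max_retries)
{
    int attempt, ret;

    assert(op != NULL && checkpoint != NULL);

    for (attempt = 0;; ++attempt) {
        ret = op(ctx);
        if (ret != EBUSY || attempt >= max_retries)
            return (ret);
        /* Flush the dirty content that blocked the handle close, then retry. */
        if ((ret = checkpoint(ctx)) != 0)
            return (ret);
    }
}

/* Simulated object used only to exercise the retry loop. */
struct fake_object {
    int dirty;       /* non-zero while there are dirty updates in cache */
    int checkpoints; /* number of checkpoints taken */
    int attempts;    /* number of times the exclusive operation was tried */
};

/* Simulated exclusive operation: fails with EBUSY while the object is dirty. */
static int
fake_verify(void *ctx)
{
    struct fake_object *obj = ctx;

    ++obj->attempts;
    return (obj->dirty ? EBUSY : 0);
}

/* Simulated database-wide checkpoint: writes out the dirty content. */
static int
fake_checkpoint(void *ctx)
{
    struct fake_object *obj = ctx;

    ++obj->checkpoints;
    obj->dirty = 0;
    return (0);
}
```

Calling `run_exclusive_with_checkpoint_retry(fake_verify, fake_checkpoint, &obj, 3)` on a dirty `obj` fails the first attempt, takes one checkpoint, and succeeds on the second attempt. The retry cap matters because new dirty updates can arrive between the checkpoint and the next attempt, so a single checkpoint is not guaranteed to be enough.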
has to be done before
- WT-8809 Document to do checkpoint if ebusy is returned for operations requiring exclusive access (Closed)
related to
- WT-8695 Remove file_close_sync config and disallow single file checkpoint (Closed)
- SERVER-56882 unable to complete full validation on collection after failed hashed index insert (Closed)
- WT-7902 Retry the alter command after a system wide checkpoint (Closed)
- WT-8126 Mark btree as dirty only if not newly created when instantiating a deleted row-store leaf page (Closed)
- WT-8813 Improve access to methods requiring an exclusive handle (Backlog)