-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
2023-07-25 Absolute unit
There is a discussion about the validity of the fail point after reconciliation in eviction after WT-11242. Here's the quotes from the slack thread.
Chenhao:
I think there are two things. The no-fail point for reconciliation and no-fail point for eviction.
no-fail point for reconciliation is defined here:
/* * If we fail the reconciliation prior to calling __rec_write_wrapup then we can clean up our * state and return an error. * * If we fail the reconciliation after calling __rec_write_wrapup then we must panic as * inserting updates to the history store and then failing can leave us in a bad state. */ if (ret != 0) { WT_ASSERT_ALWAYS(session, addr == NULL || ref->addr != NULL, "Reconciliation trying to free the page that has been written to disk"); WT_IGNORE_RET(__rec_write_err(session, r, page)); WT_IGNORE_RET(__reconcile_post_wrapup(session, r, page, flags, page_lockedp)); /* * This return statement covers non-panic error scenarios; any failure beyond this point is * a panic. Conversely, no return prior to this point should use the "err" label. */ return (ret); }
no-fail point for eviction is defined here:
/* Update the reference and discard the page. */ if (__wt_ref_is_root(ref)) __wt_ref_out(session, ref); else if ((clean_page && !F_ISSET(conn, WT_CONN_IN_MEMORY)) || tree_dead) /* * Pages that belong to dead trees never write back to disk and can't support page splits. */ WT_ERR(__evict_page_clean_update(session, ref, flags)); else WT_ERR(__evict_page_dirty_update(session, ref, flags)); /* * We have loaded the new disk image and updated the tree structure. We can no longer fail after * this point. */
They are containing each other. May be that's the source of confusion? It is possible for the eviction to fail after a successful reconciliation.
Alex:
I think our difference is that I believe the first "line" in reconciliation spans to the end of eviction. You believe it is for a period during reconciliation, then there is another during eviction.
Chenhao:
I think it is hard to ensure we cannot fail after reconciliation. In split, we need to lock the parent after reconciliation.
/* * __wt_split_multi -- * Split a page into multiple pages. */ int __wt_split_multi(WT_SESSION_IMPL *session, WT_REF *ref, int closing) { WT_DECL_RET; __wt_verbose(session, WT_VERB_SPLIT, "%p: split-multi", (void *)ref); /* * Set the session split generation to ensure underlying code isn't surprised by internal page * eviction, then proceed with the split. */ WT_WITH_PAGE_INDEX(session, ret = __split_multi_lock(session, ref, closing)); return (ret); }
However, it is not guaranteed to succeed. If we want to ensure there is no failure after reconciliation, we may have to lock the parent before we do reconciliation. That would apply to all the pages not only the pages that are split because we don't know whether a page will split or not before we do the reconciliation.
I looked at other code paths. They all look fine. The only obstacle is the above code that needs to lock the parent page.
We agree on that we can move the fail point to where it actually fail in split. This will make the fail point more specific and clearer what it is really doing.
- related to
-
WT-11242 format failure: rec_upd_select selected update already written to history store
- Closed