Michael, I think the current code can race when WT_PAGE_MODIFY structures are allocated. For example, during row- and column-store search, we find the leaf page, and then, if it's a page-modify search, we have this code:
    /*
     * Copy the leaf page's write generation value before reading the page.
     * Use a read memory barrier to ensure we read the value before we read
     * any of the page's contents.
     */
    if (is_modify) {
        /* Initialize the page's modification information */
        if (page->modify == NULL)
            WT_RET(__wt_page_modify_init(session, page));

        WT_ORDERED_READ(cbt->write_gen, page->modify->write_gen);
    }
    cbt->page = page;
which calls this code:
    /*
     * __wt_page_modify_init --
     *     A page is about to be modified, allocate the modification structure.
     */
    int
    __wt_page_modify_init(WT_SESSION_IMPL *session, WT_PAGE *page)
    {
        if (page->modify == NULL)
            WT_RET(__wt_calloc_def(session, 1, &page->modify));
        return (0);
    }
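I think any number of threads can be running through this code at once, so I think there's a race: two threads can both see page->modify == NULL and both allocate, and the second allocation overwrites the first, leaking a structure that another thread may already be using.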
I'm committing this change in my tree that I believe fixes the problem:
    /*
     * __wt_page_modify_init --
     *     A page is about to be modified, allocate the modification structure.
     */
    static inline int
    __wt_page_modify_init(WT_SESSION_IMPL *session, WT_PAGE *page)
    {
        WT_PAGE_MODIFY *modify;

        if (page->modify == NULL) {
            WT_RET(__wt_calloc_def(session, 1, &modify));

            /*
             * Multiple threads of control may be searching and deciding
             * to modify a page, if we don't do the update, discard the
             * memory.
             */
            if (!WT_ATOMIC_CAS(page->modify, NULL, modify))
                __wt_free(session, modify);
        }
        return (0);
    }
It will get pushed when I push snapshots, but I wanted to call it out separately for your thoughts.
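For anyone who wants to try the pattern outside the tree, here is a minimal standalone sketch of the same allocate-then-CAS idea. It assumes C11 <stdatomic.h> in place of WT_ATOMIC_CAS and plain calloc/free in place of __wt_calloc_def/__wt_free. The names struct page, struct page_modify and page_modify_init are hypothetical stand-ins, not WiredTiger code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for WT_PAGE_MODIFY. */
struct page_modify {
    unsigned int write_gen;
};

/* Hypothetical stand-in for WT_PAGE; only the modify pointer matters here. */
struct page {
    _Atomic(struct page_modify *) modify;
};

/*
 * page_modify_init --
 *     Allocate the modify structure if the page doesn't have one yet.
 *     Returns 0 on success, -1 if the allocation fails.
 */
static int
page_modify_init(struct page *page)
{
    struct page_modify *expected, *modify;

    assert(page != NULL);

    /* Fast path: some thread has already installed the structure. */
    if (atomic_load(&page->modify) != NULL)
        return (0);

    /* Allocate privately; nothing is visible to other threads yet. */
    if ((modify = calloc(1, sizeof(*modify))) == NULL)
        return (-1);

    /*
     * Publish the structure only if the pointer is still NULL. If another
     * thread won the race, the swap fails and we discard our copy.
     */
    expected = NULL;
    if (!atomic_compare_exchange_strong(&page->modify, &expected, modify))
        free(modify);
    return (0);
}
```

The point of allocating into a local first is that page->modify only ever changes from NULL to one winning pointer, so a thread that has read the pointer never sees it replaced underneath it.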
Related to: WT-112 Drop followed by create fails (Closed)