WiredTiger / WT-11107

Verify steps that cause OOO keys during insertion and deletion races

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • 8

      Summary

      Develop more confidence in the root cause of WT-10961.

      Detailed description

      WT-10961 deals with a race between a deletion and an insertion that can cause a key to be inserted out of order. The deletion removes the leftmost page in a subtree while another set of keys is being inserted into the new leftmost page, so the insertions and deletions happen while the btree is changing structurally. An optimisation that reduces the number of comparisons in the btree traversal tracks the common prefix as the traversal descends. Because the btree changes during the traversal, the optimisation misbehaves, which can result in an incorrect comparison and in keys being inserted in the wrong order.
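
      To make the prefix-tracking idea concrete, here is a minimal sketch of the general technique in Python. It is not WiredTiger's code: the names compare_skip and search are hypothetical, and the stale-skip case at the end only illustrates the kind of wrong comparison described above. It is not the verified sequence of steps from WT-10961.

```python
from __future__ import annotations


def compare_skip(a: bytes, b: bytes, skip: int) -> tuple[int, int]:
    """Compare a and b, trusting that their first `skip` bytes are equal.

    Returns (cmp, matched): cmp is -1, 0 or 1, and matched is the number of
    leading bytes treated as equal (the skipped bytes are included).
    """
    i = skip
    n = min(len(a), len(b))
    while i < n:
        if a[i] != b[i]:
            return (-1 if a[i] < b[i] else 1), i
        i += 1
    # One key is a prefix of the other (or they are equal): shorter sorts first.
    return (len(a) > len(b)) - (len(a) < len(b)), i


def search(keys: list[bytes], target: bytes) -> int:
    """Binary search over sorted keys; returns the first index with key >= target.

    skip_lo / skip_hi record how many leading bytes the target shares with the
    current lower and upper bounding keys. Every key between those bounds must
    share at least min(skip_lo, skip_hi) bytes with the target, so those bytes
    need not be compared again.
    """
    lo, hi = 0, len(keys)
    skip_lo = skip_hi = 0
    while lo < hi:
        mid = (lo + hi) // 2
        cmp, matched = compare_skip(target, keys[mid], min(skip_lo, skip_hi))
        if cmp > 0:
            lo, skip_lo = mid + 1, matched
        elif cmp < 0:
            hi, skip_hi = mid, matched
        else:
            return mid
    return lo


# The skip count is only valid while the bounding keys still bound the range
# being searched. If it is carried over to a key outside that range (assumed
# here to model a structural change), the comparison can flip:
fresh = compare_skip(b"aba", b"aac", 0)  # (1, 1): "aba" sorts after "aac"
stale = compare_skip(b"aba", b"aac", 2)  # (-1, 2): wrong order reported
```

      The search is correct only as long as its invariant holds. The last two lines show that a skip count taken from keys that no longer bound the search range produces the opposite ordering, which is the class of failure this ticket sets out to pin down.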

      To debug WT-10961, we analysed a few failures and developed a theory from the state of the tree as seen in the core file. The steps proposed in the theory are in the general ballpark of the root cause, but they need further verification, especially for the bugs seen in deeper trees.

      WT-10961 is going ahead with a conservative fix that addresses the issue in general. With this ticket, we would like to dig deeper and verify the exact steps that cause the races and the incorrect insertions.

      A definition of done

      The RCA theory proposed in WT-10961 is verified. After that, if the fix for WT-10961 needs to be optimised, create a separate ticket.

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            sulabh.mahajan@mongodb.com Sulabh Mahajan
            Votes:
            0
            Watchers:
            2
