Catch EBUSY errors in a retry loop on find operations

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines, Storage Engines - Server Integration

      There have been a few BFs in which a node fasserted on an EBUSY error during a find operation: a lock-free read ran concurrently with full validation and was blocked by WT::verify (see linked BFs). We addressed these by changing the validate test hook to use full:false, since full validation isn't run on active nodes. However, after speaking with StorEx and gregory.wlodarek@mongodb.com, the discussion pointed to a more robust solution: a catch/retry loop in SESI code, so that such errors from the WT layer are caught and retried there rather than propagating up to callers in higher layers.
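      As a rough illustration of the catch/retry idea, here is a minimal sketch, not actual SESI code. The helper name retryOnEBUSY, the attempt limit, and the doubling backoff are all assumptions made for this example; `op` stands in for whichever WT call can return EBUSY.

```cpp
#include <cerrno>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper for illustration only; this name does not exist in the
// server code base. Calls `op` until it returns something other than EBUSY
// or the attempt budget is exhausted.
inline int retryOnEBUSY(const std::function<int()>& op,
                        int maxAttempts = 5,
                        std::chrono::milliseconds initialBackoff =
                            std::chrono::milliseconds(1)) {
    int ret = EBUSY;
    auto backoff = initialBackoff;
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        ret = op();
        if (ret != EBUSY) {
            return ret;  // success (0) or an error that should not be retried
        }
        if (attempt + 1 < maxAttempts) {
            // Assumed policy: sleep, then double the wait before the next try.
            std::this_thread::sleep_for(backoff);
            backoff *= 2;
        }
    }
    return ret;  // still EBUSY; the caller decides how to surface it
}
```

      With a wrapper like this, a caller would only see EBUSY once the retry budget is used up. Whether the retries should be bounded at all, and what the backoff should be, are open design questions for this ticket.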

      We originally discussed putting this retry loop, or additional concurrency control, in the replication code, but that was judged to be too specific and complex a solution to this problem, as well as a layering violation. For more context, see SERVER-110331 and SERVER-109658.

      Gregory mentioned that this is a known issue and that there was a similar ticket filed for storage engines a few years ago, but it got lost, so I'm filing this new ticket (please feel free to close this as a duplicate or link it if the original ticket is found!). 

            Assignee: Unassigned
            Reporter: Ruchitha Rajaghatta
            Votes: 0
            Watchers: 6