WiredTiger / WT-3362

Cursor opens should never block for the duration of a checkpoint

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT2.9.3, 3.2.15, 3.4.6, 3.5.9
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Sprint: Storage 2017-06-19

      In WT-3207 we fixed a situation where a thread could spin on a handle lock during checkpoints (including while holding the schema lock, blocking many other operations).

      A recent case, running with the fix for WT-3207 in place, suggests there is a similar (but less common) source of stalls during checkpoints.

      bruce.lucas commented:

      • in every case there was a failed table drop and resulting closing of all cursors, and then a stall until the end of the checkpoint.
      • the stall coincides with very high cpu utilization and context switch rate, and notably 3 M "pthread mutex shared lock write-lock calls" per second for the duration of the stall.
      • unlike before - "time waiting for the table lock" never budges from 0 so I guess that counter is no longer hooked up in the patch build?

      Looking at the code for that counter, one thing that could explain this is a call to __wt_try_writelock in a tight loop. This appears to be a pure CPU loop, i.e. one with no calls to sched_yield, since we don't see kernel CPU utilization.

      Try to reproduce this situation: insert a sleep into checkpoints, run with aggressive sweeping, try a combination of drops, creates and cursor opens. No operation should block for the duration of the checkpoint.

            Assignee: Michael Cahill (michael.cahill@mongodb.com, Inactive)
            Reporter: Michael Cahill (michael.cahill@mongodb.com, Inactive)
            Votes: 0
            Watchers: 11
