WiredTiger / WT-7991

improve row/byte-count information in split-heavy workloads

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      A potential feature in progress is a new WT_SESSION API that returns row- and byte-count information for the object as a whole, as well as for cursor ranges within the object (see WT-7408).

      A weakness in the cursor range implementation is that some requests "fail": a cursor range request returns WT_NOTFOUND when it can't return useful information, because the tree is changing quickly enough that no row/byte-count information is available for the subtree we're interested in.

      Two things determine how often this happens. First, the failure point is an internal configuration: the implementation keeps a count of "missing information", and once that count exceeds a threshold, the call returns failure. Second, it is workload and query dependent: for example, a heavy insert load without much accompanying reconciliation, or requests for small cursor ranges, where a single unavailable row/byte count seriously degrades accuracy. The only workaround for applications is to repeat the query, since reconciliation fills in the missing information when the page is eventually written (for example, by the next checkpoint).
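
      The threshold behavior described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the WiredTiger implementation: struct subtree_info, range_estimate, max_missing and SKETCH_NOTFOUND are hypothetical names, and SKETCH_NOTFOUND merely stands in for WT_NOTFOUND.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for WT_NOTFOUND; the real value comes from wiredtiger.h. */
#define SKETCH_NOTFOUND (-1)

/* Hypothetical per-subtree summary: the counts are usable only when valid is set. */
struct subtree_info {
    bool valid; /* false until reconciliation has filled in the counts */
    uint64_t rows;
    uint64_t bytes;
};

/*
 * Sum the counts over the subtrees covering a cursor range. Subtrees without
 * information are skipped and counted as "missing"; once more than max_missing
 * have been skipped, fail instead of returning a badly skewed estimate.
 */
static int
range_estimate(const struct subtree_info *subtrees, size_t n, size_t max_missing,
    uint64_t *rowsp, uint64_t *bytesp)
{
    size_t missing = 0;
    uint64_t rows = 0, bytes = 0;

    for (size_t i = 0; i < n; ++i) {
        if (!subtrees[i].valid) {
            if (++missing > max_missing)
                return (SKETCH_NOTFOUND);
            continue;
        }
        rows += subtrees[i].rows;
        bytes += subtrees[i].bytes;
    }

    /* The outputs are written only on success. */
    *rowsp = rows;
    *bytesp = bytes;
    return (0);
}
```

      In this sketch, a larger max_missing trades accuracy for fewer failures, and a small range (small n) is hurt the most by a single invalid subtree.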

      We may want to tune the failure threshold. If a database is changing rapidly, then even if we could return the correct point-in-time number, the result might be off by a lot by the time the application acts on it, so there's little penalty for guessing at an answer.

      We may also want to fix this by tracking row/byte-count information across splits. We could do that by adding 16B per WT_REF and aggregating row/byte-count information through splits. It's not a trivial change and there's a space cost, but it's not terribly difficult, either.
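
      One way to picture the 16B-per-WT_REF idea is a pair of 8-byte counters carried on each reference and divided up when a page splits. The sketch below assumes that layout; struct ref_counts, split_counts and aggregate_counts are hypothetical names and not part of WiredTiger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte addition to each reference: two 8-byte counters. */
struct ref_counts {
    uint64_t rows;
    uint64_t bytes;
};
_Static_assert(sizeof(struct ref_counts) == 16, "two 8-byte counters");

/*
 * Divide one reference's counts between the two references produced by a
 * split. The caller says how much went to the left side; the right side gets
 * the remainder, so the total is preserved across the split.
 */
static void
split_counts(struct ref_counts orig, uint64_t left_rows, uint64_t left_bytes,
    struct ref_counts *left, struct ref_counts *right)
{
    assert(left_rows <= orig.rows && left_bytes <= orig.bytes);
    left->rows = left_rows;
    left->bytes = left_bytes;
    right->rows = orig.rows - left_rows;
    right->bytes = orig.bytes - left_bytes;
}

/* Aggregate child counts upward, as a parent reference would after a split. */
static struct ref_counts
aggregate_counts(const struct ref_counts *children, size_t n)
{
    struct ref_counts total = {0, 0};

    for (size_t i = 0; i < n; ++i) {
        total.rows += children[i].rows;
        total.bytes += children[i].bytes;
    }
    return (total);
}
```

      Because the split preserves the totals, a range query in this sketch could read counts from the references directly instead of waiting for reconciliation to fill them in.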

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: