  1. WiredTiger
  2. WT-7929

Investigate a solution to avoid FTDC stalls during checkpoint

    • Storage Engines
    • 8
    • TheMoon-StorEng - 2023-09-19, NachoCheese - 2023-10-03, Joker - StorEng - 2023-10-17
    • v7.1, v7.0, v6.0, v5.0, v4.4

      In WT-7534, we investigated why FTDC stalls when a checkpoint occurs. Checkpointing and retrieving statistics both lock the tables they work on, so they cannot run at the same time. A checkpoint takes WT_WITH_TABLE_READ_LOCK, while statistics processing takes WT_WITH_SCHEMA_LOCK and WT_WITH_TABLE_WRITE_LOCK in __wt_curstat_table_init. The write lock cannot be granted while the checkpoint holds the read lock, so statistics collection waits until the checkpoint releases it.

      Reproducing the issue:

      The issue can be reproduced with the many-coll-test. To make the stalls more pronounced, a sleep can be added inside WT_WITH_TABLE_READ_LOCK at the point where the checkpoint acquires it. See this comment:

      https://jira.mongodb.org/browse/WT-7534?focusedCommentId=3967202&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-3967202

      Scope of the ticket:

      • Find a solution to avoid those stalls coming from:
        • __wt_schema_open_indices
          • Suggestion: Before calling __wt_schema_open_indices, can we know whether there are any indices? Can we give the caller the option to skip those indices?
        • And __wt_schema_get_table
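
      The suggestion above (check for indices first, and let the caller opt out) could take roughly the following shape. This is a hedged sketch only: struct demo_table, demo_open_indices and demo_prepare_table are hypothetical names invented for illustration, and nothing here reflects the real signature or behaviour of __wt_schema_open_indices or __wt_schema_get_table.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified table handle; not WiredTiger's table structure. */
struct demo_table {
    size_t nindices;     /* number of indices defined on the table */
    bool indices_open;   /* set once the index open step has run */
    unsigned open_calls; /* how many times the index open step ran */
};

/* Stand-in for the expensive step that would need the contended lock. */
static void demo_open_indices(struct demo_table *table)
{
    table->open_calls++;
    table->indices_open = true;
}

/*
 * Prepare a table for a caller such as statistics collection.
 * The index open step is skipped when the caller asks for that, or when the
 * table has no indices at all. Returns true if the indices were opened.
 */
bool demo_prepare_table(struct demo_table *table, bool skip_indices)
{
    if (skip_indices || table->nindices == 0)
        return false; /* nothing to open, so no lock would be needed */

    demo_open_indices(table);
    return true;
}
```

      In this sketch a statistics caller would pass skip_indices as true, while callers that need the indices keep the existing behaviour by passing false.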

      Definition of done:

      Agree on the best solution for the issue and create a new ticket to implement the solution.

        1. with code change.png
          25 kB
          Monica Ng
        2. without code change.png
          24 kB
          Monica Ng

            Assignee:
            monica.ng@mongodb.com Monica Ng
            Reporter:
            etienne.petrel@mongodb.com Etienne Petrel
            Votes:
            0
            Watchers:
            35
