Consider statistics improvement to handle collStats anti-pattern

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Storage Engines

      End users and tools sometimes run a collStats command against every collection/table.  This opens a huge number of new dhandles and opens the btree behind every file: URI.  Up to now, I think the advice has been "don't do that".

      One thing to note: dhandles hold the statistics array.  That means that when a dormant btree is reopened, any statistic that tracks "how many times has operation {insert/update/remove/...} been performed on this btree" is reset to 0, because these statistics are not stored on disk.  So the only non-zero statistics are those set in the act of reading in the root page of the btree.  Statistics like 'maximum internal page size' generally do not change and are determined by configuration in the metadata (not stored in the Btree itself).  So opening the Btree just to get this info seems like a waste.

      One option to consider is a config when opening a cursor that indicates that no dormant Btree should be reopened.
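
      A minimal sketch of that option, written as a toy Python model rather than WiredTiger code. The `Connection` class, the `reopen_dormant` flag, and the statistic names are all hypothetical. Returning only the metadata-derived values for a dormant btree is also an assumption, since the proposal does not say what such a cursor should return.

```python
class Connection:
    """Toy model of a connection's dhandle cache (hypothetical, not WiredTiger code)."""

    def __init__(self, metadata):
        # uri -> statistics derivable from the metadata configuration alone
        self.metadata = metadata
        # uri -> live statistics for btrees that are currently open
        self.open_handles = {}
        # how many times a dormant btree had to be opened
        self.reopen_count = 0

    def _open_btree(self, uri):
        # Opening a dormant btree starts its operation counters at zero,
        # because nothing about them is stored on disk.
        self.reopen_count += 1
        stats = dict(self.metadata[uri])
        stats.update(cursor_insert=0, cursor_update=0, cursor_remove=0)
        self.open_handles[uri] = stats
        return stats

    def stat(self, uri, reopen_dormant=True):
        if uri in self.open_handles:
            # Already in memory: use the usual route.
            return dict(self.open_handles[uri])
        if reopen_dormant:
            # Current behaviour: open the btree just to read its statistics.
            return dict(self._open_btree(uri))
        # Proposed behaviour (assumed): answer from metadata, open nothing.
        return dict(self.metadata[uri])


conn = Connection({"file:a.wt": {"max_internal_page": 4096}})
cheap = conn.stat("file:a.wt", reopen_dormant=False)
```

      In this model, a tool that walks every table with `reopen_dormant=False` creates no new handles, and the only information it gives up is a set of counters that would have read zero anyway.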

      Another interesting idea is to have a new WT-maintained table that contains the last known statistics for any Btree that has been closed.  Call it WiredTigerStats.wt.  Entries in the table are indexed by URI, with the value being the stats array (compressed).  When a Btree is closed, or if its stat array is discarded, its stat array is written to the stats btree.  Then, when a stat is done on a URI, we check whether the dhandle is already in memory and its stat array hasn't been discarded (**).  If so, we have the stats we need using the usual route.  If not, we get the stats from the WT stats table.  This does mean one more Btree insert on every file close, but it implies no extra I/O.

      This has the very desirable characteristic that ongoing stats, such as those that track how many inserts have been performed, are retained across btree opens.  They could be retained across restarts if we wanted to do the extra work to make this a durable table; otherwise the table could be kept in memory for the life of the connection.
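
      The stats-table idea above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not a proposed implementation: `StatsStore` and its methods are hypothetical names, a plain dict stands in for WiredTigerStats.wt, JSON plus zlib stands in for whatever encoding and compression the stats array would really use, and seeding a reopened btree from its saved entry is one possible way to get the retention described.

```python
import json
import zlib


class StatsStore:
    """Toy model of the proposed stats table (hypothetical, not WiredTiger code)."""

    def __init__(self):
        # uri -> live statistics held by an open dhandle
        self.open_stats = {}
        # Stand-in for WiredTigerStats.wt: uri -> compressed stats array
        self.saved = {}

    def open(self, uri):
        blob = self.saved.get(uri)
        if blob is not None:
            # Assumption: a reopened btree is seeded from its saved stats.
            stats = json.loads(zlib.decompress(blob).decode("utf-8"))
        else:
            stats = {"insert": 0, "update": 0, "remove": 0}
        self.open_stats[uri] = stats

    def record(self, uri, op):
        # Count one operation against an open btree.
        self.open_stats[uri][op] += 1

    def close(self, uri):
        # One extra insert into the stats table on every close.
        stats = self.open_stats.pop(uri)
        self.saved[uri] = zlib.compress(json.dumps(stats).encode("utf-8"))

    def stat(self, uri):
        if uri in self.open_stats:
            # dhandle is in memory: use the usual route.
            return dict(self.open_stats[uri])
        blob = self.saved.get(uri)
        if blob is None:
            return None  # never opened, nothing recorded
        # Dormant btree: read the last known stats without opening it.
        return json.loads(zlib.decompress(blob).decode("utf-8"))


store = StatsStore()
store.open("file:a.wt")
store.record("file:a.wt", "insert")
store.close("file:a.wt")
last_known = store.stat("file:a.wt")  # served from the table, btree stays closed
```

      Because `self.saved` lives only in the process here, the model corresponds to the "life of the connection" variant; making it survive restarts would require the durable table mentioned above.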

      (**) Currently we don't discard the stats when the btree is closed, but we should – see WT-10208.

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Donald Anderson