WiredTiger / WT-10208

Consider ways to free statistics array for dormant data handles

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • 8
    • StorEng - Defined Pipeline

      Data handles (dhandles) for files have a statistics array; statistics are not allocated for other kinds of data handles.  There are currently 265 statistics tracked for each data handle.  Each statistic uses 8 bytes, so sizeof(WT_DSRC_STATS) == 2120 bytes.  But each file dhandle actually has 23 of these allocated: using a prime number of buckets, like 23, gives each session a bucket of stats that is less likely to collide with other sessions that may be incrementing a statistic for the same dhandle.  So a file dhandle holds 23 * 2120 = 48760 bytes, just for statistics.
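The layout described above can be sketched in C to make the sizes concrete. This is a minimal illustration, not WiredTiger's actual definitions: the type names `dsrc_stats` and `dhandle_stats`, the helper functions, and the choice of slot by `session_id % 23` are all assumptions made for the example. Only the figures 265, 8 bytes, and 23 come from the ticket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Figures quoted in the ticket; the real values live in WiredTiger's sources. */
#define STAT_COUNT 265 /* statistics tracked per data handle */
#define STAT_SLOTS 23  /* prime number of stat buckets per file dhandle */

/* Hypothetical stand-in for WT_DSRC_STATS: one 8-byte counter per statistic. */
typedef struct {
    int64_t counter[STAT_COUNT];
} dsrc_stats;

/* Hypothetical stand-in for the statistics held by one file dhandle. */
typedef struct {
    dsrc_stats slot[STAT_SLOTS];
} dhandle_stats;

/* Assumed slot choice: spread sessions across the buckets by session id. */
static size_t
stat_slot(uint32_t session_id)
{
    return (session_id % STAT_SLOTS);
}

/* Increment one statistic in the bucket belonging to this session. */
static void
stat_incr(dhandle_stats *stats, uint32_t session_id, size_t stat_index)
{
    assert(stat_index < STAT_COUNT);
    stats->slot[stat_slot(session_id)].counter[stat_index]++;
}

/* Reading a statistic means summing it across every bucket. */
static int64_t
stat_read(const dhandle_stats *stats, size_t stat_index)
{
    int64_t total = 0;
    size_t i;

    assert(stat_index < STAT_COUNT);
    for (i = 0; i < STAT_SLOTS; i++)
        total += stats->slot[i].counter[stat_index];
    return (total);
}
```

With these definitions, `sizeof(dsrc_stats)` is 2120 and `sizeof(dhandle_stats)` is 48760, matching the numbers in the description. The trade-off is visible in the code: increments touch only one bucket, while reads pay for a pass over all 23.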

      When a collection stops being actively used for a time, the dhandle sweep thread notices and closes the file and btree.  However, it must keep the dhandle around because there are references to it from session caches.  It takes a while for each session (via the session sweep) to notice that the dhandle is closed and remove it from the session cache.  And if there are lots of sessions, some of which are sweeping slowly (for various reasons, e.g. HELP-39798), the dhandle may take a long time to be removed.  When there are hundreds of thousands of collections (each would have at least two file: dhandles), it adds up.  In the help ticket there are 300K collections, which is 600K files.  48K * 600K = 28.8G, just for the statistics.  (Note: I did these calculations based on the current number of stats being tracked, rather than the number of stats in 4.2.23, so the real figure there is undoubtedly smaller.)  In that ticket, there's evidence that the vast majority of dhandles have had their underlying file closed and are waiting for the references from sessions to be resolved.  There's more going on in the ticket, and we probably need to be more aggressive in our session sweeping.  But there's a cost to being overly aggressive, so there will always be some lag in getting the dhandles closed.
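The footprint estimate can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the function name `idle_stats_bytes` and its parameters are hypothetical, and the per-dhandle figure of 48760 bytes is the one derived in the description (265 stats, 8 bytes each, 23 buckets).

```c
#include <assert.h>
#include <stdint.h>

/* Per-dhandle statistics footprint from the ticket: 265 * 8 * 23 bytes. */
#define STAT_BYTES_PER_DHANDLE UINT64_C(48760)

/*
 * Hypothetical helper: bytes of statistics held by file dhandles, given a
 * number of collections and an assumed number of file: dhandles for each.
 */
static uint64_t
idle_stats_bytes(uint64_t collections, uint64_t files_per_collection)
{
    assert(files_per_collection > 0);
    return (collections * files_per_collection * STAT_BYTES_PER_DHANDLE);
}
```

For 300K collections with two files each, `idle_stats_bytes(300000, 2)` gives 29,256,000,000 bytes. The 28.8G figure in the description comes from rounding 48760 down to 48K before multiplying.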

      It seems like at the time we close the files, we could free the storage for the stats.  If the dhandle gets "reopened", the stats will need to be recreated.  Both the close of the dhandle and its (re)open hold an exclusive lock, so we should be able to guarantee that no code is actively updating the stats while the file is closed.  Freeing this memory promptly is a simple change that could have a big effect when there is a sweep lag.
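One way to picture the proposal is the sketch below. It is not WiredTiger code: the `dhandle` struct, `dhandle_open`, `dhandle_close`, and `stat_incr` are hypothetical, simplified stand-ins, and the exclusive lock is assumed to be held by the caller rather than modeled. The point is only that the stats storage can follow the open/closed state of the handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define STAT_COUNT 265 /* figures quoted in the ticket */
#define STAT_SLOTS 23

/* Hypothetical stand-in for WT_DSRC_STATS. */
typedef struct {
    int64_t counter[STAT_COUNT];
} dsrc_stats;

/* Hypothetical, simplified dhandle: only the fields this sketch needs. */
typedef struct {
    bool open;         /* underlying file and btree are open */
    dsrc_stats *stats; /* STAT_SLOTS entries while open, NULL while closed */
} dhandle;

/* (Re)open: recreate zeroed stats. Caller is assumed to hold the exclusive lock. */
static int
dhandle_open(dhandle *dh)
{
    if (dh->open)
        return (0);
    assert(dh->stats == NULL);
    dh->stats = calloc(STAT_SLOTS, sizeof(dsrc_stats));
    if (dh->stats == NULL)
        return (-1);
    dh->open = true;
    return (0);
}

/* Close: release the stats at once, even if sessions still reference the dhandle. */
static void
dhandle_close(dhandle *dh)
{
    free(dh->stats);
    dh->stats = NULL;
    dh->open = false;
}

/* Updates are only valid while the handle is open; returns false otherwise. */
static bool
stat_incr(dhandle *dh, uint32_t session_id, size_t stat_index)
{
    if (!dh->open || stat_index >= STAT_COUNT)
        return (false);
    dh->stats[session_id % STAT_SLOTS].counter[stat_index]++;
    return (true);
}
```

In this sketch a closed dhandle keeps only a NULL pointer, so a lagging session sweep no longer pins roughly 48 KB per handle. The cost is that counters restart from zero after a reopen; whether that is acceptable, or whether some values should be preserved, is a design question the ticket leaves open.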

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            donald.anderson@mongodb.com Donald Anderson
            Votes:
            0
            Watchers:
            6
