Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-11925

Audit the potential for races in setting connection, dhandle flags

    • Type: Icon: Task Task
    • Resolution: Unresolved
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • StorEng - Defined Pipeline

      Looking at uses of F_SET on the connection flags, especially, it appears that there's generally little if any protection against races.  If, for example, these two lines are running in two threads at about the same time:

      ./evict/evict_lru.c:    F_SET(conn, WT_CONN_EVICTION_RUN);
      
      ./cursor/cur_backup.c:        F_SET(conn, WT_CONN_INCR_BACKUP);

      at the end, one or the other might complete its statement and the flag would not be set.  (See WT-11099 for a specific example).  There are times that locks are held which would prevent another thread from modifying the same flag, for example:

      ./conn/conn_reconfig.c-    /* Serialize reconfiguration. */
      ./conn/conn_reconfig.c-    __wt_spin_lock(session, &conn->reconfig_lock);
      ./conn/conn_reconfig.c:    F_SET(conn, WT_CONN_RECONFIGURING); 

      That same F_SET can only happen in one thread at a time.  But that F_SET might still race with one of the F_SETs shown previously, and either one of the flags might not be set.

      Ditto for F_CLR, that is, any F_SET or F_CLR can race with any F_SET or F_CLR, which means flags could be errantly left on, as well as off.

      The risk seems pretty high that a flag could get set or unset wrongly, and miscellaneous trouble is sure to result.

      One solution is to use ATOMIC flag macros everywhere for connection flags, it's unlikely to be a big performance hit.  (We probably need to make 32-bit versions of the F_*_ATOMIC macros.  Readers of the flag, like F_ISSET, may or may not need atomic versions, it may be situational.

      This same audit should be ideally be done on other flag fields that are shared among threads, like for WT_DHANDLE, WT_BLOCK as starters.  That could be spun off as different tickets.

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            donald.anderson@mongodb.com Donald Anderson
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated: