WiredTiger / WT-12535

Optimize time spent in __wt_update_obsolete_check()

    • Storage Engines
    • 8
    • PacificOcean - 2024-03-19, GreatMugshot - 2024-04-02, 나비 (nabi) - 2024-04-16, Nick - 2024-04-30

      __wt_update_obsolete_check() iterates over an update chain checking for obsolete updates. On workloads with long update chains, we have observed a noticeable amount of time spent on these list traversals.

      This ticket proposes several lines of optimization. 

      Here is the core loop:

       

         367	    for (first = NULL, count = 0; upd != NULL; upd = upd->next, count++) {
         368	        if (upd->txnid == WT_TXN_ABORTED)
         369	            continue;
         370	
         371	        /*
         372	         * WiredTiger internal operations such as rollback to stable (RTS) and prepared
         373	         * transaction rollback add a globally visible tombstone to the update chain to remove
         374	         * the entire key. Treating these globally visible tombstones as obsolete and trimming
         375	         * the update list can cause problems if the update chain is being accessed elsewhere.
         376	         * To avoid this, skip globally visible tombstones from the obsolete check when they
         377	         * were generated by prepared transaction rollback rather than by RTS; no operations
         378	         * run concurrently with RTS, so trimming its tombstones cannot affect anything.
         379	         */
         380	        if (upd->txnid == WT_TXN_NONE && upd->start_ts == WT_TS_NONE &&
         381	          upd->type == WT_UPDATE_TOMBSTONE && upd->next != NULL &&
         382	          upd->next->txnid == WT_TXN_ABORTED && upd->next->prepare_state == WT_PREPARE_INPROGRESS)
         383	            continue;
         384	
         385	        if (__wt_txn_upd_visible_all(session, upd)) {
         386	            if (first == NULL && WT_UPDATE_DATA_VALUE(upd))
         387	                first = upd;
         388	        } else
         389	            first = NULL;
         390	
         391	        /* Cannot truncate the updates if we need to remove the updates from the history store. */
         392	        if (F_ISSET(upd, WT_UPDATE_TO_DELETE_FROM_HS))
         393	            first = NULL;
         394	    }
      

       

      There are a few possible optimizations here.

      1. Linked list traversal is slow because it has poor locality: each link accesses a different, unpredictable piece of memory. At the beginning of each iteration (i.e., before line #368), we could tell the CPU to prefetch the next list element, probably via __builtin_prefetch (see the first sketch after this list). It won't solve the problem, but it will shave some cycles off the memory stall when we actually need that element. (h/t mathias@mongodb.com)
      2. We only take the second continue, at line #383, if upd->next->txnid == WT_TXN_ABORTED. That means that on the next iteration through the loop, the condition at line #368 will be true and we will continue again. So we could save a bit of time by advancing the loop variables at line #383 and skipping an iteration (see the second sketch below). (This is also a risky change, since it makes the code more brittle, so we probably don't want to do it unless we believe we're seeing a lot of tombstones on obsolete checks.)
      3. It appears that we call __wt_update_obsolete_check(), via __wt_update_serial(), on almost every insert. That means that on a frequently updated key, we could wind up making frequent traversals down a long update chain. Maybe we could apply a heuristic here and not check for obsolete elements on every insert (see the third sketch below)?
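
      To make item 1 concrete, here is a minimal, self-contained sketch of prefetching the next element of a singly linked list while traversing it. It assumes a GCC/Clang-style compiler for __builtin_prefetch, and the struct and function names are hypothetical, not WiredTiger's; in __wt_update_obsolete_check() the equivalent hint would go at the top of the loop body, before the test at line #368.

          /* Illustrative sketch only -- hypothetical types, not WiredTiger code. */
          #include <stdbool.h>
          #include <stddef.h>

          struct node {
              struct node *next;
              bool aborted;
          };

          static size_t
          count_live(struct node *head)
          {
              struct node *n;
              size_t live;

              live = 0;
              for (n = head; n != NULL; n = n->next) {
                  /*
                   * Start pulling the next node into cache before we examine the
                   * current one: the second argument means a read access, the third
                   * means low temporal locality.
                   */
                  if (n->next != NULL)
                      __builtin_prefetch(n->next, 0, 1);

                  if (!n->aborted)
                      ++live;
              }
              return (live);
          }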

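      For item 2, here is a hedged sketch of the same idea on a hypothetical list type: when the current element already tells us its successor will be skipped, advance past the successor immediately. This only shows the shape of the change, not the actual WiredTiger code, and it carries the brittleness noted in the item.

          /* Illustrative sketch only -- hypothetical types, not WiredTiger code. */
          #include <stdbool.h>
          #include <stddef.h>

          struct node {
              struct node *next;
              bool aborted;
              bool special_tombstone; /* stand-in for the test at lines #380-382 */
          };

          static void
          walk(struct node *head)
          {
              struct node *n;

              for (n = head; n != NULL; n = n->next) {
                  if (n->aborted)
                      continue;

                  if (n->special_tombstone && n->next != NULL && n->next->aborted) {
                      /*
                       * The successor would fail the aborted test on the next
                       * iteration anyway; step over it now so the loop increment
                       * lands one element further along. (In the real loop, count
                       * would have to be advanced here as well.)
                       */
                      n = n->next;
                      continue;
                  }

                  /* ... process n ... */
              }
          }
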
       
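      For item 3, one possible heuristic is to run the check only every Nth insert on a chain. The sketch below is illustrative only: the counter field, the threshold, and the function names are hypothetical, not existing WiredTiger structures or tuning knobs; the real change would gate the call made from __wt_update_serial().

          /* Illustrative sketch only -- names and threshold are hypothetical. */
          #include <stdbool.h>
          #include <stdint.h>

          #define OBSOLETE_CHECK_INTERVAL 16 /* hypothetical tuning knob */

          struct chain_state {
              uint32_t inserts_since_check; /* reset each time the check runs */
          };

          /*
           * Return true when enough inserts have accumulated that it is worth
           * paying for another traversal of the update chain.
           */
          static bool
          should_check_obsolete(struct chain_state *cs)
          {
              if (++cs->inserts_since_check < OBSOLETE_CHECK_INTERVAL)
                  return (false);
              cs->inserts_since_check = 0;
              return (true);
          }
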

            Assignee: Haribabu Kommi (haribabu.kommi@mongodb.com)
            Reporter: Keith Smith (keith.smith@mongodb.com)
            Votes: 0
            Watchers: 15
