test/format (mode=switch) [Elegant stepdown bugs] split-gen assertion in __wt_evict fires for internal pages after elegant step-down


    • Storage Engines - Transactions
    • 79.246
    • SE Transactions - 2026-07-03
    • 3

      Context

      This bug was discovered while testing on the dedicated elegant step-down feature branch (https://github.com/wiredtiger/wiredtiger/compare/develop...wt-17785-enable-elegant-stepdown-mainine). Today, test/format implements step-down by restarting. On this branch, the restart is replaced with a synchronous, elegant step-down triggered via reconfigure(role=follower). This ticket tracks one of the bugs that elegant step-down exposes.

      Root Cause

      When a leader dirties or splits an internal page before stepping down, that page remains resident in the cache after step-down. The WT-17794 fix (clearing dirty state on outdated disagg read-only pages) only handles leaf pages; internal pages are left in an unresolvable state. When the eviction server or an application-assist thread later selects such an internal page for eviction, it hits the split-generation safety assertion in __wt_evict (evict_page.c:501):

      WT_ASSERT(session,
          closing || !F_ISSET(ref, WT_REF_FLAG_INTERNAL) ||
          F_ISSET(session->dhandle, WT_DHANDLE_DEAD | WT_DHANDLE_EXCLUSIVE) ||
          !__wt_gen_active(session, WT_GEN_SPLIT, page->pg_intl_split_gen));
      

      The assertion fires because every disjunct is false: closing is false, the ref is an internal page, the dhandle after step-down is marked read-only and outdated but neither dead nor exclusive, and the split generation recorded on the former leader's internal page is still active.

      Evergreen Task / Link

      https://spruce.corp.mongodb.com/version/6a420baae62e3f000732f9f0/tasks (3 occurrences)
      https://spruce.corp.mongodb.com/version/6a42393e3def22000738b429/tasks (4 occurrences)

      Logs & Stack Trace

      Fires from the eviction server thread:

      file:T00001.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: __wt_evict, 501: WiredTiger assertion failed: 'closing || !F_ISSET(ref, WT_REF_FLAG_INTERNAL) || F_ISSET(session->dhandle, WT_DHANDLE_DEAD | WT_DHANDLE_EXCLUSIVE) || !__wt_gen_active(session, WT_GEN_SPLIT, page->pg_intl_split_gen)'
      file:T00001.wt_stable, eviction-server: [WT_VERB_DEFAULT][ERROR]: __wt_abort, 32: aborting WiredTiger library

      #3  __wt_abort (session=0x30dc3fc8f000) at src/os_common/os_abort.c:32
      #4  __wt_evict (session=0x30dc3fc8f000, ref=0x30dc35bfa8c0, previous_state=3, flags=0) at src/evict/evict_page.c:501
      #5  __wti_evict_page (session=0x30dc3fc8f000, is_server=false) at src/evict/evict_dispatch.c:254
      #6  __wti_evict_lru_pages (session=0x30dc3fc8f000, is_server=false) at src/evict/evict_queue.c:140
      #7  __evict_thread_run (session=0x30dc3fc8f000, thread=0x30dc3fe44ff0) at src/evict/evict_thread.c:117
      #8  __thread_run (arg=0x30dc3fe44ff0) at src/support/thread_group.c:32
      

      Also fires from an application-assist thread during transaction rollback:

      #3  __wt_evict (session=0x71d43fca3800, ref=0x71d43fe0c6e0, previous_state=3, flags=0) at src/evict/evict_page.c:501
      #4  __wti_evict_page (session=0x71d43fca3800, is_server=false) at src/evict/evict_dispatch.c:254
      #5  __wti_evict_app_assist_worker (session=0x71d43fca3800) at src/evict/evict_dispatch.c:385
      #6  __wt_evict_app_assist_worker_check at src/evict/evict_inline.h:990
      #7  __wt_txn_rollback at src/txn/txn.c:2220
      #8  __session_rollback_transaction at src/session/session_api.c:2084
      #9  rollback_transaction at test/format/ops.c:664
      #10 ops at test/format/ops.c:1437
      

      Observed consistently across 3 separate patch runs on the elegant step-down branch (3-4 occurrences per run), always on disagg-switch variants. This is the same family as WT-17794: that fix cleared dirty state on outdated disagg read-only leaf pages, and a parallel fix is needed for internal pages.

            Assignee:
            Shoufu Du
            Reporter:
            Sid Mahajan
            Votes:
            0
            Watchers:
            3