Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: Checkpoints
Storage Engines - Persistence
SE Persistence backlog
A test case that exercises a slow-locking implementation is hitting a state in which the coordination between the checkpoint server and its worker threads does not look right.
The threads are:
[2026/03/13 16:03:58.308] Id Target Id Frame
[2026/03/13 16:03:58.308] * 1 Thread 0xffff9c472040 (LWP 133466) "python3" 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.308] 2 Thread 0xffff96322b80 (LWP 133715) "log-wrlsn-serve" 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.308] 3 Thread 0xffff96b32b80 (LWP 133714) "log-close-serve" 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.308] 4 Thread 0xffff97342b80 (LWP 133615) "tiered-server" 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.308] 5 Thread 0xffff98362b80 (LWP 133613) "checkpoint-p 4" 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.308] 6 Thread 0xffff98b72b80 (LWP 133612) "checkpoint-p 3" 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.308] 7 Thread 0xffff99382b80 (LWP 133611) "checkpoint-p 2" 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
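The listing shows three checkpoint worker threads ("checkpoint-p 2" through "checkpoint-p 4"). All three are idle, each in a one-second timed wait on the same condition variable (cond=0x5166ffe1e990):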
[2026/03/13 16:03:58.357] Thread 7 (Thread 0xffff99382b80 (LWP 133611) "checkpoint-p 2"):
[2026/03/13 16:03:58.357] #0 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #1 0x0000ffff9acc1e78 [PAC] in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #2 0x0000ffff9a20093c [PAC] in __wt_cond_wait_signal (session=session@entry=0x5166ff663638, cond=0x5166ffe1e990, usecs=1000000, run_func=run_func@entry=0xffff9a0fbc00 <__checkpoint_parallel_thread_chk>, signalled=signalled@entry=0xffff9938223f) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/os_posix/os_mtx_cond.c:115
[2026/03/13 16:03:58.357] #3 0x0000ffff9a0fc7f4 in __checkpoint_parallel_thread_run (session=0x5166ff663638, thread=<optimized out>) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_parallel.c:212
[2026/03/13 16:03:58.357] #4 0x0000ffff9a29ef98 in __thread_run (arg=0x5166ffe1c960) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/support/thread_group.c:32
[2026/03/13 16:03:58.357] #5 0x0000ffff9acc2834 in start_thread () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #6 0x0000ffff9ac66e5c [PAC] in thread_start () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] Thread 6 (Thread 0xffff98b72b80 (LWP 133612) "checkpoint-p 3"):
[2026/03/13 16:03:58.357] #0 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #1 0x0000ffff9acc1e78 [PAC] in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #2 0x0000ffff9a20093c [PAC] in __wt_cond_wait_signal (session=session@entry=0x5166ff663da0, cond=0x5166ffe1e990, usecs=1000000, run_func=run_func@entry=0xffff9a0fbc00 <__checkpoint_parallel_thread_chk>, signalled=signalled@entry=0xffff98b7223f) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/os_posix/os_mtx_cond.c:115
[2026/03/13 16:03:58.357] #3 0x0000ffff9a0fc7f4 in __checkpoint_parallel_thread_run (session=0x5166ff663da0, thread=<optimized out>) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_parallel.c:212
[2026/03/13 16:03:58.357] #4 0x0000ffff9a29ef98 in __thread_run (arg=0x5166ffe1c9b0) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/support/thread_group.c:32
[2026/03/13 16:03:58.357] #5 0x0000ffff9acc2834 in start_thread () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #6 0x0000ffff9ac66e5c [PAC] in thread_start () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] Thread 5 (Thread 0xffff98362b80 (LWP 133613) "checkpoint-p 4"):
[2026/03/13 16:03:58.357] #0 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #1 0x0000ffff9acc1e78 [PAC] in pthread_cond_timedwait@@GLIBC_2.17 () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #2 0x0000ffff9a20093c [PAC] in __wt_cond_wait_signal (session=session@entry=0x5166ff664508, cond=0x5166ffe1e990, usecs=1000000, run_func=run_func@entry=0xffff9a0fbc00 <__checkpoint_parallel_thread_chk>, signalled=signalled@entry=0xffff9836223f) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/os_posix/os_mtx_cond.c:115
[2026/03/13 16:03:58.357] #3 0x0000ffff9a0fc7f4 in __checkpoint_parallel_thread_run (session=0x5166ff664508, thread=<optimized out>) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_parallel.c:212
[2026/03/13 16:03:58.357] #4 0x0000ffff9a29ef98 in __thread_run (arg=0x5166ffe1ca00) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/support/thread_group.c:32
[2026/03/13 16:03:58.357] #5 0x0000ffff9acc2834 in start_thread () from /lib64/libc.so.6
[2026/03/13 16:03:58.357] #6 0x0000ffff9ac66e5c [PAC] in thread_start () from /lib64/libc.so.6
At the same time, a thread doing connection close is waiting on a semaphore (which presumably the workers should signal):
[2026/03/13 16:03:59.145] Thread 1 (Thread 0xffff9c472040 (LWP 133466) "python3"):
[2026/03/13 16:03:59.145] #0 0x0000ffff9acbefb4 in __futex_abstimed_wait_cancelable64 () from /lib64/libc.so.6
[2026/03/13 16:03:59.145] #1 0x0000ffff9accada0 [PAC] in __new_sem_wait_slow64.constprop.0 () from /lib64/libc.so.6
[2026/03/13 16:03:59.145] #2 0x0000ffff9a2011dc [PAC] in __wt_semaphore_wait (session=session@entry=0x5166ff66acb8, sem=sem@entry=0x5166ff8f3578) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/os_posix/os_mtx_sem.c:68
[2026/03/13 16:03:59.145] #3 0x0000ffff9a0fd188 in __wt_checkpoint_parallel_finish (session=session@entry=0x5166ff66acb8) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_parallel.c:393
[2026/03/13 16:03:59.145] #4 0x0000ffff9a0d2984 in __wt_sync_file (session=session@entry=0x5166ff66acb8, syncop=syncop@entry=WT_SYNC_CHECKPOINT) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/btree/bt_sync.c:293
[2026/03/13 16:03:59.145] #5 0x0000ffff9a103e04 in __checkpoint_tree (session=session@entry=0x5166ff66acb8, is_checkpoint=is_checkpoint@entry=true, cfg=0xffffea49f290) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_txn.c:2781
[2026/03/13 16:03:59.145] #6 0x0000ffff9a105584 in __checkpoint_tree_helper (session=0x5166ff66acb8, cfg=0xffffea49f290) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_txn.c:2943
[2026/03/13 16:03:59.145] #7 __checkpoint_apply_to_dhandles (session=0x5166ff66acb8, cfg=0xffffea49f290, op=<optimized out>) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_txn.c:338
[2026/03/13 16:03:59.145] #8 __checkpoint_db_internal (session=0x5166ff66acb8, cfg=0xffffea49f290) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_txn.c:1544
[2026/03/13 16:03:59.145] #9 __checkpoint_db_wrapper (session=session@entry=0x5166ff66acb8, cfg=cfg@entry=0xffffea49f290) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_txn.c:1954
[2026/03/13 16:03:59.145] #10 0x0000ffff9a107c80 in __wt_checkpoint_db (session=0x5166ff66acb8, cfg=cfg@entry=0xffffea49f290, waiting=waiting@entry=true) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/checkpoint/checkpoint_txn.c:2035
[2026/03/13 16:03:59.145] #11 0x0000ffff9a2abdf4 in __wt_txn_global_shutdown (session=session@entry=0x5166ff662000, cfg=cfg@entry=0xffffea49f370) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/txn/txn.c:2623
[2026/03/13 16:03:59.145] #12 0x0000ffff9a111058 in __conn_close (wt_conn=0x5166ff8f2000, config=<optimized out>) at /data/mci/ac01179377cfde5eac92c5e038c7ad64/wiredtiger/src/conn/conn_api.c:1255
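To make the shape of this state easier to reason about, here is a minimal Python model of the handshake the traces suggest: workers sit in a timed condition wait until work appears, and the finishing thread waits on a semaphore once per completion it expects. This is only a sketch under assumptions. The class and method names (ParallelCheckpointModel, push, finish, shutdown) are hypothetical and are not WiredTiger APIs, and the model does not claim to reproduce the actual root cause. It only shows that if the finisher expects more completions than there are outstanding work items, the result is the picture above: idle workers and a blocked finisher.

```python
import threading
from collections import deque


class ParallelCheckpointModel:
    """Toy model (hypothetical, not WiredTiger code): a work queue, a condition
    variable the workers wait on, and a semaphore the finisher waits on."""

    def __init__(self, n_workers):
        self.queue = deque()
        self.cond = threading.Condition()
        self.done = threading.Semaphore(0)
        self.running = True
        self.workers = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(n_workers)
        ]
        for w in self.workers:
            w.start()

    def _worker(self):
        while True:
            with self.cond:
                # Timed wait with a re-check, analogous to the usecs=1000000
                # wait plus run_func check seen in the worker backtraces.
                while self.running and not self.queue:
                    self.cond.wait(timeout=1.0)
                if not self.queue:
                    return  # shutting down and nothing left to do
                item = self.queue.popleft()
            item()
            self.done.release()  # one post per completed work item

    def push(self, fn):
        with self.cond:
            self.queue.append(fn)
            self.cond.notify()

    def finish(self, expected, timeout=5.0):
        """Wait for `expected` completions; return how many actually arrived.
        The timeout exists only so the model reports a stall instead of hanging."""
        got = 0
        for _ in range(expected):
            if not self.done.acquire(timeout=timeout):
                break
            got += 1
        return got

    def shutdown(self):
        with self.cond:
            self.running = False
            self.cond.notify_all()
        for w in self.workers:
            w.join()


def demo():
    model = ParallelCheckpointModel(n_workers=3)
    results = []
    for i in range(3):
        model.push(lambda i=i: results.append(i))
    completed = model.finish(expected=3)            # balanced: 3 pushes, 3 waits
    stalled = model.finish(expected=1, timeout=0.2) # no work outstanding: nobody posts
    model.shutdown()
    return completed, stalled, sorted(results)
```

In the second finish call no work is queued, so every worker stays in its condition wait and the semaphore is never posted. Without the timeout that call would block indefinitely, which is the same combination of idle workers and a semaphore waiter captured in the backtraces.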
Note that this failure happened after the changes in WT-16909 which addressed a similar (but possibly opposite?) issue.
is related to: WT-16909 Update WiredTiger Release Notes for SPM-4630
Needs Scheduling