- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Reconciliation
- None
- Storage Engines, Storage Engines - Transactions
- SE Transactions - 2025-06-06, SE Transactions - 2025-06-20, SE Transactions - 2025-07-04, SE Transactions - 2025-07-18
- 5
- (copied to CRM)
- 0
- v8.1, v8.0
Description:
We observed significant replication lag (lasting over an hour) in one of the help tickets. The issue resolved on its own once, but in most cases a manual host restart was required to stop the lag on the node. Lag of this kind is not acceptable in production environments.
Upon analysing the FTDC data, we found that the replication thread was stalled trying to access a page held under an exclusive lock by eviction, so the cause appears to be slow eviction. Based on the FTDC data and flame graphs collected over a 4-minute trace window, the eviction slowness seems to be caused by reconciliation moving updates to the history store (HS).
Problem:
Currently, we lack sufficient visibility into how reconciliation progresses while moving updates to the HS. This makes it difficult to diagnose and respond to performance issues like replication lag when they occur.
Action Items:
- Define what diagnostic data should be collected (e.g., time spent, number of updates, retries).
- Add logging or expose new FTDC metrics.
- Evaluate the need for backporting.
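As a starting point for the first action item, here is a minimal sketch of the kind of per-reconciliation data that could be collected. It is written in Python purely as an illustration. The class `HsMoveStats`, its fields, and the threshold are hypothetical and do not correspond to existing WiredTiger statistics or code.

```python
from dataclasses import dataclass


@dataclass
class HsMoveStats:
    """Hypothetical counters for one reconciliation's moves to the history store."""

    updates_moved: int = 0  # number of updates written to the HS
    retries: int = 0        # number of HS insert attempts that had to be retried
    elapsed_us: int = 0     # total time spent moving updates, in microseconds

    def record(self, updates: int, elapsed_us: int, retried: bool = False) -> None:
        """Accumulate the result of one batch of updates moved to the HS."""
        self.updates_moved += updates
        self.elapsed_us += elapsed_us
        if retried:
            self.retries += 1

    def is_slow(self, threshold_us: int) -> bool:
        """Return True when the accumulated time exceeds the given threshold."""
        return self.elapsed_us > threshold_us

    def summary(self) -> str:
        """Format the counters as a single log line."""
        return (
            f"hs_move: updates={self.updates_moved} "
            f"retries={self.retries} elapsed_us={self.elapsed_us}"
        )


# Example: two batches, the second of which needed a retry.
stats = HsMoveStats()
stats.record(updates=1200, elapsed_us=350_000)
stats.record(updates=800, elapsed_us=900_000, retried=True)
if stats.is_slow(threshold_us=1_000_000):
    print(stats.summary())
```

Logging only when a threshold is exceeded keeps the normal path quiet. The same counters could instead be exported as FTDC metrics if continuous visibility is preferred.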
- is related to
  - WT-14340 Make conn->flags atomic (Closed)
  - WT-14750 Failure in __wt_page_inmem: "encountered an illegal file format or internal value: 0x0" (Closed)
  - WT-14848 Fix missing WT_RET in live_restore_fs.c (Closed)
  - WT-14864 Remove duplicate TSAN warnings from metric script (Closed)
  - WT-14872 clang-analyzer gives misleading output (Closed)
  - WT-14919 Coverity analysis defect 175312: Unused value (Closed)
  - WT-14929 Coverity analysis defect 174890: Resource leak (Closed)
  - WT-14935 Solve: SUMMARY: ThreadSanitizer: data race /home/ec2-user/work/git/wiredtiger-arm/src/support/mtx_rw.c:168:71 in __read_blocked (Closed)
  - WT-14980 Disagg table ID namespacing not correctly feature-gated (Closed)
  - WT-14865 Create a parser script for wiredtiger config string in turtle files (Closed)
  - SERVER-106431 Update version cursor config to new format (Closed)
  - WT-14695 Merge page deltas into develop (Closed)
  - WT-14696 Merge precise checkpoint into develop (Closed)
  - WT-14697 Merge disagg testing code into develop (Closed)
  - WT-14698 Merge remaining disagg code into develop (Closed)
  - WT-14826 Write the prepare timestamp and prepared id to disk with preserve prepared config (Closed)
  - WT-14828 Ensure we set the prepare id when preparing a transaction if preserve prepare config is on (Closed)
  - WT-14833 Fix TCMalloc build/propagation for some stress tests (Closed)
  - WT-14858 Forbid to prepare a transaction before the stable timestamp if preserve_prepare config is on (Closed)
  - WT-14869 Pack prepared ts and prepared id correctly to cell format and unpack them accordingly (Closed)
  - WT-14878 Assign prepared id and prepard ts on page deltas (Closed)
  - WT-14901 Enable all examples regular testing with TSAN by suppressing all the warnings (Closed)
  - WT-14951 Merge newer disagg code into develop (Closed)
  - WT-14978 Add diagnostic information to durable timestamp assertion (Closed)
  - WT-14837 Add metric to measure execution time of block_first_srch() (Closed)
  - WT-14727 update the workgen latency metrics to print the bucket count for us, ms and secs (Closed)
  - WT-14832 Add read operations to test/model (Closed)
  - WT-9931 Reader took an order of magnitude longer for when all history store records were invisible (Closed)
  - WT-14719 Update cache workloads to adapt to stat name change for cache_eviction_trigger_clean_reached (Closed)
  - WT-14896 Failed: s-outdated-fixmes on ~ Infrequent checks [WiredTiger (develop) @ 2b0ed0cf] (Closed)
  - WT-14946 Disable incompatible tests between disagg and tiered (Closed)
  - WT-14947 Suppress perf critical warnings (Closed)
  - WT-14953 test_layered17 spinlock abort - pthread_mutex_lock: (null): Invalid argument (Closed)
  - WT-14954 Make test_truncate02 more reliable (Closed)
  - WT-14955 failed: format-failure-configs-test on ubuntu2004 [wiredtiger @ a4f10c8e] (Closed)
  - WT-14956 AssertionError in test_rollback01.py: no rollback occurred on cursor->next() for disagg (Closed)
- related to
  - WT-14619 Merge layered tables into develop (Closed)
  - WT-14562 Dump all extent list blocks when we do a corrupt block dump (Closed)
  - WT-12337 Review and fix WT_ASSERTs in packing_inline.h (Closed)
  - WT-13985 Did not run a sweep for 60 minutes in test_prepare_hs01 for CS (Closed)
  - WT-14648 Fix log subsystem returning EBUSY from conn->close() (Closed)
  - WT-13038 task-timed-out: csuite-timestamp-abort-test-s3 on ubuntu2004 (Closed)