Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Storage Engines - Transactions
Total Hours with Assigned Team:
1,901.964
Sprint:
SE Transactions - 2026-07-03
Story Points:
None

While implementing parallel checkpoints, or at least trying to resolve test failures on the branch I came across a failure in test_drop01, the test logs:

[pid:69414]: ERROR in test_drop01.test_drop01.test_drop_hs_truncate
F
======================================================================
FAIL: test_drop01.test_drop01.test_drop_hs_truncate
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ubuntu/other-wt-repro/test/suite/wttest.py", line 161, in wrapper
    return func(self, *args, **kwargs)
  File "/home/ubuntu/other-wt-repro/test/suite/wttest.py", line 309, in _callTestMethod
    method()
  File "/home/ubuntu/other-wt-repro/test/suite/test_drop01.py", line 68, in test_drop_hs_truncate
    self.assertEqual(self.get_hs_size(), 2)
AssertionError: 0 != 2

----------------------------------------------------------------------
Ran 1 test in 78.737s

FAILED (failures=1)
[Inferior 1 (process 69414) exited with code 01]

This assertion is saying it counted zero records in the history store when it should've seen two. Now initially I believed this failure to be a history store reconciliation oddity that parallel checkpoints weren't playing nice with, but it turns out the actual cause is quite different.

This failure mode can be reproduced on develop by modifying the test slightly:

diff --git a/test/suite/test_drop01.py b/test/suite/test_drop01.py
index 2f6417ab15..1d7c2efcd9 100644
--- a/test/suite/test_drop01.py
+++ b/test/suite/test_drop01.py
@@ -39,11 +39,13 @@ class test_drop01(wttest.WiredTigerTestCase):
         self.session.commit_transaction('commit_timestamp=' + self.timestamp_str(timestamp))

     def get_hs_size(self):
-        cursor = self.session.open_cursor("file:WiredTigerHS.wt")
+        sess = self.conn.open_session()
+        cursor = sess.open_cursor("file:WiredTigerHS.wt")
         size = 0
         while cursor.next() == 0:
             size += 1
         cursor.close()
+        sess.close()
         return size

     uri = 'table:test_drop01'

Essentially, by creating a new session prior to counting the records in the history the test fails. The reason for this is that the cursor opened by the session does not get WT_CURSTD_IGNORE_TOMBSTONE set on it and thus treats every history store record as invisible. This indicates a bug, I think every cursor opened on the history store should have that flag set. Now the reason this passes on develop currently is that the history store cursor opened by the checkpoint prior to counting the history store records never has WT_CURSTD_IGNORE_TOMBSTONE un-set from it. So effectively when the checkpoint completes the cursor is cached and then re-used when re-opened later in the test. This also is likely a bug.

I'm not entirely clear on the semantics expected when a user opens a cursor directly on the history store. I don't think MongoDB does this either so I don't think there are external implications but effectively the bug impacts what records are visible in the history store, and hides them artificially so its a somewhat concerning issue.

The fix may be as simple as adding a WT_CURSTD_IGNORE_TOMBSTONE in cur_file open, and clearing that flag on cache.

Scope:

Investigate validity of issue
Fix, and add regression testing

Assignee:: [DO NOT USE] Backlog - Storage Engines Team
Reporter:: Luke Pearson
Votes:: 0 Vote for this issue
Watchers:: 1 Start watching this issue

Created:: Mar 06 2026 09:59:22 PM UTC
Updated:: May 21 2026 09:56:55 PM UTC

Details

Description

Attachments

Activity

People

Dates