Type: Bug
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Security Level: Public (Available to anyone on the web)
Storage Engines - Foundations
test_layered04 (as with many other tests) has sleeps in it. Take out the sleeps, and it fails. It does something like:
- open cursor, insert 150k records, close cursor. (sleep occasionally during insertion)
- sleep 1
- open cursor, count records, close cursor
- make sure the number of records read == 150k
With the sleeps removed, the final check fails because fewer than 150k records are read. The more the test sleeps, the closer the count gets to 150k.
test_layered06 fails at almost the same point; it is only slightly different:
- open a second connection to be a follower, and a session
- the follower session is not used before the crash.
- open a cursor, insert 300k records, reset and close the cursor (sleep occasionally during insertion)
- sleep 2
- open cursor, count records, close cursor
The last step crashes with:
[1731354394:7176][21453:0xfffff7ff54c0], test_oligarch06.test_oligarch06.test_oligarch06(100k), file:test_oligarch06.wt_stable, WT_CURSOR.next: [WT_VERB_DEFAULT][ERROR]: __block_disagg_read_checksum_err, 48: test_oligarch06.wt_stable: read checksum error for 2286B block at page 225, ckpt 1: block header checksum of 1948959266 (2) doesn't match expected checksum of 884395128 (1)
The relevant part is the number in parentheses next to each checksum, which is the reconciliation_id. So the "rec-id" found in PALI was 2, and we were asking for 1. This is all for the same file_id, same page_id, and same checkpoint_id (checkpoint 1). I've confirmed in the LMDB storage that we have three versions of this page, at rec-ids 0, 1 and 2. They are all full versions, no deltas.
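For triaging similar failures, the found and expected rec-ids can be pulled out of the message mechanically. This is a minimal sketch: the regular expression is derived only from the single log line above, and the function name and field names (`parse_checksum_error`, `found_rec`, `want_rec`, and so on) are made up for illustration, not taken from WiredTiger.

```python
import re

# Pattern for the tail of the error message quoted above. The format is
# inferred from that one log line and is an assumption beyond it.
CHECKSUM_ERR = re.compile(
    r"page (?P<page_id>\d+), ckpt (?P<ckpt_id>\d+): "
    r"block header checksum of (?P<found_sum>\d+) \((?P<found_rec>\d+)\) "
    r"doesn't match expected checksum of (?P<want_sum>\d+) \((?P<want_rec>\d+)\)"
)


def parse_checksum_error(line):
    """Return the numeric fields of a read-checksum error as ints, or None."""
    match = CHECKSUM_ERR.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "read checksum error for 2286B block at page 225, ckpt 1: "
    "block header checksum of 1948959266 (2) doesn't match "
    "expected checksum of 884395128 (1)"
)
fields = parse_checksum_error(line)

# True when the block that was found is newer than the one that was requested.
found_is_newer = fields["found_rec"] > fields["want_rec"]
```

Here `found_is_newer` is true, which matches the reading above: the stored block is at rec-id 2 while the reader asked for rec-id 1.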
How can it be that we are asking for an earlier reconciliation version than we've previously written?
Theory: Is it possible that we didn't update the new checksum/rec-id pair in the cookie in the parent internal page? So we're asking for the old one, since we didn't record the new one.
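To make the theory concrete, here is a toy model of it in Python. Everything in it is hypothetical: `ToyPageStore`, `read_page`, the use of `zlib.crc32` as the checksum, and the assumption that a read returns the newest stored version are stand-ins for illustration, not the real PALI or block manager behavior. It only shows that a parent holding a stale (checksum, rec-id) cookie would produce a mismatch of the shape seen in the log.

```python
import zlib


class ToyPageStore:
    """Hypothetical page store that keeps every full version of a page."""

    def __init__(self):
        # (page_id, ckpt_id) -> list of page images; list index is the rec-id.
        self.versions = {}

    def put(self, page_id, ckpt_id, image):
        """Store a new full version and return its (checksum, rec_id) cookie."""
        images = self.versions.setdefault((page_id, ckpt_id), [])
        images.append(image)
        return zlib.crc32(image), len(images) - 1

    def get_latest(self, page_id, ckpt_id):
        """Assumption: a read returns the newest version that was written."""
        images = self.versions[(page_id, ckpt_id)]
        return images[-1], len(images) - 1


def read_page(store, page_id, ckpt_id, cookie):
    """Read a page and verify it against the cookie held by the parent."""
    want_sum, want_rec = cookie
    image, found_rec = store.get_latest(page_id, ckpt_id)
    found_sum = zlib.crc32(image)
    if (found_sum, found_rec) != (want_sum, want_rec):
        raise ValueError(
            f"block header checksum of {found_sum} ({found_rec}) doesn't "
            f"match expected checksum of {want_sum} ({want_rec})"
        )
    return image


store = ToyPageStore()
store.put(225, 1, b"version 0")
parent_cookie = store.put(225, 1, b"version 1")  # parent records rec-id 1
fresh_cookie = store.put(225, 1, b"version 2")   # rewritten; parent not updated

try:
    read_page(store, 225, 1, parent_cookie)
    stale_read_failed = False
except ValueError:
    stale_read_failed = True
```

With the stale `parent_cookie` the read fails with found rec-id 2 and expected rec-id 1, while `read_page(store, 225, 1, fresh_cookie)` succeeds. That is consistent with the theory but does not prove it; the real failure could have another cause.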
UPDATE: These tests are now named test_layered04.py, test_layered06.py etc.
is depended on by: WT-14427 [ds-09.04][Storage Engines (Core)] 100% hygiene plan execution (Open)