[SERVER-54722] Improve Sorter checksumming Created: 23/Feb/21  Updated: 19/Jan/23  Resolved: 19/Jan/23

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Mathias Stearn Assignee: Backlog - Storage Execution Team
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to WT-7236 Cache the result of the wiredtiger_cr... Closed
Assigned Teams:
Storage Execution
Participants:

 Description   

Right now we checksum each KV pair independently using murmur3, and only after finishing a file do we validate that the checksum matched. There are a few issues with this:

  • Murmur3 is a mediocre hash function at this point, both for perf and error detection
    • Fix: use crc32c (from wiredtiger)
  • Hashing small chunks of data is slower than hashing big chunks
    • Fix: We are already producing buffers of data for compression purposes. We should do the checksumming on the big buffer either before or after compression. Doing it before compression makes sure that the decompression produces the right result, but doing it after compression both checksums less data and avoids sending garbage into the decompressor. Since we trust Snappy to decompress correctly when fed good input, I think checksumming after compression makes sense.
  • We wait until we finish with whole files to check the checksums. This (1) wastes work if we could have aborted earlier, (2) risks sending garbage data to consumers who aren't prepared for it, and (3) assumes we will actually reach the end of the file, which consumers like TopKSorter are unlikely to do.
    • Fix: Check the checksum immediately after reading a chunk from the file (and after decompression, if the checksum was computed prior to compression).
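
A minimal sketch of the per-chunk approach described above, not the Sorter's actual code. It assumes a hypothetical on-disk layout of [u32 length][u32 crc32c][payload], where the payload is the already-compressed buffer, and it uses a portable bitwise CRC-32C as a stand-in for the accelerated implementation that would come from wiredtiger. The names appendChunk and readChunk are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Portable bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
// Stand-in only: a real build would use a hardware-accelerated version.
inline std::uint32_t crc32c(const void* data, std::size_t len) {
    const auto* p = static_cast<const unsigned char*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical chunk layout: [u32 length][u32 crc32c][payload], native byte order.
// `payload` is assumed to be the compressed buffer, so the checksum covers
// exactly the bytes that are stored in the file.
inline void appendChunk(std::vector<unsigned char>& file, const std::string& payload) {
    assert(payload.size() <= UINT32_MAX);
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    const std::uint32_t sum = crc32c(payload.data(), payload.size());
    const auto* lenBytes = reinterpret_cast<const unsigned char*>(&len);
    const auto* sumBytes = reinterpret_cast<const unsigned char*>(&sum);
    file.insert(file.end(), lenBytes, lenBytes + sizeof(len));
    file.insert(file.end(), sumBytes, sumBytes + sizeof(sum));
    file.insert(file.end(), payload.begin(), payload.end());
}

// Reads one chunk starting at `offset` and verifies it immediately, before the
// bytes would be handed to a decompressor or to any consumer.
// Advances `offset` past the chunk on success; throws on truncation or mismatch.
inline std::string readChunk(const std::vector<unsigned char>& file, std::size_t& offset) {
    std::uint32_t len = 0;
    std::uint32_t expected = 0;
    constexpr std::size_t kHeader = sizeof(len) + sizeof(expected);
    if (offset > file.size() || file.size() - offset < kHeader) {
        throw std::runtime_error("truncated chunk header");
    }
    std::memcpy(&len, file.data() + offset, sizeof(len));
    std::memcpy(&expected, file.data() + offset + sizeof(len), sizeof(expected));
    if (file.size() - offset - kHeader < len) {
        throw std::runtime_error("truncated chunk payload");
    }
    const unsigned char* payload = file.data() + offset + kHeader;
    if (crc32c(payload, len) != expected) {
        throw std::runtime_error("chunk checksum mismatch");
    }
    offset += kHeader + len;
    return std::string(reinterpret_cast<const char*>(payload), len);
}
```

With this shape a reader that stops early (as TopKSorter might) has still validated every chunk it consumed, and a corrupt chunk fails at the read that touches it rather than at end of file.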


 Comments   
Comment by Mathias Stearn [ 25/Feb/21 ]

Linking to WT-7236 since how it gets resolved may influence how this ticket is implemented. Alternatively, if this ticket is done first, you will need to do your own caching of the returned function pointer (not that that is hard, just something to be aware of).
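
For reference, caching the returned function pointer on our side could look roughly like the sketch below. Everything here is an assumption for illustration: selectChecksumImpl is a hypothetical stand-in for the library call that picks an implementation, and byteSum is a placeholder, not a real CRC.

```cpp
#include <cstddef>
#include <cstdint>

using ChecksumFn = std::uint32_t (*)(const void*, std::size_t);

// Counts how often the (hypothetical) selector runs, to show the caching works.
static int g_selectorCalls = 0;

// Placeholder checksum, NOT a real CRC: sums the bytes of the buffer.
static std::uint32_t byteSum(const void* data, std::size_t len) {
    const auto* p = static_cast<const unsigned char*>(data);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < len; ++i) {
        sum += p[i];
    }
    return sum;
}

// Hypothetical selector standing in for a library call that probes the
// platform and returns a pointer to the best available implementation.
static ChecksumFn selectChecksumImpl() {
    ++g_selectorCalls;
    return &byteSum;
}

// Caches the selected pointer in a function-local static. Initialization of
// such statics happens exactly once and is thread-safe since C++11.
inline std::uint32_t checksum(const void* data, std::size_t len) {
    static const ChecksumFn fn = selectChecksumImpl();
    return fn(data, len);
}
```

Callers go through checksum() and never pay for the selection more than once per process.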
