WiredTiger / WT-12293

Optimize our crc32 implementation for x86

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • StorEng - Refinement Pipeline

      mathias@mongodb.com pointed out that we could take advantage of how Intel x86 CPUs execute the CRC32 instruction to speed up our crc32 implementation:

      On x86 (at least on Intel) the story is a bit different. There, the CRC32 instruction (which computes CRC-32C) has a 3-cycle latency, but three of them can execute in parallel as long as they are independent, so the standard pattern is to split each chunk into 3 parallel streams and then merge the resulting CRCs. See WT-2121 for some discussion. It may be worth reopening that ticket and trying again, since you are already set up to test it. You can find a BSD-or-GPL-licensed implementation written by Intel engineers linked from that ticket, but to save you a hop, here you go.

      This ticket is to explore the feasibility of this approach and measure how much performance we could gain from it.
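The three-stream-and-merge pattern described above can be sketched in portable C. This is a minimal illustration under stated assumptions, not WiredTiger's code and not the Intel implementation referenced from WT-2121: every function name here (`crc32c_step`, `crc32c_sw`, `crc32c_combine`, `crc32c_3way`) is hypothetical, and a bitwise per-byte update stands in for the hardware instruction (on x86 with SSE4.2 that role would be played by an intrinsic such as `_mm_crc32_u64` from `<nmmintrin.h>`, which consumes 8 bytes per instruction). The merge relies on CRC linearity: the CRC of A||B equals the CRC of A advanced past len(B) zero bytes (a multiplication by x^(8·len(B)) mod P) XORed with the CRC of B, the same identity zlib's `crc32_combine()` uses. As an idealized bound: if each 8-byte CRC32 has a 3-cycle latency and three independent ones can overlap, a single dependent chain tops out around 8/3 ≈ 2.7 bytes per cycle, while three streams could approach 8 bytes per cycle before merge and tail overhead. The real gain is what this ticket needs to measure.

```c
/*
 * Hypothetical sketch (not WiredTiger's or Intel's code): CRC-32C computed as
 * three independent streams that are merged at the end. Portable C, so the
 * per-byte step below stands in for the x86 CRC32 instruction.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* Castagnoli polynomial, bit-reflected */

/* One raw register update for one byte (plays the role of _mm_crc32_u8()). */
static uint32_t crc32c_step(uint32_t crc, uint8_t byte) {
    crc ^= byte;
    for (int k = 0; k < 8; k++)
        crc = (crc & 1) ? (crc >> 1) ^ CRC32C_POLY : crc >> 1;
    return crc;
}

/* Bitwise reference CRC-32C: pass 0 to start, or a previous result to continue. */
uint32_t crc32c_sw(uint32_t crc, const uint8_t *buf, size_t len) {
    crc = ~crc;
    while (len--)
        crc = crc32c_step(crc, *buf++);
    return ~crc;
}

/* a * b mod P in the reflected domain (bit 31 holds the x^0 coefficient). */
static uint32_t gf2_mulmod(uint32_t a, uint32_t b) {
    uint32_t p = 0;
    for (int i = 0; i < 32; i++) {
        if (a & (0x80000000u >> i)) /* coefficient of x^i in a */
            p ^= b;
        b = (b & 1) ? (b >> 1) ^ CRC32C_POLY : b >> 1; /* b *= x */
    }
    return p;
}

/* x^(8n) mod P: the effect of feeding n zero bytes through the register. */
static uint32_t xpow8n(size_t n) {
    uint32_t result = 0x80000000u; /* x^0 */
    uint32_t base = 1u << 23;      /* x^8, i.e. one zero byte */
    for (; n != 0; n >>= 1) {      /* square-and-multiply */
        if (n & 1)
            result = gf2_mulmod(base, result);
        base = gf2_mulmod(base, base);
    }
    return result;
}

/* CRC of A||B from crc(A), crc(B) and len(B); same idea as zlib's crc32_combine(). */
uint32_t crc32c_combine(uint32_t crc_a, uint32_t crc_b, size_t len_b) {
    return gf2_mulmod(xpow8n(len_b), crc_a) ^ crc_b;
}

/* Three-stream CRC-32C; returns the same value as crc32c_sw(0, buf, len). */
uint32_t crc32c_3way(const uint8_t *buf, size_t len) {
    size_t n = len / 3; /* bytes per stream */
    uint32_t ca = ~0u, cb = ~0u, cc = ~0u;

    /* The three updates in each iteration do not depend on one another, so
     * the CPU can overlap them; with the hardware CRC32 instruction this is
     * what hides its latency. */
    for (size_t i = 0; i < n; i++) {
        ca = crc32c_step(ca, buf[i]);
        cb = crc32c_step(cb, buf[n + i]);
        cc = crc32c_step(cc, buf[2 * n + i]);
    }

    /* Merge: advance each partial CRC past the stream that follows it. */
    uint32_t crc = crc32c_combine(~ca, ~cb, n);
    crc = crc32c_combine(crc, ~cc, n);

    /* The 0-2 leftover bytes are processed serially. */
    return crc32c_sw(crc, buf + 3 * n, len - 3 * n);
}
```

For example, `crc32c_3way((const uint8_t *)"123456789", 9)` returns 0xE3069283, the standard CRC-32C check value, matching `crc32c_sw(0, ...)` on the same input. Computing `xpow8n()` on every call is only for clarity; a tuned version would likely fix the chunk size so the merge multipliers can be precomputed.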

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            chenhao.qu@mongodb.com Chenhao Qu
            Votes:
            0
            Watchers:
            6

              Created:
              Updated: