WiredTiger / WT-12361

Revisit __wt_lex_compare_skip impl for arm

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Btree
    • Storage Engines
    • 8
    • 2024-02-20_A_near-death_puffin, 2024-03-05 - Claronald, 2024-03-19 - PacificOcean, 2024-04-02 - GreatMugshot, 나비 (nabi) - 2024-04-16

      I noticed that this code tries to use Arm NEON SIMD to compare 16 bytes at a time. However, it only does so when the arguments are aligned, even though the SIMD load instructions have no alignment requirements. At the very least we should remove the alignment check.

      But I think we should do much more.

      • When doing only 16 bytes at a time, it should be faster to work in regular GPRs rather than SIMD registers, since Arm has an instruction (LDP) that loads 16 bytes into a pair of GPRs.
      • The tail bytes should be handled by loading the last 16 bytes, rather than byte-at-a-time
      • Once we've found a difference, we should compute matchp and the compare results from the loaded data rather than using the byte loop.
      • Use at most 2 loads when len < 16 (and ignore matchp since it isn't worth it in that case)
      • Try to nudge the compiler into using conditional instructions such as CSET/CSEL/CSINC rather than branching for unpredictable branches, to avoid the mispredict penalty
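
The ideas in the list above can be sketched in portable C roughly as follows. This is only an illustration under stated assumptions, not the code from the godbolt link or from WiredTiger: the names `sketch_load_be64`, `sketch_cmp16` and `sketch_lex_compare` are made up, `matchp` is treated as output-only (its initial value is ignored), GCC/Clang builtins are assumed, and the len < 16 path is left as a plain byte loop for brevity rather than the two-load scheme proposed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 8-byte load, returned in big-endian order so
 * that unsigned integer comparison matches byte-wise lexicographic order. */
static inline uint64_t
sketch_load_be64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v)); /* no alignment requirement on p */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    v = __builtin_bswap64(v); /* assumes GCC or Clang */
#endif
    return v;
}

/* Compare 16 bytes held in two pairs of 64-bit registers. Returns -1, 0 or 1.
 * On a difference, *off receives the index (0..15) of the first differing
 * byte, computed from the loaded data instead of a byte loop. */
static inline int
sketch_cmp16(const uint8_t *a, const uint8_t *b, size_t *off)
{
    uint64_t a0 = sketch_load_be64(a), a1 = sketch_load_be64(a + 8);
    uint64_t b0 = sketch_load_be64(b), b1 = sketch_load_be64(b + 8);
    if (a0 == b0 && a1 == b1)
        return 0;
    int first_eq = (a0 == b0);     /* written as selects, not branches */
    uint64_t x = first_eq ? a1 : a0;
    uint64_t y = first_eq ? b1 : b0;
    /* x != y here, so the clz argument is nonzero. */
    *off = (size_t)first_eq * 8 + (size_t)__builtin_clzll(x ^ y) / 8;
    return (x > y) - (x < y);
}

/* Hypothetical entry point: lexicographic compare of two byte strings.
 * *matchp is set to the number of leading bytes that matched. */
int
sketch_lex_compare(const void *ap, size_t alen, const void *bp, size_t blen, size_t *matchp)
{
    const uint8_t *a = ap, *b = bp;
    size_t len = alen < blen ? alen : blen;
    size_t i = 0, off = 0;
    int cmp;

    if (len >= 16) {
        for (; i + 16 <= len; i += 16)
            if ((cmp = sketch_cmp16(a + i, b + i, &off)) != 0) {
                *matchp = i + off;
                return cmp;
            }
        if (i < len) {
            /* Tail: reload the last 16 bytes. The overlap with bytes already
             * known to be equal is harmless. */
            i = len - 16;
            if ((cmp = sketch_cmp16(a + i, b + i, &off)) != 0) {
                *matchp = i + off;
                return cmp;
            }
        }
    } else {
        /* Placeholder for the short case; the proposal uses at most 2 loads. */
        for (; i < len; ++i)
            if (a[i] != b[i]) {
                *matchp = i;
                return a[i] < b[i] ? -1 : 1;
            }
    }
    *matchp = len;
    return (alen > blen) - (alen < blen);
}
```

Whether the two adjacent 8-byte loads become a single LDP, and whether the selects and the `(x > y) - (x < y)` idiom compile to CSEL/CSET rather than branches, depends on the compiler and flags, so the generated assembly should be checked.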

      I've put this all together in this godbolt. Clang seems to do slightly better than gcc, but gcc isn't bad with this code.

      The x86 path could also use a review. It could use some of these techniques, but the >=16 byte path should just use vectors since they are cheaper there. And we should eliminate the alignment check there too, since the perf advantage of movdqa over movdqu has disappeared on modern CPUs.
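
As a rough sketch of what an alignment-agnostic 16-byte step could look like on x86, the block below uses unaligned SSE2 loads (`_mm_loadu_si128`, which maps to movdqu) and derives the first differing byte from the comparison mask. The function name `sketch_first_diff16` is hypothetical, GCC/Clang is assumed for `__builtin_ctz`, and a byte-loop fallback is included only so the example builds on non-SSE2 targets.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: index (0..15) of the first byte where the 16-byte
 * blocks at a and b differ, or 16 if they are equal. Neither pointer needs
 * to be aligned. */
static inline unsigned
sketch_first_diff16(const uint8_t *a, const uint8_t *b)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* Bit i of eq is set when byte i is equal in both blocks. */
    unsigned eq = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
    unsigned ne = ~eq & 0xffffu;
    return ne == 0 ? 16u : (unsigned)__builtin_ctz(ne);
#else
    /* Portable fallback for targets without SSE2. */
    unsigned i;
    for (i = 0; i < 16 && a[i] == b[i]; ++i)
        ;
    return i;
#endif
}
```

Given the returned index i < 16, the compare result follows directly from `a[i]` and `b[i]`, so no byte loop is needed after a mismatch is detected.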

            Assignee:
            chenhao.qu@mongodb.com Chenhao Qu
            Reporter:
            mathias@mongodb.com Mathias Stearn