Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Cursors, Performance
Labels:
- performance

Assigned Teams:

Storage Engines, Storage Engines - Foundations
Total Hours with Assigned Team:
1,638.593
Sprint:
None
Story Points:
None

This project developed a methodology for making evidence-based choices about field layout in WiredTiger's hot internal structs, and demonstrated it on WT_SESSION_IMPL and WT_CURSOR_BTREE. The methodology uses:

- AI-assisted analysis to identify which fields are touched together on hot paths
- rearrangement of struct fields, guided also by taste, since we want to preserve logical groupings
- WT_STRUCT_LAYOUT() macros that document and enforce cache-line group boundaries via compiler directives
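
The idea of documenting and enforcing cache-line group boundaries can be sketched in plain C11. This is not the POC's WT_STRUCT_LAYOUT() definition, which is not reproduced in this ticket. The macro names (DEMO_GROUP_STARTS, DEMO_GROUP_FITS), the struct, and its fields are all hypothetical, and a 64-byte cache line is assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Hypothetical check: the named field must begin a cache-line group. */
#define DEMO_GROUP_STARTS(type, field)                          \
    static_assert(offsetof(type, field) % DEMO_CACHE_LINE == 0, \
      #field " must start a cache-line group")

/* Hypothetical check: fields first..last must fit in one cache line. */
#define DEMO_GROUP_FITS(type, first, last)                                  \
    static_assert(offsetof(type, last) + sizeof(((type *)0)->last) -        \
        offsetof(type, first) <= DEMO_CACHE_LINE,                           \
      #first ".." #last " must fit in one cache line")

/* Illustrative struct only; not a real WiredTiger structure. */
struct demo_session {
    /* Hot group: fields assumed to be touched together on a hot path. */
    alignas(DEMO_CACHE_LINE) void *hot_handle;
    uint64_t hot_txn_id;
    uint32_t hot_flags;
    uint32_t hot_depth;

    /* Cold group: rarely touched fields start on their own line. */
    alignas(DEMO_CACHE_LINE) char cold_name[32];
    uint64_t cold_create_time;
};

/* Compile-time enforcement: a violation stops the build. */
DEMO_GROUP_STARTS(struct demo_session, hot_handle);
DEMO_GROUP_FITS(struct demo_session, hot_handle, hot_depth);
DEMO_GROUP_STARTS(struct demo_session, cold_name);
DEMO_GROUP_FITS(struct demo_session, cold_name, cold_create_time);
```

With checks like these, a field that pushes the hot group past 64 bytes fails the build rather than silently spilling into the next cache line.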

This POC shows several wins in targeted workloads, including a consistent improvement of 2% or more in ecommerce workloads and a consistent 1.7% improvement across 5 sub-workloads of mixed_workloads_locust. More importantly, targeted analysis of workload regressions, followed by struct adjustments, has reduced these regressions substantially. This targeted work on regressions is a powerful focusing strategy, as the resulting changes generally also help workloads broadly. This POC has ended due to time constraints, but there are strong indications that most or all of the stable regressions can be reduced to noise levels. Even more exciting is that future gains are possible. WT_DATA_HANDLE, WT_BTREE, and WT_CONNECTION_IMPL are all candidates for reordering. Btree internal structs should also be examined for potential wins, although gains on smaller structs are likely to be harder to achieve without growing their size.

Another part of future work is preserving hard-won gains. It is all too easy for any WT PR to insert a new field in the middle of a well-crafted layout. There are straightforward ways to prevent or detect this, but the POC was too short to develop them.
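
One possible way to detect a field inserted into a tuned group is a compile-time adjacency check. The sketch below is an assumption about what such a guard could look like, not something the POC developed. DEMO_FIELD_END, DEMO_ADJACENT, and the struct are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: offset one past the end of a field. */
#define DEMO_FIELD_END(type, f) (offsetof(type, f) + sizeof(((type *)0)->f))

/* Hypothetical guard: field b must begin exactly where field a ends. */
#define DEMO_ADJACENT(type, a, b)                                  \
    static_assert(DEMO_FIELD_END(type, a) == offsetof(type, b),    \
      #b " must directly follow " #a)

/* Illustrative hot group only; not a real WiredTiger structure. */
struct demo_hot_group {
    uint64_t key_hash;
    uint64_t txn_id;
    uint32_t flags;
    uint32_t depth;
};

/* Inserting a field between any of these pairs breaks the build. */
DEMO_ADJACENT(struct demo_hot_group, key_hash, txn_id);
DEMO_ADJACENT(struct demo_hot_group, txn_id, flags);
DEMO_ADJACENT(struct demo_hot_group, flags, depth);

/* Pinning the total size also catches fields appended to the group. */
static_assert(sizeof(struct demo_hot_group) == 24, "hot group size changed");
```

A guard like this does not forbid layout changes. It forces whoever makes one to update the assertions, which makes the change visible in review.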

A few layout strategies were used, and are relatively easy to apply given the AI analysis:

- Once we know which fields are used together at around the same time, we can group them together to take advantage of locality in the L1 cache.
- Once we identify fields that are generally hot, we can group them in their own hot cache line(s).
- Knowing which fields are shared among multiple threads lets us group them with cold fields, which can eliminate the false sharing anti-pattern.
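
The three strategies can be illustrated with a small layout. This is a sketch under assumptions: the struct, its field names, and the helper demo_line_of() are hypothetical, the choice of which fields are hot or shared is invented for illustration, and a 64-byte cache line is assumed.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64 /* assumption: 64-byte cache lines */

/* Illustrative struct only; not a real WiredTiger structure. */
struct demo_cursor {
    /* Line 0: fields assumed to be used together during a lookup. */
    alignas(DEMO_CACHE_LINE) void *page;
    uint32_t slot;
    uint32_t compare;
    uint64_t recno;

    /* Line 1: generally hot fields, kept in their own line. */
    alignas(DEMO_CACHE_LINE) uint64_t flags;
    uint64_t op_count;

    /*
     * Line 2: a field written by other threads, placed with cold fields
     * so that remote writes do not invalidate the hot lines above.
     */
    alignas(DEMO_CACHE_LINE) atomic_uint_fast64_t shared_count;
    char cold_uri[40];
};

/* Hypothetical helper: which cache line of the struct holds an offset. */
static inline size_t
demo_line_of(size_t offset)
{
    return (offset / DEMO_CACHE_LINE);
}
```

Because the struct itself is 64-byte aligned, fields on different lines of the struct also land on different hardware cache lines, provided instances are allocated with that alignment.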

A summary of 6 validation patches run at the same time (3 baseline, 3 with the changes) will be attached in the comments, as well as code differences.

session-order-poc.pdf
455 kB
May 18 2026 11:31:20 AM UTC

Assignee: [DO NOT USE] Backlog - Storage Engines Team
Reporter: Donald Anderson
Votes: 0
Watchers: 1

Created: May 18 2026 11:30:33 AM UTC
Updated: May 22 2026 10:36:30 PM UTC
