|
My early experiments showed some promising results, but I wasn't able to reproduce them reliably. I think variations in the function layout of the output binary caused most of the performance differences I observed. In one test, I saw a 6% difference between the two versions even though I had accidentally configured the test in a way that prevented it from using the column store index at all, so the optimized code path presumably never ran!
To reduce the impact of instruction-cache (i-cache) effects on the tests, I built a single binary that switches between the optimized and unoptimized versions of the columnar decoding function based on a feature flag. Tests with that binary did not show a significant change in performance. Based on that, I'm putting this back on the backlog with the expectation that we probably won't implement it. Time permitting, it may be worthwhile to take a few more measurements once we have more benchmarks in place.
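For anyone who wants to repeat the measurement, here is a minimal sketch of the flag-switching pattern. It is an illustration only, not the code I tested: the decoder functions, the `USE_OPTIMIZED_DECODER` environment variable, and the little-endian uint32 column format are all hypothetical stand-ins. Python also has no compiled code layout to control, so this shows only the dispatch and the interleaved timing, not the i-cache isolation itself.

```python
import os
import struct
import time


def decode_column_baseline(buf):
    """Hypothetical unoptimized decoder: one little-endian uint32 at a time.

    Assumes len(buf) is a multiple of 4.
    """
    return [struct.unpack_from("<I", buf, off)[0] for off in range(0, len(buf), 4)]


def decode_column_optimized(buf):
    """Hypothetical optimized decoder: unpack the whole column in one call.

    Assumes len(buf) is a multiple of 4.
    """
    count = len(buf) // 4
    return list(struct.unpack(f"<{count}I", buf))


def use_optimized_decoder():
    # Hypothetical feature flag, read at call time so that one process
    # can exercise both versions.
    return os.environ.get("USE_OPTIMIZED_DECODER", "0") == "1"


def decode_column(buf):
    # Both versions live in the same program; only the flag differs.
    if use_optimized_decoder():
        return decode_column_optimized(buf)
    return decode_column_baseline(buf)


def time_decoder(flag_value, buf, rounds=200):
    """Return the wall-clock seconds for `rounds` decodes under one flag setting."""
    os.environ["USE_OPTIMIZED_DECODER"] = flag_value
    start = time.perf_counter()
    for _ in range(rounds):
        decode_column(buf)
    return time.perf_counter() - start


if __name__ == "__main__":
    data = struct.pack("<1000I", *range(1000))
    # Interleave the two settings so drift in machine state affects both.
    for flag in ("0", "1", "0", "1"):
        print(flag, time_decoder(flag, data))
```

Both versions must return identical results for the comparison to mean anything, so it is worth asserting that before timing them.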
|