Improve efficiency of FieldRef parsing during optimization

XMLWordPrintableJSON

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Query Optimization
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Motivation

      The FieldRef structure is used throughout the QO codebase, in places to modify field paths, but in places for simpler operations, like checking if a field path has multiple components.

      Profiling shows this path as hot in code which uses it. Including visiting MatchExpressions (where the path is parsed to check various properties).

      This was originally noticed in a benchmark for pipeline optimisation where a $match is pushed down past 200x $addFIelds stages.

      It is possible to strategically defer FieldRef parsing in some cases, for example, by checking if path.find('.') -> then FieldRef parse in the code in the visitPathExpression.

      However, such performance fixes are brittle. Instead, from my testing it looks like we can swap out the implementation for something simpler and faster

      The current implementation uses a cached field length, a preallocated vector of string fields of field parts, a lazy computed final result and a subpath damage vector: 

      • FieldIndex _cachedSize 
      • boost::small_vector<optional<StringView>, 4> _parts
      • std::string _dotted
      • std::vector<std::string> _replacements

      In my testing we can swap out the internals for a single flat char array while keeping the interface the same, and achieve 1x to 5x performance gains with no losses.

      I've tested various scenarios of mutating parts of field paths of various lengths and number of segments and saw virtually no regression. 

      In the $match pushdown benchmark the fraction of time spent in visiting the match expression three fell by 21% (from 32% of all time to 25% of all time, shaving off 1ms). 

      Proposal

      Replace the FieldRef implementation with a simper flat char array + index. This seems to perform better even on mixed workload benchmarks (add/remove/replace proportion of components then read) and the optimsiation benchmark above, likely due to better cache locality and auto-vectorisation by the compiler.

      Additionally, it appears we can extract the index + read-only logic into a separate base class, again without breaking existing uses, while allowing new uses to avoid constructing a FiedlRef structure (copying the path) but still benefit from indexed access (get the first component etc.).

            Assignee:
            Unassigned
            Reporter:
            Vesko Karaganev
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated: