[SERVER-26752] $indexOfBytes (and OfCP) should use Boyer-Moore algorithm Created: 24/Oct/16 Updated: 21/Sep/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | neweng, pull-request | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Execution
|
| Participants: |
| Description |
|
$indexOfCP and $indexOfBytes can share code (they currently don't) and can use Boyer-Moore to improve performance. |
| Comments |
| Comment by Mathias Stearn [ 24/Oct/16 ] |
|
We are already using boost::boyer_moore_search to do phrase matching in full-text search. It should be easy to use it here too. This implementation should check if the needle is constant and use the object interface to avoid building the lookup-table for every execution (it may only be worth using Boyer-Moore in this common case and not for the dynamic-needle case, I'm not sure). It should also special-case when the needle is 1 byte and use the CharT overload of std::string::find since that will be much faster. |