[SERVER-81515] Add tokenize() function on the ValueBlock interface Created: 27/Sep/23 Updated: 29/Oct/23 Resolved: 12/Oct/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.2.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Ian Boros | Assignee: | Parker Felix |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
|
Add a method to the ValueBlock interface which "tokenizes" the input. That is, it identifies the set of unique items in the block.
It should return:
For example:
Input ValueBlock
ValueBlock->tokenize() returns:
The default implementation can use a basic hashing algorithm (make sure to use the same hasher that the HashAgg stage uses). We should also have an optimized implementation specialized for MonoBlock. Eventually we will add optimized versions for Homogeneous blocks and possibly RLE-compressed blocks. |
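A minimal sketch of the idea described above, not the server implementation. The return value is not spelled out in this ticket, so the shape used here (a list of unique tokens plus one token index per input position) is an assumption. `Value`, `TokenizedBlock`, `tokenizeMono`, and `detokenize` are hypothetical names chosen for illustration, and `std::unordered_map` with its default `std::hash` stands in for the HashAgg hasher the description asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the values held by a ValueBlock.
using Value = std::int64_t;

// Assumed result shape: unique values in first-seen order, plus, for each
// input position, the index of its value in `tokens`.
struct TokenizedBlock {
    std::vector<Value> tokens;
    std::vector<std::size_t> idxs;
};

// Default path: hash each value and hand out token indexes in first-seen order.
// std::hash is a placeholder for the hasher used by the HashAgg stage.
TokenizedBlock tokenize(const std::vector<Value>& block) {
    TokenizedBlock out;
    out.idxs.reserve(block.size());
    std::unordered_map<Value, std::size_t> seen;  // value -> token index
    for (const Value& v : block) {
        // emplace leaves the map untouched if v is already present.
        std::pair<std::unordered_map<Value, std::size_t>::iterator, bool> res =
            seen.emplace(v, out.tokens.size());
        if (res.second) {
            out.tokens.push_back(v);
        }
        out.idxs.push_back(res.first->second);
    }
    return out;
}

// MonoBlock-style fast path: a block that repeats one value needs no hashing.
TokenizedBlock tokenizeMono(Value v, std::size_t count) {
    return TokenizedBlock{{v}, std::vector<std::size_t>(count, 0)};
}

// Rebuilds the original block from its tokenized form.
std::vector<Value> detokenize(const TokenizedBlock& tb) {
    std::vector<Value> out;
    out.reserve(tb.idxs.size());
    for (std::size_t idx : tb.idxs) {
        assert(idx < tb.tokens.size());
        out.push_back(tb.tokens[idx]);
    }
    return out;
}
```

Under these assumptions, tokenizing the block `{5, 7, 5, 9, 7}` gives tokens `{5, 7, 9}` and indexes `{0, 1, 0, 2, 1}`. The mono path skips the hash table entirely, which is the kind of saving the MonoBlock specialization is after.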
| Comments |
| Comment by Githook User [ 12/Oct/23 ] |
|
Author: {'name': 'Parker Felix', 'email': 'parker.felix@mongodb.com', 'username': 'parker-felix'}
Message: |
| Comment by Ian Boros [ 27/Sep/23 ] |
|
Assigning this to Parker for whenever ongoing work finishes. There is no rush on this one either. |