[SERVER-54485] FTS indexes pass c-strings to tokenizer so they ignore data in strings past the first nul byte Created: 12/Feb/21 Updated: 03/Jun/21 Resolved: 03/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Mickey Winters |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Sprint: | Query Execution 2021-03-08, Query Execution 2021-03-22, Query Execution 2021-04-05, Query Execution 2021-04-19, Query Execution 2021-05-03, Query Execution 2021-05-17, Query Execution 2021-06-14 |
| Participants: |
| Description |
|
"Luckily," the same bug exists in both indexing and querying, so the two sides at least behave consistently. Unfortunately, because it affects indexing, fixing it would require a version bump, and we would need to keep the old code around.
Passing c-strings also adds a slight overhead, because the code computes strlen rather than using the size already stored in the BSONElement. That cost is trivial relative to the actual tokenization, though, so I don't think it is sufficient motivation on its own to fix this. |
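To illustrate the effect described above, here is a minimal sketch in Python rather than the server's actual C++ code. It assumes a toy word tokenizer and a toy inverted index; the helper names (`c_string_view`, `tokens_via_c_string`, `tokens_via_length`, `build_index`) and the second sample document are hypothetical and exist only for this example. The c-string path is simulated by cutting the value at the first nul byte, which is what a strlen-based consumer would effectively see.

```python
import re
from typing import Callable, Dict, List, Set


def c_string_view(data: bytes) -> bytes:
    """Simulate a c-string consumer: only bytes before the first nul are seen."""
    return data.partition(b"\x00")[0]


def tokenize(data: bytes) -> List[str]:
    """Toy tokenizer (an assumption, not the real FTS tokenizer): lowercase word runs."""
    return re.findall(r"\w+", data.decode("utf-8").lower())


def tokens_via_c_string(data: bytes) -> List[str]:
    """Tokenize what a strlen-based caller would hand to the tokenizer."""
    return tokenize(c_string_view(data))


def tokens_via_length(data: bytes) -> List[str]:
    """Tokenize the full value, using its stored length instead of strlen."""
    return tokenize(data)


def build_index(docs: Dict[int, bytes],
                tokens_fn: Callable[[bytes], List[str]]) -> Dict[str, Set[int]]:
    """Build a toy inverted index: token -> set of document ids."""
    index: Dict[str, Set[int]] = {}
    for doc_id, msg in docs.items():
        for tok in tokens_fn(msg):
            index.setdefault(tok, set()).add(doc_id)
    return index


# Hypothetical sample data; document 1 has an embedded nul byte.
docs = {1: b"foo \x00 bar", 2: b"bar baz"}

truncating_index = build_index(docs, tokens_via_c_string)
full_index = build_index(docs, tokens_via_length)
```

In this sketch, a lookup of "bar" in `truncating_index` finds only document 2, while `full_index` finds both documents. The stored value of document 1 is unchanged in either case, which is why only text indexing and text queries are affected.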
| Comments |
| Comment by Mickey Winters [ 03/Jun/21 ] |
|
Thanks for flagging this! After investigation, this does not seem to cause data corruption or server crashes. There are two main correctness issues.
If you perform a normal find query that would otherwise match the first document, you will get back {msg: "foo \0 bar"}, because the problem is with indexing/querying, not with storage. This problem occurs because
For a user to encounter this bug, they would have to be using FTS indexes on unclean data that may contain null bytes. Although this bug bothers me, at least for now there doesn't seem to be enough justification for the time required to fix it. |