[SERVER-78480] Avoid unnecessary string copies in NumberParser Created: 27/Jun/23 Updated: 29/Oct/23 Resolved: 09/Oct/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.2.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matt Boros | Assignee: | Patrick Freed |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | perf, techdebt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Service Arch
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Service Arch 2023-10-02, Service Arch 2023-10-16 | ||||||||
| Participants: | |||||||||
| Description |
|
NumberParser is used in BSONElement, the SBE VM, and many _gen files. This class uses a helper function parseNumberFromStringHelper, where I see two unnecessary string copies: #1 and #2. In an extreme case I came up with, parsing one million numbers from {$match: {a: 1, b: 1, c: 1, ...}} originally took 4 seconds, and with a few lines change to avoid a copy it took 2 seconds instead, and the patch was green. We should also look at the Decimal128 constructors, they appear to make unnecessary copies as well. |
| Comments |
| Comment by Githook User [ 06/Oct/23 ] | ||||||||||||||||||||||||
|
Author: {'name': 'Patrick Freed', 'email': 'patrick.freed@mongodb.com', 'username': 'patrickfreed'}Message: | ||||||||||||||||||||||||
| Comment by Patrick Freed [ 04/Oct/23 ] | ||||||||||||||||||||||||
|
The copies in the double path are actually necessary, since the underlying conversion function being used (strtod) requires a null-terminated string, which is not guaranteed for StringData::rawData. The newer from_chars doesn't have this requirement, but unfortunately it seems to be slower despite not having the copy.
benchmark: https://github.com/patrickfreed/mongo/blob/parse-number-bm-old/src/mongo/base/parse_number_bm.cpp (the 10k refers to the min/max number being parsed, the parameter is the number of random numbers parsed) The copies in the Decimal128 path were not necessary though, and removing them demonstrated some performance benefits, so I've gone ahead with just that. | ||||||||||||||||||||||||
| Comment by Kyle Suarez [ 11/Jul/23 ] | ||||||||||||||||||||||||
|
We think that Service Arch owns this so punting this ticket over. |