[SERVER-81098] Make statementId in oplog faster Created: 15/Sep/23 Updated: 24/Jan/24 Resolved: 11/Nov/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.3.0-rc0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Matthew Russotto |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | perf-8.0, perf-tiger, perf-tiger-handoff, perf-tiger-q4 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Assigned Teams: |
Replication
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Sprint: | Repl 2023-11-13 | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 155 | ||||||||||||
| Description |
|
For historical reasons the statement ID in the oplog can be absent, a single integer, or an array of integers. We currently represent this in the IDL class with a boost::optional<std::variant<int, vector<int>>>, which is inconvenient and inefficient. We have accessors which convert this to vector<int>, which makes it more convenient but means we do a heap allocation every time we access statement ID. Instead of doing this, we should represent the value in the IDL class with a vector or boost_smallvector, and the accessors can provide references. On serialization we can convert to absent/BSON integer/BSON vector depending on the number of values in the vector (we currently do this in the convenience layer). |
| Comments |
| Comment by Githook User [ 11/Nov/23 ] | ||||||||||||||||||||||||||||||
|
Author: {'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}Message: | ||||||||||||||||||||||||||||||
| Comment by Matthew Russotto [ 09/Nov/23 ] | ||||||||||||||||||||||||||||||
Parsing is slightly improved, access (which reads every element in the statementID array) is somewhat improved, and GetDurableReplOperationSize becomes O(1) as it should be. The small amount of improvement in access likely means the slowest part was bringing the data into L1 cache; after doing the unnecessary copy, the data was already in L1 cache and so was very fast to access. | ||||||||||||||||||||||||||||||
| Comment by Matthew Russotto [ 08/Nov/23 ] | ||||||||||||||||||||||||||||||
|
"before" benchmark on 8.0 (ARM virtual workstation; very approximate)
Note GetDurableOperationSize; this should be a fast O(1) operation but it turns out it is not. The ParseOplogEntry benchmarks were already vastly improved by | ||||||||||||||||||||||||||||||
| Comment by Matthew Russotto [ 07/Nov/23 ] | ||||||||||||||||||||||||||||||
|
It turns out IDL isn't using BM_ParseOplogEntryWithNStatementIds/0 440 ns 440 ns 1622675 | ||||||||||||||||||||||||||||||
| Comment by Matthew Russotto [ 07/Nov/23 ] | ||||||||||||||||||||||||||||||
|
Just parsing the command is a bit expensive. If I make that an IDL enum it will take advantage of | ||||||||||||||||||||||||||||||
| Comment by Matthew Russotto [ 18/Sep/23 ] | ||||||||||||||||||||||||||||||
|
For instance, exclusive of statement ID checking (a different bug in
After
I believe the reason is a call to get statement IDs in getDurableReplOperationSize creates a copy. | ||||||||||||||||||||||||||||||
| Comment by Matthew Russotto [ 15/Sep/23 ] | ||||||||||||||||||||||||||||||
|
The new code after I fixed an issue with doing an unnecessary floating point conversion:
Same code, also verifying field IDs using NumberParser (like IDL normally does)
Same code, verifying field IDs using std::from_chars
So there may be some gains to be had with the IDL elsewhere by not verifying array IDs or verifying them faster. But also the new code is strictly faster than the old in the normal case (0-1 statement ids). Need to try boost::small_vector. | ||||||||||||||||||||||||||||||
| Comment by Matthew Russotto [ 15/Sep/23 ] | ||||||||||||||||||||||||||||||
|
Same benchmark, current code
There's clearly some noise; the NoStatementId version isn't doing anything different. |