[SERVER-56520] Time-series inserts spent a lot of time in wt_calc_modify but don't generate deltas Created: 30/Apr/21 Updated: 29/Oct/23 Resolved: 10/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.0-rc2, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Geert Bosch | Assignee: | Yuhong Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||
| Backport Requested: | v5.0 |
||||||||||||||||||||||||
| Steps To Reproduce: | Run TSBS with the cpu-only workload. Run db.point_data.stats() and look at the number of cursor modify operations. It doesn't increase during the workload, but attaching a profiler shows significant time spent in wiredtiger_calc_modify. |
||||||||||||||||||||||||
| Sprint: | Execution Team 2021-05-31, Execution Team 2021-06-14 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Description |
|
We spend 25% of the time attempting to compute WT_MODIFY deltas, but never succeed because of the limit of 16 changes. We should either support deltas with more than 16 changes, or find a way to avoid the useless calculations. It is probably best to convert the doc_diff updates directly into deltas. |
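To illustrate why a capped delta computation can burn CPU without ever producing a delta, here is a toy model in Python. It is not WiredTiger's algorithm or API: `calc_modify`, `Modify`, and `max_entries` are hypothetical names, and the model only handles equal-length values. It shows that when an update touches more scattered ranges than the cap allows, the scan is abandoned and its work is thrown away.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for one delta entry: (offset, replacement bytes).
Modify = Tuple[int, bytes]


def calc_modify(old: bytes, new: bytes, max_entries: int = 16) -> Optional[List[Modify]]:
    """Return the changed byte ranges, or None if more than max_entries are needed."""
    if len(old) != len(new):
        return None  # toy model: only same-length values are supported
    entries: List[Modify] = []
    i, n = 0, len(old)
    while i < n:
        if old[i] == new[i]:
            i += 1
            continue
        start = i
        # Extend the run of differing bytes into a single entry.
        while i < n and old[i] != new[i]:
            i += 1
        entries.append((start, new[start:i]))
        if len(entries) > max_entries:
            return None  # cap exceeded: everything scanned so far was wasted
    return entries
```

With two scattered changes the function returns two entries; with twenty scattered changes it returns `None`, so the caller has to fall back to writing the full value after paying for the scan.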
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 10/Jun/21 ] |
|
Author: {'name': 'Yuhong Zhang', 'email': 'danielzhangyh@gmail.com', 'username': 'YuhongZhang98'}Message: (cherry picked from commit f3e5e05277309a8394a33aef9303dfb7ff4c0a4a) |
| Comment by Yuhong Zhang [ 09/Jun/21 ] |
|
We are resolving the ticket by skipping "wiredtiger_calc_modify()" for time-series updates for now. We experimented with the original idea of converting the diff in BSON format to "WT_MODIFY" directly, but the assumption that updating with deltas in the WiredTiger storage engine is more efficient is not completely true. The current implementation in WiredTiger makes it not ideal to integrate that approach for now. So we are keeping the logic in The next step will be removing the special handling from this ticket after implementing a more general heuristic in SERVER-57482. This work is not scheduled for 5.0 as it will also affect non-time-series workloads. Ultimately, after locating the issue in WiredTiger and working with the storage team, we will still want to take advantage of knowing the structure of our data and make updates in WiredTiger smarter. |
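The decision described in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual code: `plan_update`, `is_timeseries`, and `calc_delta` are hypothetical names, and `calc_delta` stands for any delta routine that returns `None` when it gives up.

```python
from typing import Callable, Optional, Tuple


def plan_update(
    old: bytes,
    new: bytes,
    is_timeseries: bool,
    calc_delta: Callable[[bytes, bytes], Optional[list]],
) -> Tuple[str, object]:
    """Decide whether to write a delta or the full new value."""
    if is_timeseries:
        # Special case: skip the delta computation entirely and write
        # the full value, since the computation rarely pays off here.
        return ("full", new)
    delta = calc_delta(old, new)
    if delta is None:
        # The delta routine gave up (for example, too many changes).
        return ("full", new)
    return ("delta", delta)
```

The point of the special case is that the time-series path never calls `calc_delta`, so no time is spent on a computation whose result would be discarded. A more general heuristic would replace the `is_timeseries` flag with a check that applies to all workloads.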
| Comment by Githook User [ 09/Jun/21 ] |
|
Author: {'name': 'Yuhong Zhang', 'email': 'danielzhangyh@gmail.com', 'username': 'YuhongZhang98'}Message: |