[SERVER-56520] Time-series inserts spent a lot of time in wt_calc_modify but don't generate deltas Created: 30/Apr/21  Updated: 29/Oct/23  Resolved: 10/Jun/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.0-rc2, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Geert Bosch Assignee: Yuhong Zhang
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-57159 Generate DamageVector from docDiff wi... Closed
related to SERVER-57504 Minimize the number of damages create... Closed
related to SERVER-57482 Adaptively call wiredtiger_calc_modify Backlog
is related to SERVER-57101 Prove generating DamageVector from do... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0
Steps To Reproduce:

Run TSBS with a cpu-only workload. Run db.point_data.stats() and look at the number of cursor modify operations. It doesn't increase during the workload, but attaching a profiler shows significant time spent in wiredtiger_calc_modify.

Sprint: Execution Team 2021-05-31, Execution Team 2021-06-14
Participants:

 Description   

We spend 25% of the time attempting to compute WT_MODIFY deltas, but never succeed because of the limit of 16 changes. We should either support deltas with more than 16 changes, or find a way to avoid the useless calculations.

It is probably best to convert the doc_diff updates directly into deltas.
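
To illustrate why the failed attempts are pure overhead, here is a minimal sketch in Python. It is not WiredTiger's actual algorithm: `bounded_diff` and `MAX_ENTRIES` are hypothetical names, and the sketch assumes equal-length values with in-place byte replacements only. It shows that a diff with a cap on the number of changes must still scan and compare the values before it can give up, so an update with many scattered changes pays the comparison cost and produces no delta.

```python
from typing import List, Optional, Tuple

# Cap on the number of changes, taken from the limit mentioned in this ticket.
MAX_ENTRIES = 16


def bounded_diff(
    old: bytes, new: bytes, max_entries: int = MAX_ENTRIES
) -> Optional[List[Tuple[int, bytes]]]:
    """Return (offset, replacement) runs, or None if the cap is exceeded.

    Hypothetical helper for illustration only; it handles equal-length
    values and does not model inserts or deletes.
    """
    if len(old) != len(new):
        return None

    entries: List[Tuple[int, bytes]] = []
    i, n = 0, len(old)
    while i < n:
        if old[i] == new[i]:
            i += 1
            continue
        # Found the start of a run of differing bytes; extend it.
        start = i
        while i < n and old[i] != new[i]:
            i += 1
        if len(entries) == max_entries:
            # One run too many: all comparison work so far is discarded.
            return None
        entries.append((start, new[start:i]))
    return entries


# A localized change fits under the cap and yields a small delta.
old_doc = bytes(100)
few = bytearray(old_doc)
few[10] = 1
few[50:52] = b"\x02\x03"
small_delta = bounded_diff(old_doc, bytes(few))

# Twenty scattered one-byte changes exceed the cap, so no delta is produced.
many = bytearray(old_doc)
for pos in range(0, 40, 2):
    many[pos] = 1
no_delta = bounded_diff(old_doc, bytes(many))
```

In this sketch `small_delta` holds two entries while `no_delta` is `None`. The second case is analogous to the pattern reported here: time is spent computing a diff, yet the count of cursor modify operations never increases.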



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 10/Jun/21 ]

Author:

{'name': 'Yuhong Zhang', 'email': 'danielzhangyh@gmail.com', 'username': 'YuhongZhang98'}

Message: SERVER-56520 Time-series inserts spent a lot of time in wt_calc_modify but don't generate deltas

(cherry picked from commit f3e5e05277309a8394a33aef9303dfb7ff4c0a4a)
Branch: v5.0
https://github.com/mongodb/mongo/commit/b2ab180f9556370a2a9ecbf77b7041415cf7fb9c

Comment by Yuhong Zhang [ 09/Jun/21 ]

We are resolving the ticket by skipping "wiredtiger_calc_modify()" for time-series updates for now. We experimented with the original idea of converting the diff in BSON format to "WT_MODIFY" directly, but the assumption that updating with deltas in the WiredTiger storage engine is more efficient is not completely true. The current implementation in WiredTiger makes it less than ideal to integrate that approach for now, so we are keeping the logic from SERVER-57101, SERVER-57159 and SERVER-57504, with unit tests, in the codebase.

The next step will be to remove the special handling added in this ticket after implementing a more general heuristic in SERVER-57482. That work is not scheduled for 5.0 because it will also affect non-time-series workloads.

Ultimately, after locating the issue in WiredTiger and working with the storage team, we still want to take advantage of knowing the structure of our data to make updates in WiredTiger smarter.

Comment by Githook User [ 09/Jun/21 ]

Author:

{'name': 'Yuhong Zhang', 'email': 'danielzhangyh@gmail.com', 'username': 'YuhongZhang98'}

Message: SERVER-56520 Time-series inserts spent a lot of time in wt_calc_modify but don't generate deltas
Branch: master
https://github.com/mongodb/mongo/commit/f3e5e05277309a8394a33aef9303dfb7ff4c0a4a

Generated at Thu Feb 08 05:39:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.