[SERVER-74113] Investigate Long/Medium snapshot scan regressions Created: 16/Feb/23  Updated: 29/Mar/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Mihai Andrei Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Sprint: QE 2023-03-06, QE 2023-03-20
Participants:
Story Points: 3

 Description   

db.sbe_perf_latest.aggregate([{ $match: { tags: {$ne: "ignore"}, test: {$in: ["Long.Scan.Snapshot.Scan.21.21", "Medium.Scan.Snapshot.Scan.15.15"] }} }, {$project: {signed_percent_change_stable: 1, test: 1, signed_z_score_change_stable: 1,  _id: 0}}, {$sort: {signed_percent_change_stable: 1}}]).pretty()
[
  {
    signed_percent_change_stable: -24.66040878744127,
    signed_z_score_change_stable: -3.169921403018614,
    test: 'Medium.Scan.Snapshot.Scan.15.15'
  },
  {
    signed_percent_change_stable: -24.211982539751745,
    signed_z_score_change_stable: -2.9066299053588978,
    test: 'Long.Scan.Snapshot.Scan.21.21'
  },
  {
    signed_percent_change_stable: -20.030570534513117,
    signed_z_score_change_stable: -1.6357795683938086,
    test: 'Medium.Scan.Snapshot.Scan.15.15'
  },
  {
    signed_percent_change_stable: -18.831185311502324,
    signed_z_score_change_stable: -1.6432838109504526,
    test: 'Long.Scan.Snapshot.Scan.21.21'
  },
  {
    signed_percent_change_stable: -17.19448683124061,
    signed_z_score_change_stable: -2.1648182774504536,
    test: 'Long.Scan.Snapshot.Scan.21.21'
  },
  {
    signed_percent_change_stable: -12.60341078706027,
    signed_z_score_change_stable: -2.241378239643826,
    test: 'Medium.Scan.Snapshot.Scan.15.15'
  }
] 



 Comments   
Comment by Ivan Fefer [ 29/Mar/23 ]

So this scans are from LLTMixed workload:
It does two things:

  1. Run a lot of single document fast queries: insert, delete, update and find.
  2. It also runs a long-lived scan of the whole collection in a transaction.

In perf reports most time is spent in WiredTired cursors. I looked for a difference in how classic and sbe handle cursors, but there is no significant changes there. The only thing was that classic creates cursors during first call of doWork and SBE creates cursors during open().

However in SERVER-74063 I found out that in such workloads even slight changes to when cursors are created and/or released can lead to performance changes. I tried moving cursor creating to first getNext call instead of open for scan and ix_scan SBE stages, but it didn't show any improvements.

On that I am out of ideas and didn't have a change to investigate more. 

Comment by David Storch [ 28/Mar/23 ]

ivan.fefer@mongodb.com as we discussed in standup today, could you provide a status update on this ticket? What do you currently know about the root cause of these regressions? Do you have any potential fixes or experiments planned?

Comment by Ivan Fefer [ 03/Mar/23 ]

perf reports show that the most time is spent in WiredTiger for this workload. Because there are a lot of very short queries and writes in this workload, I tried to create cursors as late as possible and close them as soon as possible, but it didn't show any statistically significant change.

Looks like SERVER-72803 will also affect this queries, so I will un-assign it from me until we see the result from the linked ticket.

Generated at Thu Feb 08 06:26:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.