[SERVER-69987] Investigate big_collection regressions in SBE Created: 26/Sep/22 Updated: 29/Oct/23 Resolved: 18/Jan/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.3.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Mihai Andrei | Assignee: | Anna Wawrzyniak |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | pm2697-m3 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Sprint: | QE 2022-10-31, QE 2022-11-14, QE 2022-11-28, QE 2022-12-12, QE 2022-12-26, QE 2023-01-09, QE 2023-01-23 | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 35 | ||||||||||||
| Story Points: | 10 | ||||||||||||
| Description |
|
This task includes, but is not limited to, investigating the following tests:
|
| Comments |
| Comment by Githook User [ 18/Jan/23 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Drew Paroski', 'email': 'drew.paroski@mongodb.com', 'username': 'paroski'} Message: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Anna Wawrzyniak [ 03/Nov/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
https://jira.mongodb.org/browse/PM-2451 looks like it would solve this problem (option 3). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Anna Wawrzyniak [ 02/Nov/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Performance comparison for big collections: https://docs.google.com/spreadsheets/d/1ZluIWD522RdxJScTxIjZKNNRr-kgKpZMl4pr35-KqEo/edit?usp=sharing
Both prototypes fix the regression. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Anna Wawrzyniak [ 02/Nov/22 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This issue is caused by unnecessary copying of BSON objects by the stages' save/restore logic, which runs at the GetMore command boundary. The larger the document and the smaller the batch size (in number of documents), the larger the overhead of the per-batch document copy that the GetMore command incurs.
Details: In the case of simple scans, or plans that consist of streaming stages above a scan (makeBson, filter, project, traverse, limit, etc.), the stored document returned by the scan stage will never be accessed, and the copy made in saveState will be thrown away.
Possible solutions:
1) Do nothing and accept the overhead and the regression compared to classic for cases with large documents and a small batch size.
2) Advise customers to use a larger batch size. This may not be practical when the collection contains large documents: a batch of 100 documents of 16 MB each would take 1.6 GB per batch, which may not be a good choice.
3) Change the storage API to guarantee some form of "stable pointers" to returned documents that survive context switches and yielding. A storage layer that supports MVCC or copy-on-write might be able to satisfy that requirement even if the page was modified. However, in certain cases a copy might still need to be made (for example, when the old page needs to be collected for some reason). In such a case, QE would still need to be able to switch to the new document pointer and possibly restore all views/subtrees of that document.
4) Modify the save/restoreState logic in SBE to avoid unnecessary copies where it is known that the slots holding the document will not be accessed until the subsequent getNext(). GetMore always performs getNext() as the first operation after restoreState on the root, so that invariant holds for the root and for all its streamed inner-most children.
Such an extension of the save/restore logic would need a way for GetMore to notify the stage that its slots will no longer be accessed until the following getNext(). That information would then need to be propagated through the SBE stage tree to identify all stages that can safely discard their state when performing save/restore.
Prototype #1: https://github.com/10gen/mongo/pull/new/anna.wawrzyniak/save_restore
This extends saveState with a "bool discardPublicState" parameter indicating that the public slots of the stage will not be accessed until the subsequent getNext(). The streaming stages propagate discardPublicState to their children when appropriate. The default implementation conservatively assumes discardPublicState = false.
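A minimal sketch of the Prototype #1 idea, to make the copy-per-batch cost concrete. This is not the code from the branch: Stage, ScanStage, FilterStage and countCopies are hypothetical stand-ins rather than the real SBE classes, the filter is a pass-through, and the batch loop is a simplified model of GetMore that calls saveState once at the end of every full batch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified stand-in for an SBE plan stage (not mongo::sbe::PlanStage).
class Stage {
public:
    virtual ~Stage() = default;
    virtual bool getNext() = 0;
    // discardPublicState == true promises that this stage's output slots will
    // not be read until the next getNext(). The default stays conservative.
    virtual void saveState(bool discardPublicState = false) = 0;
};

// Scan over an in-memory "collection"; its output slot is a view into storage.
class ScanStage : public Stage {
public:
    explicit ScanStage(const std::vector<std::string>& docs) : _docs(docs) {}

    bool getNext() override {
        _owned.reset();  // any saved copy is obsolete once we advance
        if (_pos >= _docs.size()) {
            _slot = std::string_view{};
            return false;
        }
        _slot = _docs[_pos++];
        return true;
    }

    void saveState(bool discardPublicState = false) override {
        if (discardPublicState) {
            // Nobody will read the slot before the next getNext(): drop it.
            _owned.reset();
            _slot = std::string_view{};
            return;
        }
        if (!_owned) {
            _owned.emplace(_slot);  // the per-document copy this ticket is about
            _slot = *_owned;
            ++_copies;
        }
    }

    std::size_t copies() const { return _copies; }

private:
    const std::vector<std::string>& _docs;
    std::size_t _pos = 0;
    std::string_view _slot;
    std::optional<std::string> _owned;
    std::size_t _copies = 0;
};

// Streaming stage (pass-through for brevity): forwards the flag to its child.
class FilterStage : public Stage {
public:
    explicit FilterStage(Stage& child) : _child(child) {}
    bool getNext() override { return _child.getNext(); }
    void saveState(bool discardPublicState = false) override {
        _child.saveState(discardPublicState);
    }

private:
    Stage& _child;
};

// Simplified GetMore loop: read batchSize documents, then save state.
// Returns how many document copies the scan made in total.
inline std::size_t countCopies(bool discardPublicState, std::size_t numDocs,
                               std::size_t batchSize) {
    assert(batchSize > 0);
    const std::vector<std::string> docs(numDocs, std::string(1024, 'x'));
    ScanStage scan(docs);
    FilterStage root(scan);
    bool exhausted = false;
    while (!exhausted) {
        for (std::size_t i = 0; i < batchSize; ++i) {
            if (!root.getNext()) {
                exhausted = true;
                break;
            }
        }
        if (!exhausted) root.saveState(discardPublicState);
    }
    return scan.copies();
}
```

In this model, 10 documents with a batch size of 2 cost 5 copies with the conservative default and none when the flag is passed as true. A non-streaming stage would be expected to pass false to its children instead of forwarding the flag.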
Prototype #2: https://github.com/10gen/mongo/pull/new/anna.wawrzyniak/save_restore2
This utilizes the existing mechanism, used by yielding, of marking slots as not needed. Stages already use disableSlotAccess to indicate that slots are no longer needed: a) non-recursive - typically called from the getNext method to indicate that the slots are no longer needed and will be recomputed when getNext completes. A GetMore command executor could use the disableSlotAccess() method to indicate that the slots are not required until the subsequent getNext call. However, the non-recursive version of disableSlotAccess only marks the parent stage and does not propagate that information to children. For streaming stages, that information could be propagated to children when appropriate, preventing unnecessary slot copying when it is safe to do so. The prototype avoids the potential quadratic complexity of disableSlotAccess propagating to its children in the getNext method by using lazy evaluation: only when saveState is called does the subtree actually compute whether stages have slot access enabled or disabled.
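A rough sketch of the lazy evaluation described for Prototype #2, under stated assumptions. The Stage struct, its streaming flag and the endOfBatch helper are hypothetical and much simpler than the real SBE stage tree. In particular, every stage here counts a "copy" when its slots are still accessible, whereas in practice only stages holding views into storage would copy anything.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stage node (not mongo::sbe::PlanStage).
struct Stage {
    bool streaming;                   // assumption: child slots are only read inside getNext()
    Stage* child = nullptr;
    bool slotAccessDisabled = false;  // set by the non-recursive disableSlotAccess()
    std::size_t copies = 0;           // times saveState had to preserve slot data

    // Non-recursive and O(1): marks only this stage, never its children.
    void disableSlotAccess() { slotAccessDisabled = true; }

    // Producing a new row recomputes the slots, so access is enabled again.
    void getNext() {
        if (child) child->getNext();
        slotAccessDisabled = false;
    }

    // Lazy part: the effective enabled/disabled state of each stage is only
    // computed here, in one top-down pass per save, not on every getNext().
    void saveState(bool inheritedDisabled = false) {
        const bool disabled = slotAccessDisabled || inheritedDisabled;
        if (!disabled) ++copies;  // slots may still be read: keep them alive
        // Only a streaming stage may pass "disabled" down to its child.
        if (child) child->saveState(disabled && streaming);
    }
};

// Model of the GetMore batch boundary: the executor marks the root only.
inline void endOfBatch(Stage& root) {
    root.disableSlotAccess();
    root.saveState();
}
```

For a limit -> filter -> scan chain of streaming stages, endOfBatch on the root leaves all copy counters at zero. If a non-streaming stage sits in the middle, the stages below it keep their slots and still copy.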
Performance:
|