[SERVER-20770] Consolidate the two code paths for running find commands in the mongo shell Created: 05/Oct/15 Updated: 08/Jun/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying, Shell |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | move-stm, storch, tech-debt | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||
| Assigned Teams: |
Query Execution
|
||||||||||||||||||||
| Sprint: | QE 2023-05-29 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||||||
| Description |
|
The shell uses DBCommandCursor to iterate results returned from the "cursor-generating commands" (find, aggregate, listIndexes, listCollections). The implementation is primarily in Javascript: Rewriting this functionality in C++ would most likely make this code run faster. It would also ensure that the shell does not consume excessive memory, since memory required to buffer each batch of results could be reclaimed after the batch is exhausted. Currently, the Javascript engine's garbage collector may not run frequently enough to reclaim memory in a timely manner, especially for queries returning many megabytes worth of results. |
| Comments |
| Comment by Kevin Cherkauer [ 08/Jun/23 ] |
|
I have attached file sv20770.tar.gz containing Git patch files for the 14 commits on PR 13066. These are against SHA 72c47177dd75d7451d623a04f0423971b55ebf07: 0001-SERVER-20770-Consolidate-the-two-code-paths-for-runn.patch |
| Comment by Kevin Cherkauer [ 08/Jun/23 ] |
|
PR (not delivered) of the partial work (against master branch, which was 7.1.0-rc0 at the time) I did before this was paused on 2023-06-08 (dev branch cherkauer_sv20770): https://github.com/10gen/mongo/pull/13066 This has 14 commits on it, but the best results were obtained on the eighth one (labeled "sv20770 ps08..."). Commits after that were attempts to correctly emulate the set_read_preference_secondary.js override used by resmoke.py --suite=aggregation_secondary_reads jstests/aggregation/sources/collStats/query_exec_stats.js These changes fixed that test but caused about 15 other tests in the suite to start failing in patch builds. (Prior to these changes, query_exec_stats.js was the only test failing in this suite on the C++ code path.) Search for the "kjc" comment markers in query.js to see how I forced all "find" commands to take the C++ code path, without removing the JS code path. This can easily be flipped back and forth by changing whether the marked if() test in DBQuery.prototype._exec is "if (false)" (forces C++ code path for everything) or "if (!this._isExhaustCursor())" (original behavior of C++ code path for exhaust cursors and JS code path for everything else). |
| Comment by Kevin Cherkauer [ 08/Jun/23 ] |
|
I worked on this for three weeks, but now we are pausing it. Short summary:
Profile of this project is high-cost (lots of effort), high-risk (potential to break a large number of tests not visible to patch builds), and low-payoff (if successful, tests might run faster and some JS code might be retired, but new JS code to emulate overrides in the "C++" code path will have been created). I will attach artifacts for the work I did up to this point. |
| Comment by David Storch [ 21/Jul/22 ] |
|
I looked at this in a bit more detail this week. It's going to be easier to implement it in two steps. The first step is to simplify the shell as described by |
| Comment by David Storch [ 05/Jul/22 ] |
|
I think we should consider this ticket as a quick win. Given that the concept of "legacy reads" versus "read commands" has gone away, there is no longer a reason to have two paths for running queries in the shell. At the moment, the C++ path is specifically for exhaust queries and the JS one is for all other find commands. It should be doable to make all commands use the C++ code path that is currently reserved for exhaust queries. |
| Comment by David Storch [ 06/Dec/21 ] |
|
brooke.miller I'd like to keep this ticket open, but on the Query Execution team's backlog. I think this remains valuable as a code improvement for a code path which is heavily used in order to test the server. Also, I believe that the changes the Query Execution team did to get rid of OP_QUERY in 5.1 (along with the other legacy op codes) should make this ticket a bit easier. Sending to the QE triage queue. |
| Comment by Brooke Miller [ 03/Dec/21 ] |
|
The shell is deprecated to external users, so please feel free to re-open if you are a core-db contributor and would like this feature for correctness testing purposes. |
| Comment by Githook User [ 06/Oct/15 ] |
|
Author: {u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}Message: SERVER-20770 garbage collect every 10 MB in itcount() instead of every 10000 documents Ensures that the shell will not consume too much memory while using |