[SERVER-20770] Consolidate the two code paths for running find commands in the mongo shell Created: 05/Oct/15  Updated: 08/Jun/23

Status: Backlog
Project: Core Server
Component/s: Querying, Shell
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: move-stm, storch, tech-debt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File sv20770.tar.gz    
Issue Links:
Related
related to SERVER-58491 Consolidate the C++ native cursor imp... Backlog
related to SERVER-65955 Shell's special query path for exhaus... Closed
related to SERVER-68213 Stop generating an OP_QUERY-style que... Closed
is related to SERVER-23219 DBCommandCursor doesn't route getMore... Closed
Assigned Teams:
Query Execution
Sprint: QE 2023-05-29
Participants:
Linked BF Score: 0

 Description   

The shell uses DBCommandCursor to iterate results returned from the "cursor-generating commands" (find, aggregate, listIndexes, listCollections). The implementation is primarily in Javascript:

https://github.com/mongodb/mongo/blob/bd76599a88e8062c634eeb6918bb30f868bc6042/src/mongo/shell/query.js#L650-L804

Rewriting this functionality in C++ would most likely make this code run faster. It would also ensure that the shell does not consume excessive memory, since memory required to buffer each batch of results could be reclaimed after the batch is exhausted. Currently, the Javascript engine's garbage collector may not run frequently enough to reclaim memory in a timely manner, especially for queries returning many megabytes worth of results.



 Comments   
Comment by Kevin Cherkauer [ 08/Jun/23 ]

I have attached file sv20770.tar.gz containing Git patch files for the 14 commits on PR 13066. These are against SHA 72c47177dd75d7451d623a04f0423971b55ebf07:

0001-SERVER-20770-Consolidate-the-two-code-paths-for-runn.patch
0002-sv20770-ps02-Experiment-treating-everything-as-a-C-e.patch
0003-sv20770-ps03-Undo-ps02.patch
0004-sv20770-ps04-If-slaveOk-option-given-use-readPrefere.patch
0005-sv20770-ps05-Undo-ps04.patch
0006-sv20770-ps06-Set-default-readPreference-primaryPrefe.patch
0007-sv20770-ps07-Code-format.patch
0008-sv20770-ps08-Make-client-side-C-parser-include-comme.patch
0009-sv20770-ps09-C-mimic-JS-self-mod-code-set_read_prefe.patch
0010-sv20770-p10-Remove-instrumentation-other-cleanups.patch
0011-sv20770-p11-Move-constants-to-make-visible-in-should.patch
0012-sv20770-ps12-Experiment-leave-ps11-refector-in-but-d.patch
0013-sv20770-ps13-Reduce-scope-of-ps09-fix-from-global-to.patch
0014-sv20770-ps14-Add-prepareCommandRequest-call-for-non-.patch

Comment by Kevin Cherkauer [ 08/Jun/23 ]

PR (not delivered) of the partial work (against master branch, which was 7.1.0-rc0 at the time) I did before this was paused on 2023-06-08 (dev branch cherkauer_sv20770):

https://github.com/10gen/mongo/pull/13066

This has 14 commits on it, but the best results were obtained on the eighth one (labeled "sv20770 ps08..."). Commits after that were attempts to correctly emulate the set_read_preference_secondary.js override used by

resmoke.py --suite=aggregation_secondary_reads jstests/aggregation/sources/collStats/query_exec_stats.js

These changes fixed that test but caused about 15 other tests in the suite to start failing in patch builds. (Prior to these changes, query_exec_stats.js was the only test failing in this suite on the C++ code path.)

Search for the "kjc" comment markers in query.js to see how I forced all "find" commands to take the C++ code path, without removing the JS code path. This can easily be flipped back and forth by changing whether the marked if() test in DBQuery.prototype._exec is "if (false)" (forces C++ code path for everything) or "if (!this._isExhaustCursor())" (original behavior of C++ code path for exhaust cursors and JS code path for everything else).

Comment by Kevin Cherkauer [ 08/Jun/23 ]

I worked on this for three weeks, but now we are pausing it. Short summary:

  1. Putting all find calls through the C++ path without other changes produces 230 suite failures in patch build.
  2. After three weeks of effort I have been able to fix 27 of these.
  3. Interpolating, this produces an estimate of 26 weeks to reach "Ready To Review," which is not the end. (Besides review there will be a long tail of BFs trickling in from tests that get broken but are not part of patch builds.)
  4. The main barrier is self-modifying code ("overrides") in the JS code path (jstests/libs/override_methods/ as well as manual overrides). These are used extensively: about 400 JS and YAML files load overrides from this library directory, and there are additional cases of custom overrides done manually.

Profile of this project is high-cost (lots of effort), high-risk (potential to break a large number of tests not visible to patch builds), and low-payoff (if successful, tests might run faster and some JS code might be retired, but new JS code to emulate overrides in the "C++" code path will have been created).

I will attach artifacts for the work I did up to this point.

Comment by David Storch [ 21/Jul/22 ]

I looked at this in a bit more detail this week. It's going to be easier to implement it in two steps. The first step is to simplify the shell as described by SERVER-68213. Once this is complete, it could be easier to unify the exhaust and non-exhaust paths in the shell. I'm going to work on SERVER-68213 instead of this ticket, so I will put this ticket back into the triage queue.

Comment by David Storch [ 05/Jul/22 ]

I think we should consider this ticket as a quick win. Given that the concept of "legacy reads" versus "read commands" has gone away, there is no longer a reason to have two paths for running queries in the shell. At the moment, the C++ path is specifically for exhaust queries and the JS one is for all other find commands. It should be doable to make all commands use the C++ code path that is currently reserved for exhaust queries.

Comment by David Storch [ 06/Dec/21 ]

brooke.miller I'd like to keep this ticket open, but on the Query Execution team's backlog. I think this remains valuable as a code improvement for a code path which is heavily used in order to test the server. Also, I believe that the changes the Query Execution team did to get rid of OP_QUERY in 5.1 (along with the other legacy op codes) should make this ticket a bit easier.

Sending to the QE triage queue.

Comment by Brooke Miller [ 03/Dec/21 ]

The shell is deprecated to external users, so please feel free to re-open if you are a core-db contributor and would like this feature for correctness testing purposes.

Comment by Githook User [ 06/Oct/15 ]

Author:

{u'username': u'dstorch', u'name': u'David Storch', u'email': u'david.storch@10gen.com'}

Message: SERVER-20770 garbage collect every 10 MB in itcount() instead of every 10000 documents

Ensures that the shell will not consume too much memory while using
DBCommandCursor to iterate a large result set.
Branch: master
https://github.com/mongodb/mongo/commit/a7aa0d48c57a1ce1a82825e5bc6348f88fc10f35

Generated at Thu Feb 08 03:55:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.