[SERVER-84897] POC mqlrun Created: 02/Apr/18  Updated: 12/Jan/24  Resolved: 20/Apr/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: David Storch Assignee: Justin Seyster
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Query 2018-04-09, Query 2018-04-23
Participants:

 Description   

Build a proof of concept for an executable called mqlrun. The executable accepts the following:

  • A set of input documents, probably encoded as a BSON file.
  • An input query, probably expressed as JSON or extended JSON (or BSON?). Only need to support aggregation (not find, update, or other pieces of MQL).

mqlrun uses the server's implementation of the aggregation execution system to output the result set after running the agg over the input documents.

$lookup/$graphLookup are out of scope, as are the metadata sources like $collStats, $indexStats, etc. I think text search and geoNear should be out of scope as well.
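The scope limits above could be enforced by rejecting out-of-scope stages before execution. This is a minimal sketch under stated assumptions, not the POC's code. The deny-list contains only the stages the ticket names. `check_pipeline` is a hypothetical helper. Text search is detected only as a top-level `$text` inside `$match`.

```python
# Hypothetical deny-list: only the stages named in the ticket. The ticket's
# "etc." implies other metadata sources exist that are not listed here.
UNSUPPORTED_STAGES = frozenset({
    "$lookup", "$graphLookup", "$collStats", "$indexStats", "$geoNear",
})


def _uses_text_search(match_spec):
    """Return True if a $match filter uses a top-level $text operator."""
    return isinstance(match_spec, dict) and "$text" in match_spec


def check_pipeline(pipeline):
    """Raise ValueError if any stage falls outside the POC's scope."""
    for index, stage in enumerate(pipeline):
        # Each aggregation stage is a document with exactly one key.
        if not isinstance(stage, dict) or len(stage) != 1:
            raise ValueError(f"stage {index} must be a single-key document")
        name, spec = next(iter(stage.items()))
        if name in UNSUPPORTED_STAGES:
            raise ValueError(f"stage {index}: {name} is out of scope")
        if name == "$match" and _uses_text_search(spec):
            raise ValueError(f"stage {index}: text search is out of scope")
```

A deny-list keeps every other stage available by default. An allow-list would be the stricter choice if unsupported stages turned out to be hard to enumerate.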

POC consists of the following work items:

  • Make a branch available for the POC in a fork of mongodb/mongo.
  • Build a new executable which depends only on the base/bson libraries and the appropriate execution libraries.
  • Make a new agg data source for scanning the input BSON file.
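A data source that scans an input BSON file mainly has to split the file into documents. This is a minimal sketch, assuming the file is a plain concatenation of BSON documents. Each document starts with a little-endian int32 giving its total size, including the prefix itself and the trailing 0x00 byte. `scan_bson` is a hypothetical name. It yields raw document bytes and leaves decoding to whatever BSON library the real data source would use.

```python
import struct
from typing import BinaryIO, Iterator

MIN_BSON_SIZE = 5  # 4-byte length prefix + trailing 0x00 of an empty document


def scan_bson(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw BSON document from a stream of concatenated documents."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of input
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<i", header)  # little-endian int32
        if size < MIN_BSON_SIZE:
            raise ValueError(f"invalid document size {size}")
        body = stream.read(size - 4)
        # A complete document must be fully present and end with 0x00.
        if len(body) != size - 4 or body[-1] != 0:
            raise ValueError("truncated or unterminated document")
        yield header + body
```

Because the generator reads one document at a time, memory use stays proportional to the largest document rather than to the whole file.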


 Comments   
Comment by Justin Seyster [ 16/Apr/18 ]

I have pushed my current POC to the private 10gen/mongo Github repository on the 'jseyster/mqlrun' branch.

Comment by Justin Seyster [ 05/Apr/18 ]

A quick clarification: the temp directory is optional. mqlrun can execute without a temp directory, but it fails if any sort or group exceeds 1GB and there's no temp directory. Still, it would be a good idea to gate allowDiskUse=true behind the -t temp directory flag. I'll make that change when I apply the rest of the changes from the code review.
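Gating allowDiskUse behind the -t flag means deriving the setting from whether a temp directory was supplied. This is a minimal sketch of that idea. Only the `-t` flag comes from the comment. The positional `input` and `pipeline` arguments, the `temp_dir` attribute, and `parse_args` itself are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse a hypothetical mqlrun command line; -t enables allowDiskUse."""
    parser = argparse.ArgumentParser(prog="mqlrun")
    parser.add_argument("input", help="BSON file of input documents")
    parser.add_argument("pipeline", help="aggregation pipeline as JSON")
    parser.add_argument("-t", dest="temp_dir", default=None,
                        help="directory for external sorts/groups")
    args = parser.parse_args(argv)
    # allowDiskUse is true only when there is somewhere to spill data.
    args.allow_disk_use = args.temp_dir is not None
    return args
```

With this gating, omitting -t gives the behavior described above. A sort or group that exceeds the memory limit fails instead of spilling to disk.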

Comment by David Storch [ 05/Apr/18 ]

eliot and schwerin, Justin has a rough POC working. (Thanks Justin!) The patch to the server code to build mqlrun is available in the linked CR. A few questions:

1) What artifacts do you want from us: just the patch, a branch with this pushed to it, an Evergreen build, or a binary?

2) Can this code be in a public branch? I assume so; there's nothing too magical here.

3) Right now the tool always sets allowDiskUse=true, so you have to provide a tmp directory where it can write data out for external sorts/groups. Is this ok for the POC?

Comment by Eliot Horowitz (Inactive) [ 02/Apr/18 ]

This looks right to me.
I guess outputting bson to stdout is fine.
It would be nice to have a --jsonOutput flag for debugging.

Comment by David Storch [ 02/Apr/18 ]

justin.seyster, let me know if you have any questions about the requirements for this, or would like to better understand the context. Remember, it's just a POC, so it doesn't have to be pretty.

Also, we estimated that this would take about three days to build. If you discover that it will take longer as you begin work on it, please let me know.

CC schwerin eliot

Generated at Thu Feb 08 06:56:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.