[SERVER-18210] Add query for document structure Created: 25/Apr/15 Updated: 06/Dec/22 Resolved: 18/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Yair Lenga | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | expression, stage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query |
| Backwards Compatibility: | Fully Compatible |
| Participants: | |
| Description |
|
Please add a query that returns the "structure" of a document instead of its data. The structure will allow an application to understand the data that is embedded in the document, and then construct an efficient query. In concept, this is similar to the JDBC metadata query. Given the schema-less nature of MongoDB, I'm not sure that there is a perfect solution. The basic idea is to summarize the data. In theory, constructing a JSON schema from the data would work, but this is an unrealistic approach. Some ideas:
Motivation: having the ability to get (some) metadata will reduce the amount of data that is loaded by a factor of ~100x for our application. |
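To make the request concrete, here is a hypothetical illustration (the document shape and all field names are invented for this sketch, not taken from the ticket) of a stored document versus the kind of compact structure summary being requested:

```javascript
// Hypothetical illustration; all names are invented for this sketch.
// A stored time-series document might look like:
const doc = {
  _id: 1,
  symbol: "ABC",
  prices: [101.2, 101.4 /* ... thousands of values ... */]
};

// The requested "structure" query would return a compact summary instead,
// roughly analogous to JDBC's ResultSetMetaData:
const structure = {
  _id: "int",
  symbol: "string",
  prices: { type: "array", of: "double", length: 5000 }
};
```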
| Comments |
| Comment by Asya Kamsky [ 19/Jul/17 ] |
|
As we now have the $objectToArray and $arrayToObject expressions along with $type, $size, $strLenCP, etc., I think this ticket can be closed. |
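As a rough sketch of how those expressions can be combined today (the collection name "docs" is an assumption, and this is only one of many possible summaries):

```javascript
// Summarize each top-level field's BSON type, and array lengths where
// applicable, instead of returning the values themselves.
db.docs.aggregate([
  { $project: {
      _id: 0,
      structure: {
        $map: {
          input: { $objectToArray: "$$ROOT" },   // document -> [{k, v}, ...]
          as: "kv",
          in: {
            field: "$$kv.k",
            type: { $type: "$$kv.v" },           // BSON type name of the value
            length: {                            // array length, or null otherwise
              $cond: [{ $isArray: ["$$kv.v"] }, { $size: "$$kv.v" }, null]
            }
          }
        }
      }
  } }
])
```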
| Comment by Charlie Swanson [ 08/Mar/16 ] |
|
I think a better way to achieve this desired outcome would be to provide a way to get the keys out of an object, and possibly to reconstruct an object. If you can manipulate the field names and the corresponding values of an object, then the rest of the summarization could be done using $size, $strLen (code points or bytes; see the linked ticket), etc.
With those expressions, one could unwind an object, then do a $map over the (key, value) pairs, replacing each value with some summary of it, and then reconstruct the object with the new values. Obviously I haven't fully thought through what those would look like, but I think they would be more generally useful than an expression to summarize the data. asya, does this sound reasonable to you? |
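A minimal sketch of that round trip, written with the expressions that eventually shipped ($objectToArray/$arrayToObject; the collection name "docs" is an assumption):

```javascript
// 1. $objectToArray turns the document into an array of {k, v} pairs,
// 2. $map replaces each value with a summary (its BSON type name here),
// 3. $arrayToObject reconstructs an object with the original keys.
db.docs.aggregate([
  { $replaceRoot: {
      newRoot: {
        $arrayToObject: {
          $map: {
            input: { $objectToArray: "$$ROOT" },
            as: "kv",
            in: { k: "$$kv.k", v: { $type: "$$kv.v" } }
          }
        }
      }
  } }
])
```

The same $map could substitute any other per-field summary (e.g. $size for arrays or $strLenCP for strings) in place of $type.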
| Comment by Ramon Fernandez Marina [ 05/Mar/16 ] |
|
For those watching the ticket without knowledge of JIRA and our use of it, this is to let you know that this feature request has been sent to the Query team for consideration in their next round of planning. Any updates to its status will be posted on this ticket. Thanks. |
| Comment by Neville Dipale [ 25/Apr/15 ] |
|
I agree with you on the [implement X in the server so we don't implement it ourselves in the client] point, and I think there are a number of other features we as users would love to see. One approach would be to improve/overhaul server-side scripting with something that lets us create procedures/functions on the server to achieve what we want. Otherwise, three years down the line we'll have lots of arbitrary functions that shouldn't be in core, or people trying to achieve what you're after will not get the features they request and will move elsewhere. I think the [server-side scripting] route would reduce the load on the MongoDB team in the long run, because they wouldn't need to maintain a lot of extra functions. I would love to move a number of my scripts from the app into the server to remove the network time cost I incur every time I run certain queries. |
| Comment by Yair Lenga [ 25/Apr/15 ] |
|
I believe that the major benefit of this feature is the reduction in the amount of data that is TRANSFERRED from the server to the client. By moving the functionality into the MongoDB server, the same amount of work will reduce the size of the returned data. I agree that detecting the structure of the whole collection is a big problem; I hope to have the ability to summarize the structure of a small subset, one document at a time. For my cases (large time series embedded in ~4MB documents), the metadata to describe the 4MB set was less than 4K (with int[5000] in the metadata, representing ~32K of int data). I would be happy with this saving, leaving the much harder problem (metadata for the collection as a whole) for the future.
For server-based applications, where the bulk of the processing is done on the web server (Java, in my case), performing the processing in MongoDB will reduce the time for encoding, network transfer, and decoding, as well as the memory requirement of the document. In my case, we noticed that this transfer/parse time is where most of the time is spent; reading the data in MongoDB seems to be a small fraction of the total. For browser-based applications, where the data is transported over the internet into a JavaScript application, I believe the saving is going to be significantly higher, as network transfer rates are orders of magnitude slower than a server-to-MongoDB connection, in addition to the encryption/decryption overhead.
As far as caching, I can only comment on my planned usage, creating large time-based data sets: the calls are performed in response to interactive user requests against ranges that the user needs. The critical thing is the time to deliver that data. It is unlikely that different users will ask for the structure of the same documents, so I'm not sure caching will help my specific application. |
| Comment by Neville Dipale [ 25/Apr/15 ] |
|
Won't this force Mongo to also load complete documents (an entire disk read?) in order to return the structure? What happens when there are, say, 1 million documents with enough differences that 10 or more distinct schemas exist? If it is possible to build the feature efficiently, the structure could perhaps be cached so that a full collection scan is not performed each time the query is run. |