[SERVER-18209] Add QUERY option to limit ARRAY size Created: 25/Apr/15  Updated: 19/May/15  Resolved: 15/May/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Yair Lenga Assignee: Ramon Fernandez Marina
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-18210 Add query for document structure Closed
Participants:

 Description   

Please add an option to limit the size of ANY array in the document.

Motivation:

Related to SERVER-18207.

Our application parses data from MongoDB repositories that are populated by other applications. Documents can contain potentially large array objects at arbitrary locations. The application retrieves the data and presents the user with a "summary" (e.g., showing data from the first few entries, followed by "NNN more").

The full entry is expanded once the user chooses a specific document, and a specific part of the document.

The above change would reduce the amount of data that needs to be fetched on the initial calls. Currently, as much as 100X the necessary data is transferred from MongoDB, since there is no way to cap the size of an array. When working with documents that can contain arbitrary data sets, the application is forced to read the entire document.
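As an illustration only, the requested option would behave roughly like the client-side sketch below, which caps every array in a document to n entries and appends an "NNN more" marker (the function name and marker format are invented):

```javascript
// Hypothetical sketch of the requested server-side option, implemented
// client-side: cap every array found anywhere in a document to `n`
// entries, replacing the tail with an "NNN more" marker.
function capArrays(value, n) {
  if (Array.isArray(value)) {
    const capped = value.slice(0, n).map(v => capArrays(v, n));
    if (value.length > n) {
      capped.push(`${value.length - n} more`); // summary marker for the UI
    }
    return capped;
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value)) {
      out[key] = capArrays(value[key], n);
    }
    return out;
  }
  return value; // scalars pass through unchanged
}

const doc = {
  "APPLE": [1, 2, 3, 4, 5],
  "MUTUAL-FUNDS": { "FIDELITY": [10, 20, 30] }
};
console.log(JSON.stringify(capArrays(doc, 2)));
// → {"APPLE":[1,2,"3 more"],"MUTUAL-FUNDS":{"FIDELITY":[10,20,"1 more"]}}
```

Doing this on the client, of course, still transfers the full document over the network; the request is precisely to apply such a cap on the server.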



 Comments   
Comment by Ramon Fernandez Marina [ 19/May/15 ]

Hi yair.lenga@gmail.com,

apologies if I missed something when reading your tickets. I'll take a closer look at the comment above and compare it with SERVER-18210 to see how it all fits together.

Regards,
Ramón.

Comment by Yair Lenga [ 16/May/15 ]

Ramon,

I think that there are significant differences and benefits to the functionality requested in 18207, 18208, and 18209, compared with 18210.

The short version is that all three will enable efficient retrieval of large documents where the client has no information about the schema, with a single round-trip request to the server. Compare this with 18210, which will:

  • Require much more complex processing of the data on the server.
  • Require additional logic on the client side to take advantage of the schema information.
  • Require at least 2 round trips: one for the schema, one for top-level data, and potentially additional round trips for deeply complex/nested data structures.
  • Require significantly more development time to deliver.

In practice, issues 18207, 18208, and 18209 will deliver solutions that can be easily used by "light-weight" applications with minimal code changes. While 18210 is a much more complete solution, it will apply to a smaller set of applications, where the extra effort of leveraging the schema information is justified.

I would appreciate it if you could reconsider. I believe that 18207, 18208, and 18209 can be implemented (relatively) quickly and will provide a good short-term solution, while 18210 is likely to take much more time and provide long-term benefits.

Regards
Yair.

Comment by Ramon Fernandez Marina [ 15/May/15 ]

Folding into SERVER-18210, as one would need to know if a field is an array before being able to use $slice.

Comment by Yair Lenga [ 03/May/15 ]

Ticket still showing "Waiting for User Input". See input above.

Comment by Yair Lenga [ 25/Apr/15 ]

The problem is similar to SERVER-18208.
The $slice will work IF the document structure is known and does not expand. When retrieving data from a document with unknown structure (in my case, blocks of data are added as they become available), the only way I could find to discover the names of the new series is to extract the whole document, forcing a massive read of the data.

For example, if new data series are stored inside the document (like the one below), finding the tickers that have time series is difficult. The new operator would make it possible to run a fetch with a limit. This fetch would return the keys of the document and the item counts. The code could then run a "find" with a projection that brings back only the required data. At that point, '$slice' will help reduce the data set.

{
  "APPLE": [ ... ],
  "GOOGLE": [ ... ],
  "MUTUAL-FUNDS": {
    "FIDELITY": [ ... ]
  }
}

For documents with large time series, this method can reduce the network cost per document from megabytes to a few kilobytes.
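The second step of the pattern described above can be sketched as follows: given the keys that the (requested) first fetch would return, build a $slice projection so the follow-up find() brings back only the first few entries of each series. Collection and field names are illustrative:

```javascript
// Build a $slice projection for a set of array fields discovered by the
// hypothetical "keys + item count" fetch described above.
function buildSliceProjection(keys, n) {
  const projection = {};
  for (const key of keys) {
    projection[key] = { $slice: n }; // first n elements of each series
  }
  return projection;
}

const projection = buildSliceProjection(["APPLE", "GOOGLE"], 3);
console.log(JSON.stringify(projection));
// → {"APPLE":{"$slice":3},"GOOGLE":{"$slice":3}}
// Usable as: db.series.find({}, projection) in the mongo shell
// ("series" is an illustrative collection name).
```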

Comment by Ramon Fernandez Marina [ 25/Apr/15 ]

yair.lenga@gmail.com, I think the $slice operator for projection may be what you're looking for – can you please give it a try and see if it meets your needs?

Thanks,
Ramón.
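For readers landing on this ticket: $slice in a projection limits a single, named array field. A minimal client-side sketch of its positive-limit behavior (the "comments" field name is illustrative):

```javascript
// Rough client-side equivalent of find({}, { comments: { $slice: 2 } })
// applied to one document: keep only the first `n` elements of the
// named array field.
function sliceField(doc, field, n) {
  const out = { ...doc };
  if (Array.isArray(out[field])) {
    out[field] = out[field].slice(0, n);
  }
  return out;
}

const post = { _id: 1, comments: ["a", "b", "c", "d"] };
console.log(JSON.stringify(sliceField(post, "comments", 2)));
// → {"_id":1,"comments":["a","b"]}
```

As the comments above discuss, this only helps when the field name is already known, which is the limitation the ticket is about.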

Generated at Thu Feb 08 03:46:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.