[SERVER-14050] Exit BSON parsing early on retrieving data. Created: 26/May/14 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 2.6.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | John Page | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Query Execution |
| Participants: | |
| Description |
|
We treat BSON as having unique keys for all values. When searching for matching documents we use an early-exit strategy: as soon as we find the value or values we are looking for, we skip to the next document. However, this only benefits us in a count - anything that does a projection, including aggregation, does not take advantage of it and therefore processes, and more importantly pages in, more data than is required. With the removal of covered indexes for aggregation this becomes more important. I suggest that, when projecting, once we have the data we need we do not page the rest of the document in. |
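To make the suggestion concrete, here is a minimal sketch of an early-exit field scan (Python, with hypothetical names; the server itself is C++ and handles every BSON type, while this sketch covers int32 elements only). BSON elements are laid out sequentially, so a scanner that stops once every projected field has been found never touches - and, under a memory-mapped storage engine, never pages in - the bytes that follow.

```python
import struct

def make_int32_doc(fields: dict) -> bytes:
    """Encode a flat dict of int32 values as a BSON document
    (int32 length header, 0x10 elements, trailing NUL)."""
    body = b""
    for name, value in fields.items():
        body += b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)
    body += b"\x00"
    return struct.pack("<i", len(body) + 4) + body

def scan_int32_fields(buf: bytes, wanted: set) -> dict:
    """Scan BSON elements in order, exiting as soon as every wanted
    field has been found, so trailing bytes are never examined."""
    found = {}
    pos = 4  # skip the int32 document-length header
    while buf[pos] != 0 and len(found) < len(wanted):
        etype = buf[pos]
        pos += 1
        end = buf.index(0, pos)          # field name is a NUL-terminated cstring
        name = buf[pos:end].decode()
        pos = end + 1
        assert etype == 0x10             # this sketch handles int32 only
        value = struct.unpack_from("<i", buf, pos)[0]
        pos += 4
        if name in wanted:
            found[name] = value
    return found
```

With a projection on early fields, the loop terminates before reaching the 1,500th field, which is exactly the behaviour the ticket asks the server to extend from counts to projections and aggregation.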
| Comments |
| Comment by Asya Kamsky [ 03/Apr/18 ] |
|
john.page, is this still relevant? This was done with MMAP, back when agg didn't use covered queries. Both are now history, so I'm not sure whether this ticket should still be considered. And if so, is it different from SERVER-3334? |
| Comment by John Page [ 13/Aug/14 ] |
|
Sorry - I just discovered a whole load of tickets awaiting my input. I simply made a large set of records with a large number of fields, then queried it. Where the records are large enough to span several pages, all pages are pulled in for each record even when they don't need to be; this is not related to readahead. I monitored this using the mongo-pageview tool described in my earlier comment. |
| Comment by Daniel Pasette (Inactive) [ 28/May/14 ] |
|
the main use case is reflected here: SERVER-3334 |
| Comment by John Page [ 26/May/14 ] |
|
I tested this using git:10gen/mongo-labs/mongo-pageview, which I wrote. I added a large number of 64K documents with 1,500 integer fields, queried for a non-existent field on a large collection in another database to evict the pages, and verified the eviction with pageview. I then ran several counts, queries, and aggregations to observe how much data is paged in, and what. I also verified the GOOD behaviour that where fields are in a subdocument, the subdocument is not paged in - a performance optimisation/work-around that can be used at the moment. So f0: 1234 will always page everything in except in a count - whatever the projection, the whole record is paged in; a count will stop paging when it finds the fields it needs. f1:1234 will not page anything else in (aside from the page the preceding data is in). This can make a huge difference when running aggregations on larger records - and in MongoDB records are often quite large. For anything over 8K, assuming 0 readahead, this is a winner, and reducing paging, page eviction, etc. (not to mention CPU cycles for BSON parsing) all helps with performance. |
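The subdocument behaviour observed above falls out of the BSON wire format: an embedded-document element (type 0x03) begins with its own int32 byte length, so a scanner that needs nothing inside it can jump past the whole subdocument in O(1), without parsing - or paging in - its contents. A minimal sketch (Python, hypothetical function name, assuming `pos` points at the subdocument's length prefix):

```python
import struct

def skip_embedded_doc(buf: bytes, pos: int) -> int:
    """Given the offset of an embedded document's value bytes (the
    int32 length prefix of a type-0x03 element), return the offset
    just past the whole subdocument without reading its interior."""
    sub_len = struct.unpack_from("<i", buf, pos)[0]  # length includes itself and the trailing NUL
    return pos + sub_len
```

This is why packing unneeded fields into a subdocument reduces paging today: the scanner reads only the four length bytes and hops over everything else.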