[SERVER-14050] Exit BSON parsing early on retrieving data. Created: 26/May/14  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Performance
Affects Version/s: 2.6.1
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: John Page Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-3334 Stop processing bson fields in projec... Backlog
Assigned Teams:
Query Execution
Participants:

 Description   

We treat BSON as having unique keys for all values.

When searching for matching documents we use an early-exit strategy: as soon as we find the value or values we are looking for, we skip to the next document.

However, this only benefits us in a count. Anything that does a projection, including aggregation, does not take advantage of this and therefore processes, and more importantly pages in, more data than is required. With the removal of covered indexes for aggregation this becomes more important.

I suggest that, when projecting, once we have the data we need, we do not page in the rest of the document.
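The early-exit idea can be sketched as follows. This is a hedged, minimal illustration in Python, not the server's implementation: it hand-parses a BSON byte stream under the assumption that only int32 (type 0x10) and embedded-document (type 0x03) elements appear and only top-level scalar fields are projected, and it stops scanning as soon as every projected field has been found.

```python
import struct

# Hedged sketch of the early-exit idea, not the server's implementation.
# Assumptions: the buffer holds one BSON document whose elements are only
# int32 (type 0x10) or embedded documents (type 0x03), and only top-level
# scalar fields are projected.
def project_early_exit(buf, wanted):
    found = {}
    pos = 4                          # skip the int32 total-length prefix
    while buf[pos] != 0x00:          # 0x00 terminates the document
        etype = buf[pos]
        pos += 1
        end = buf.index(0x00, pos)   # element name is a NUL-terminated cstring
        name = buf[pos:end].decode()
        pos = end + 1
        if etype == 0x10:            # int32 value: 4 bytes
            value, size = struct.unpack_from("<i", buf, pos)[0], 4
        elif etype == 0x03:          # embedded document: skip by its own length
            value, size = None, struct.unpack_from("<i", buf, pos)[0]
        else:
            raise ValueError("element type not handled in this sketch")
        if value is not None and name in wanted:
            found[name] = value
        pos += size                  # a subdocument is hopped over in one jump
        if len(found) == len(wanted):
            return found             # early exit: remaining bytes never touched
    return found
```

Once the projected fields are satisfied, the remaining bytes of the document are never read, so the pages holding them need not be brought in.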



 Comments   
Comment by Asya Kamsky [ 03/Apr/18 ]

john.page is this still relevant? This was done with MMAP and back when agg didn't use covered queries. Both are now history so I'm not sure if this ticket should still be considered. And if so, is it different from SERVER-3334?

Comment by John Page [ 13/Aug/14 ]

Sorry - just discovered a whole load of tickets awaiting my input.

I simply made a large set of records with a large number of fields, then queried it. Where the records are large enough to span several pages, all pages are pulled in for each record even when they don't need to be; this is not related to readahead. I monitored this using the mongo-pageview tool mentioned in my original comment.

Comment by Daniel Pasette (Inactive) [ 28/May/14 ]

the main use case is reflected here: SERVER-3334

Comment by John Page [ 26/May/14 ]

I tested this using git:10gen/mongo-labs/mongo-pageview, which I wrote.

Added a large number of 64K documents with 1,500 integer fields.

Queried for a non-existent field on a large collection in another database to evict the pages, and verified the eviction with pageview. Then ran several counts, queries and aggregations to observe how much data, and which parts of it, are paged in.
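The fixture described above might be generated along these lines. This is a hypothetical Python sketch: the field count, field naming and value are taken from the description; the function name and everything else are assumptions.

```python
# Hypothetical sketch of the test fixture: documents with 1,500 integer
# fields named f0..f1499, large enough to span several pages once stored.
def make_flat_doc(n_fields=1500, value=1234):
    return {"f%d" % i: value for i in range(n_fields)}
```

Inserting a large number of such documents, then reading them back with a narrow projection, reproduces the access pattern under discussion.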

I also verified the GOOD behaviour: where fields are in a subdocument, the subdocument is not paged in, which is a performance optimisation/workaround that can be used at the moment.

So, a flat document like:

f0: 1234
f1: 1234
.
.
.
f1500: 1234

will always page everything in, except in a count: whatever the projection, the whole record is paged in. A count will stop paging when it finds the fields it needs. Whereas:

f1:1234
f2:1234
others: {
f3:1234
.
.
.
f1500:1234
}

will not page in anything inside others (aside from the page it starts on, which the preceding data also occupies).

This can make a huge difference when running aggregations on larger records, and in MongoDB records are often quite large. For anything over 8K, assuming zero readahead, this is a winner; reducing paging, page eviction, etc. (not to mention CPU cycles for BSON parsing) all helps with performance.
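The subdocument workaround above amounts to a simple restructuring step. The sketch below is a hypothetical Python helper (the hot/cold split and the others key come from the example above; the function name is an assumption):

```python
# Move rarely-projected ("cold") fields under a single subdocument key so a
# top-level BSON scan can hop over them in one jump instead of parsing each.
def nest_cold_fields(doc, hot_fields, bucket="others"):
    out = {k: v for k, v in doc.items() if k in hot_fields}
    out[bucket] = {k: v for k, v in doc.items() if k not in hot_fields}
    return out
```

Queries that project only the hot fields then touch just the leading page(s) of each record, matching the GOOD behaviour observed above.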

Generated at Thu Feb 08 03:33:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.