[SERVER-31280] Implementing paging using range queries, latency of last query is much higher Created: 27/Sep/17  Updated: 27/Oct/23  Resolved: 12/Jan/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.4.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alessandro Gherardi Assignee: Backlog - Query Team (Inactive)
Resolution: Gone away Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File explain.1     File explain.2    
Issue Links:
Related
related to SERVER-13732 Predicates in top-level implicit AND ... Closed
related to SERVER-24027 The planner does not consider reversi... Closed
Assigned Teams:
Query
Operating System: ALL
Participants:

 Description   

I'm trying to implement paging using range queries. I have a collection called events that has, among others, a time attribute (type date).

I defined a compound index with key

{"time" : 1, "_id" : 1}

. I use the script below to retrieve the events in pages.

The query latency is very good (< 20 ms) for all queries except the last one. An explain shows that, unlike the other queries, the last query does not use the compound index. It looks like this has something to do with the value of startTime being equal to or very close to the value of lastTime.

var startTime = ISODate("2017-09-18T00:00:00Z");
var endTime = ISODate("2017-09-18T00:30:00Z");
var pageSize = 555;
var maxTimeMS = 6000;
var dbName = "foo";
 
var totalLatency = 0;
var skip = 0;
var lastTime = null;
var lastEventId = null;
var firstPage = true;
while (true) {
  var start = new Date();
  var eventsRead = 0;
  if (firstPage) {
    firstPage = false;
 
    db.getSiblingDB(dbName).events.find({ time: { $gte: startTime, $lte: endTime}}).sort({ "time" : -1 , "_id" : -1}).limit(pageSize).maxTimeMS(maxTimeMS).forEach(function(s) {
      eventsRead++;
      lastTime = s.time;
      lastEventId = s._id;
    } );
  } else {
    db.getSiblingDB(dbName).events.find({ time: { $gte: startTime, $lte: endTime}, "$or" : [ { "time" : { "$lt" : lastTime}} , { "time" : lastTime , "_id" : { "$lt" : lastEventId }} ] }).sort({ "time" : -1 , "_id" : -1}).limit(pageSize).maxTimeMS(maxTimeMS).forEach(function(s) {
      eventsRead++;
      lastTime = s.time;
      lastEventId = s._id;
    } );
  }
  var elapsed = new Date() - start;
  totalLatency += elapsed;
  skip = skip + eventsRead;
  print("page " + skip + ", elapsed " + elapsed);
  if (eventsRead < pageSize) {
    break;
  }
}
 
print("Total events read: " + skip + ", total latency: " + totalLatency);



 Comments   
Comment by Alessandro Gherardi [ 02/Jan/18 ]

Thanks. Feel free to close.

Comment by David Storch [ 07/Nov/17 ]

Hi agherardi,

SERVER-13732 cannot be backported to 3.4. However, we have put out our third release candidate for the 3.6 release series (3.6.0-rc2). It won't be long until 3.6 GA is released, which will contain the fix.

Best,
Dave

Comment by Alessandro Gherardi [ 31/Oct/17 ]

Hi Kelsey,
Any chance that SERVER-13732 will be back-ported to 3.4?

Thanks,
Alessandro

Comment by Alessandro Gherardi [ 13/Oct/17 ]

Hi Kelsey,
I downloaded 3.6.0-rc0 and indeed the issue appears to be fixed.

Is there any chance that SERVER-13732 will be back-ported to 3.4?

Thanks,
Alessandro

Comment by Kelsey Schubert [ 13/Oct/17 ]

Hi agherardi,

MongoDB 3.6.0-rc0 was announced today, which is the first release candidate for MongoDB 3.6. Please feel free to use it in testing environments.

Kind regards,
Kelsey

Comment by Alessandro Gherardi [ 09/Oct/17 ]

Hi Kelsey,
Thanks for the update. If the timing is right, upgrading to 3.6 is a viable option for us.

Is the 3.5.X development branch what's going to become 3.6? Do you have an estimate for when 3.6 is going to be released?

Alessandro

Comment by Kelsey Schubert [ 09/Oct/17 ]

Hi agherardi,

Thank you for uploading the data, we're able to reproduce this issue and are continuing to investigate.

When mongod first executes query it will evaluate different plans (e.g. consider using different indexes) to retrieve the requested documents. To improve performance, it will cache the best plan and use this going forward. If a query is slower than expected, than it will reevaluate its plan.

In MongoDB 3.4.2, we introduced SERVER-24027, which has allows the query planner to identify the plan MERGE_SORT, which saves a few ms per query, but, unfortunately, performs poorly on the last query in your example.

In this case, it appears that mongod is not reevaluating its plan while executing the last query in your script.

If you execute one additional final query at the end of your script,

db.getSiblingDB(dbName).events.find({ time: { $gte: startTime, $lte: endTime}, "$or" : [ { "time" : { "$lt" : lastTime}} , { "time" : lastTime , "_id" : { "$lt" : lastEventId }} ] }).sort({ "time" : -1 , "_id" : -1}).limit(pageSize).itcount()

I expect that it will cause mongod to reevaluate its cached query plan and cache a new plan in its place, and subsequent runs should not have large discrepancy in performance.

Please note that the queries you are executing are much more performant in MongoDB 3.6, which generates tighter index bounds (SERVER-13732). As a result of this performance improvement, the query plans need not be reevaluated. Therefore, I would suggest upgrading to MongoDB 3.6 when it is available to resolve this issue.

Kind regards,
Kelsey

Comment by Alessandro Gherardi [ 09/Oct/17 ]

Hi Kelsey,
Any updates?

Thank you,
Alessandro

Comment by Alessandro Gherardi [ 29/Sep/17 ]

Thank you, Kelsey. I uploaded the data to the portal, as a zipped mongodump export.

Comment by Kelsey Schubert [ 29/Sep/17 ]

Hi agherardi,

Certainly, I've created a secure upload portal for you to use. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time.

Thanks again,
Kelsey

Comment by Alessandro Gherardi [ 29/Sep/17 ]

Hi Kelsey,
After retrying my test, I realized that I was wrong about the winning plan: the winning plan appears to be the same (except for index boundaries values, as expected) for all queries, including the last one. However, the higher latency issue still exists.

I attached the explain(true) outputs for both the last and the second-to-last queries.

I can definitely provide you with our test data set. For privacy reasons, I'd prefer to upload the data to a non-public place that only Mongo support can access. Is that possible? If it is, can you please email host + credentials info to alessandro_DOT_gherardi_AT_yahoo_DOT_com.

Thank you,
Alessandro

Comment by Kelsey Schubert [ 28/Sep/17 ]

Hi agherardi,

Thank you for reporting this behavior. So we can better understand what is going on here. Would you please provide the output of explain(true) for penultimate query as well as the problematic final query?

Additionally, if you can provide a data set that we can run your script against that showcases this issue, it'll speed our investigation.

Thanks again for your help,
Kelsey

Comment by Alessandro Gherardi [ 27/Sep/17 ]

Here's the script output on my test DB:

page 555, elapsed 14
page 1110, elapsed 15
page 1665, elapsed 23
page 2220, elapsed 13
page 2775, elapsed 14
page 3330, elapsed 13
page 3885, elapsed 13
page 4440, elapsed 13
page 4995, elapsed 14
page 5550, elapsed 13
page 6105, elapsed 12
page 6660, elapsed 12
page 7215, elapsed 14
page 7770, elapsed 13
page 8164, elapsed 2047
Total events read: 8164, total latency: 2243

Generated at Thu Feb 08 04:26:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.