[SERVER-22214] CloneCollection with query doing COLLSCAN although there is an index on field Created: 18/Jan/16  Updated: 06/Dec/22  Resolved: 26/Jul/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andrey Borunov Assignee: Backlog - Query Team (Inactive)
Resolution: Done Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-19593 Allow collscans on $snapshot queries ... Closed
is related to SERVER-32174 Remove old snapshot query option Closed
Assigned Teams:
Query
Operating System: ALL
Participants:

 Description   

Hello!

I have a capped collection with about 150 million documents in it (20 GB of data). The insertion rate for this collection is about 500-1000 docs/sec. I have an index on one of the fields, "start", which is a number. I run a scheduled cloneCollection from another mongod instance (both instances are version 3.2) with a query like {start: {$gt: <some number>}}. The cloneCollection command takes very long, about 10 minutes, and the time slowly increases with every subsequent cloneCollection.
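For reference, a clone step like the one described above might look roughly like this in the mongo shell, run on the destination server. This is a sketch: the host name and query threshold are placeholders, and the namespace is taken from the log excerpt below.

```javascript
// Hypothetical invocation of the cloneCollection command (removed in later
// MongoDB versions).  "source-host" and the threshold are placeholders.
db.runCommand({
  cloneCollection: "statserver-collector.statrecords",
  from: "source-host:27017",
  query: { start: { $gt: 100000 } }  // placeholder threshold
})
```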

When I searched through mongod.log, I found entries like this:

2015-12-25T16:09:53.073+0600 I QUERY    [conn2951] query statserver-collector.statrecords query: { query: { _id: { $gt: 165091491.0 } }, $snapshot: true } planSummary: COLLSCAN cursorid:22150682963 ntoreturn:0 ntoskip:0 exhaust:1 keysExamined:0 docsExamined:121741412 keyUpdates:0 writeConflicts:0 numYields:952128 nreturned:101 reslen:17236 locks:{ Global: { acquireCount: { r: 1904258 } }, Database: { acquireCount: { r: 952129 } }, Collection: { acquireCount: { r: 952129 } } } 339569ms

I see the planSummary is COLLSCAN. As I understand it, this means the query is executing without an index, even though there is an index on "start". When I run a similar query in the mongo shell:

db.mycollection.find({_id: {$gt: 165091491}})

the planSummary is IXSCAN and the query returns rows very quickly.

The main thing is: we downgraded MongoDB from 3.2 to 3.0.8 (only the one instance we run cloneCollection from), and the issue simply disappeared. We didn't change our code, the steps are the same, and cloneCollection now finishes in a few seconds.

So I think the issue is somewhere in v3.2. I searched but didn't find any similar reported problems.



 Comments   
Comment by David Storch [ 26/Jul/19 ]

This case revolves around changes in the behavior of the snapshot() find command option, but support for this option was removed in version 4.0 under SERVER-32174. Therefore, I don't think there is any action for us to take on this ticket. Closing as Gone Away.

Comment by J Rassi [ 20/Jan/16 ]

Just based on your description: perhaps the number of documents returned by your initial clone query { _id: { $gt: 165091491.0 } } has changed over the last week?

To diagnose further, you can emulate the query that cloneCollection issues against the remote server with find(...).snapshot(); if you run this query with explain("executionStats") and look at the "executionTimeMillisEstimate" figures, you'll see a breakdown of where the execution time is going.
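For example, the diagnostic might look like this in the mongo shell, run against the remote (source) server. The collection name and _id value are taken from the log excerpt above; note that snapshot() and the $snapshot option were removed in MongoDB 4.0.

```javascript
// Emulate the query cloneCollection issues against the remote server.
db.statrecords.find({ _id: { $gt: 165091491 } })
  .snapshot()
  .explain("executionStats")
// Compare the per-stage executionTimeMillisEstimate values in the output and
// check whether the winning plan uses COLLSCAN or IXSCAN.
```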

Comment by Andrey Borunov [ 20/Jan/16 ]

Thank you, Jason! I will try the given workaround.

One thing I also wanted to mention: my collection is capped and the size limit has been reached, so the number of documents in it does not change (the _id values do). I assumed the time to scan through them would be constant too, but it isn't. The time was slowly increasing: at first it was 5 minutes; a week later (we perform cloneCollection every 5-10 minutes) it was about 13 minutes. What is the possible reason for that? Is this normal behaviour?

Comment by J Rassi [ 19/Jan/16 ]

This indeed changed in 3.2, as a consequence of SERVER-19593. The cloneCollection command sets the "snapshot" query option on queries issued against the remote server. Before 3.2, the "snapshot" query option always forced an index scan on the _id index, but as of 3.2 this option forces a collection scan when used against a server running the WiredTiger storage engine.

We should consider whether or not we want to change the "cloneCollection" and "copydb" commands to no longer use the "snapshot" query option (note that the "clone" command has never used this query option, as far as I can tell). I'm moving this ticket to the "Needs Triage" state; we should make this decision sometime in the next few weeks.

Andrey: thanks for reporting this issue. As a workaround in the meantime, you could change your application to no longer use the "cloneCollection" command, and instead implement similar functionality on the client side by issuing a query with a {_id: {$gt: ...}} predicate on the "source" server, followed by a bulk insert on the "destination" server.
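A minimal sketch of that client-side workaround, with plain Python lists standing in for the source and destination collections so the control flow is runnable without a MongoDB deployment. With pymongo, the list filtering and extend would become find() and insert_many() calls; all names here are illustrative, not part of any driver API.

```python
# Sketch of the suggested workaround: instead of cloneCollection, read
# documents with _id greater than a watermark from the source and
# bulk-insert them into the destination, in batches.

def incremental_copy(source, destination, last_id, batch_size=1000):
    """Copy docs with _id > last_id from source to destination in batches.

    Returns the new watermark (highest _id copied, or last_id if none).
    """
    # Emulates find({"_id": {"$gt": last_id}}).sort("_id", 1) on the source.
    pending = sorted(
        (doc for doc in source if doc["_id"] > last_id),
        key=lambda d: d["_id"],
    )
    for i in range(0, len(pending), batch_size):
        batch = pending[i:i + batch_size]
        destination.extend(batch)   # stands in for insert_many(batch)
        last_id = batch[-1]["_id"]  # advance the watermark
    return last_id

source = [{"_id": n, "start": n * 10} for n in range(1, 6)]
dest = [{"_id": 1, "start": 10}, {"_id": 2, "start": 20}]

watermark = incremental_copy(source, dest, last_id=2, batch_size=2)
print(watermark)                  # 5
print([d["_id"] for d in dest])   # [1, 2, 3, 4, 5]
```

Persisting the watermark between runs keeps each pass proportional to the number of new documents, instead of rescanning the whole collection every 5-10 minutes.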

Comment by Andrey Borunov [ 18/Jan/16 ]

Sorry, another correction: there is no index on the "start" field, and we do not query by this field, as one can see in the log. Instead, we query by the _id field, which is a number in our case and does have an index.

Comment by Andrey Borunov [ 18/Jan/16 ]

> The insertion rate to this collection is about 500-100 doc/sec

Correct: 500-1000 doc/sec

Generated at Thu Feb 08 03:59:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.