[SERVER-46918] in a sharded db, cursor timeout on a shard while mongos is consuming the cursor on another shard Created: 17/Mar/20  Updated: 15/Jun/20  Resolved: 15/Jun/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Remi Jolin Assignee: Mihai Andrei
Resolution: Duplicate Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-6036 Disable cursor timeout for cursors th... Closed
Sprint: Query 2020-06-01, Query 2020-06-15
Participants:

 Description   

Suppose you have a two-shard cluster and a sharded collection with shard key 'origin', containing 2.6 M documents.

Running sh.status() gives:

 

        {  "_id" : "test",  "primary" : "sh_0",  "partitioned" : true,  "version" : {  "uuid" : UUID("28b86279-c3e8-4432-b325-28136e353a85"),  "lastMod" : 1 } }        {  "_id" : "test",  "primary" : "sh_0",  "partitioned" : true,  "version" : {  "uuid" : UUID("28b86279-c3e8-4432-b325-28136e353a85"),  "lastMod" : 1 } }                
 
               test.sh_coll
                        shard key: { "origin" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                sh_0 5
                                sh_1 5
                        { "origin" : { "$minKey" : 1 } } -->> { "origin" : "A000001" } on : sh_1 Timestamp(3, 2)
                         { "origin" : "A000001" } -->> { "origin" : "F000001" } on : sh_1 Timestamp(3, 3)
                         { "origin" : "F000001" } -->> { "origin" : "H050002" } on : sh_1 Timestamp(3, 4)
                         { "origin" : "H050002" } -->> { "origin" : "K000003" } on : sh_1 Timestamp(3, 5)
                         { "origin" : "K000003" } -->> { "origin" : "N" } on : sh_1 Timestamp(3, 6)
                         { "origin" : "N" } -->> { "origin" : "P050000" } on : sh_0 Timestamp(3, 7)
                         { "origin" : "P050000" } -->> { "origin" : "S000001" } on : sh_0 Timestamp(3, 8)
                         { "origin" : "S000001" } -->> { "origin" : "U050002" } on : sh_0 Timestamp(3, 9)
                         { "origin" : "U050002" } -->> { "origin" : "Y045375" } on : sh_0 Timestamp(3, 10)
                         { "origin" : "Y045375" } -->> { "origin" : { "$maxKey" : 1 } } on : sh_0 Timestamp(3, 11)
 

 

So, sorted by 'origin', half of the records are in one shard and the other half is in the other shard.

Now suppose we want to get all the records, sorted by 'origin', and every 1000 records run a processing step that takes 1 second. Getting half of the records then takes more than 10 minutes (the default cursor timeout).

If we don't specify a batch_size, we don't even reach 1,182 records.

If we specify a batch size of 1000, we get a "CursorNotFound" error when mongos, after having returned half of the records plus the first batch from the second shard, tries to get more data from the second shard.

That is the point we wanted to demonstrate: whatever batch_size you specify, you won't be able to get all the records. Using a small batch size is often described as the way to avoid cursor timeouts; in a multi-shard deployment, that is not the case.
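A minimal PyMongo sketch of the access pattern (the connection string and the 1-second delay are placeholders; the collection name matches the sh.status() output above):

# Minimal sketch of the repro (assumptions: a mongos at localhost:27017 and
# the sharded collection test.sh_coll shown in sh.status() above).
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connect through mongos
cursor = client.test.sh_coll.find({}).sort("origin", 1).batch_size(1000)

for i, doc in enumerate(cursor, start=1):
    if i % 1000 == 0:
        time.sleep(1)  # stand-in for ~1 s of processing per 1000 documents
# With ~2.6 M documents, draining the lower half (first shard) alone takes
# well over the 10-minute default cursorTimeoutMillis, so the idle cursor
# on the second shard is reaped and the loop fails with CursorNotFound.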

 

Of course, this example has been crafted to reproduce the issue. It is not a real use case, but I have real ones with 10 shards, where processing each record takes more than 1 second and cursorTimeoutMillis is set to 7200000 (2 hours), and I still get CursorNotFound errors.
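For reference, this is roughly how the cursorTimeoutMillis parameter mentioned above can be raised at runtime (a sketch only; the host is a placeholder, and the parameter has to be set on every mongod and mongos that may hold a cursor):

# Sketch only: raise the cursor idle timeout to 2 hours (7200000 ms).
# cursorTimeoutMillis is a per-process server parameter, so it must be set
# on each node that can own a cursor; the host below is a placeholder.
from pymongo import MongoClient

node = MongoClient("mongodb://shard-host.example:27018")
node.admin.command("setParameter", 1, cursorTimeoutMillis=7200000)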

no_cursor_timeout=True is not an option because it can lead to ghost cursors in the database.

 

 



 Comments   
Comment by Mihai Andrei [ 15/Jun/20 ]

Hi rj-10gen@arsynet.com,
After taking a look, this doesn't appear to be a bug, since there isn't anything wrong with how the lastUseDate (i.e. the quantity used to determine whether a cursor has timed out) of each cursor is being set. More precisely, the cursor established on the second shard in your repro script never gets used, so it times out as one would expect. I therefore think this is a feature request along the lines of a cursor keepAlive: there should be some mechanism (in the background or otherwise) that keeps cursors across multiple shards alive to prevent timeouts like the one in your repro script. I'm going to change this ticket from a bug to an improvement and close it as a duplicate of SERVER-6036; feel free to reach out if you have any questions.

Comment by Mihai Andrei [ 09/Jun/20 ]

The originating query requests a sort, which means that, after cursors are established on both shards, the cursor on the first shard (i.e. the shard that holds the chunks which come first in sorted order) must be completely exhausted before we start to issue getMores against the cursor on the second shard (i.e. the shard that holds the chunks which come next in sorted order). This is how RouterStageMerge (the stage used in the query plan) executes and satisfies the requested sort. That leads to the question of why the cursor on the second shard timed out: after the cursor on the second shard is established, we record its _lastUseDate upon construction (i.e. the value the mongod uses to determine whether a cursor has timed out), and the cursor then simply idles because the cursor on the first shard is still being exhausted. Consequently, the _lastUseDate of the second shard's cursor never gets updated (no getMores are issued against it before the time expires), and the cursor gets killed.
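To illustrate the ordering (a simplified analogy only, not RouterStageMerge itself): a lazy sort-merge over two pre-sorted streams only advances the stream whose next key sorts lowest, so with non-overlapping shard key ranges the second stream is barely touched until the first is drained:

# Illustrative analogy only (not MongoDB internals): heapq.merge lazily
# advances whichever input currently holds the smallest 'origin'. Because
# the two shards own non-overlapping key ranges, the second input sits idle
# until the first is exhausted, which is exactly the window in which a real
# shard cursor can exceed cursorTimeoutMillis.
import heapq

shard1 = iter([{"origin": "A000001"}, {"origin": "F000001"}])  # keys < "N"
shard2 = iter([{"origin": "N000001"}, {"origin": "P050000"}])  # keys >= "N"

merged = heapq.merge(shard1, shard2, key=lambda d: d["origin"])
print(list(merged))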

Some logging to confirm the above:

Cursor on second shard gets established:

2020-06-09T16:37:15.898-0400 I  QUERY    [conn18]  creating cursor with id 860344826837784855 and last use date 2020-06-09T16:37:15.898-0400

The _lastUseDate of the cursor never changes (as expected, since it doesn't get used)...

2020-06-09T16:37:34.440-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:37:34.440-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:37:38.440-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:37:38.440-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:37:42.440-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:37:42.440-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:37:46.441-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:37:46.441-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:37:50.441-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:37:50.441-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:37:54.441-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:37:54.441-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:37:58.441-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:37:58.441-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:02.441-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:02.441-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:06.442-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:06.442-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:10.442-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:10.442-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:14.442-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:14.442-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:18.442-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:18.442-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:22.442-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:22.442-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:26.443-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:26.443-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:30.443-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:30.443-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:34.443-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:34.443-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:38.443-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:38.443-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:42.443-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:42.443-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:38:46.444-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:38:46.444-0400 last used 2020-06-09T16:37:15.900-0400

...all the way up until the cursor exceeds the timeout of 10 minutes:

2020-06-09T16:47:02.470-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:47:02.470-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:47:06.470-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:47:06.470-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:47:10.470-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:47:10.470-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:47:14.470-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:47:14.470-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:47:18.470-0400 I  QUERY    [clientcursormon] cursor 860344826837784855 time left: 2020-06-09T16:47:18.470-0400 last used 2020-06-09T16:37:15.900-0400
2020-06-09T16:47:18.470-0400 I  QUERY    [clientcursormon] Cursor id 860344826837784855 timed out, idle since 2020-06-09T16:37:15.900-0400
2020-06-09T16:52:02.462-0400 I  CONNPOOL [TaskExecutorPool-0] Dropping all pooled connections to localhost:30010 due to ShutdownInProgress: Pool for localhost:30010 has expired.

Note that some documents are returned from the second shard because its first batch of results is returned as part of the response to establishing the cursor (this explains why only the first 1000 documents from the second shard are returned).

Given all this, we're left with the question of whether this represents an actual bug in the server. This comes down to how cursor timeouts are meant to behave: if it's expected that a cursor should be killed when it doesn't get used (i.e. its lastUseDate doesn't get updated) within cursorTimeoutMillis milliseconds, then this is not a bug. That is my interpretation of how cursor timeouts ought to work: the cursor on the second shard doesn't get used within cursorTimeoutMillis, so it should time out. Though it's unfortunate, it seems like the only workaround at present is to set noCursorTimeout to true (unless SERVER-6036 gets implemented).
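If the noCursorTimeout workaround is used, the ghost-cursor concern from the description can at least be limited by always closing the cursor explicitly. A hedged PyMongo sketch (connection string and timings are placeholders; the collection is the one from the ticket):

# Sketch only. no_cursor_timeout=True stops the server from reaping the
# shard cursors, so the close() in the finally block is what keeps them from
# lingering as "ghost cursors" when processing raises or is interrupted (a
# hard client crash can still leave them behind, which is the risk noted in
# the description).
import time
from pymongo import MongoClient

coll = MongoClient("mongodb://localhost:27017").test.sh_coll
cursor = coll.find({}, no_cursor_timeout=True).sort("origin", 1).batch_size(1000)
try:
    for i, doc in enumerate(cursor, start=1):
        if i % 1000 == 0:
            time.sleep(1)  # stand-in for the real per-batch processing
finally:
    cursor.close()  # issues killCursors so the server-side cursors are freed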

If there are no objections, I'm going to close this ticket as a duplicate of SERVER-6036, since, if I understand correctly, implementing cursor keepalive would solve the issue described here and would fix the behavior I've observed above: the cursor on the second shard would not time out if there were a way to keep cursors alive during a long-running query spanning multiple shards.

Comment by Misha Tyulenev [ 23/Apr/20 ]

carl.champain, this looks like a query issue; I'm passing it to their backlog to investigate and sort out.

Comment by Carl Champain (Inactive) [ 24/Mar/20 ]

Hi rj-10gen@arsynet.com,

We are passing this ticket along to the appropriate team for additional investigation. Updates will be posted on this ticket as they happen.

Thank you,
Carl

Comment by Remi Jolin [ 23/Mar/20 ]

Hi carl.champain

I've just uploaded a zip file containing the DB, mongodb logs and a get_data.py script.

Best regards,
Remi.

Comment by Carl Champain (Inactive) [ 20/Mar/20 ]

Hi rj-10gen@arsynet.com,

Can you please share with us the working db and the python script example showing this issue?
I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Carl

Comment by Remi Jolin [ 19/Mar/20 ]

No, you should not have to construct your query to work around database issues. It may be a DBA's problem, but not a developer's.

Shard cursors should not time out before the mongos "main" cursor, or we should have a way to "refresh" the cursors (that's what I call a DBA problem: it can be hidden from the developer...).

The main problem is that you can have a program that runs smoothly for years. Then the DBA adds more shards or migrates data to a cluster with more (much more) shards and the program suddenly crashes, simply because the more shards you have, the more likely it is that one shard's cursor times out.

Comment by Dmitry Agranat [ 19/Mar/20 ]

Hi rj-10gen@arsynet.com, would SERVER-6036 help address this issue?
