[SERVER-36808] Server closes cursors that are still in use during session cache refresh Created: 22/Aug/18  Updated: 02/Sep/18  Resolved: 31/Aug/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.7
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Shane Harvey Assignee: Misha Tyulenev
Resolution: Duplicate Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File python-1626-output.log     File python-1626.py    
Issue Links:
Duplicate
duplicates SERVER-36332 CursorNotFound error in GetMore on a ... Closed
is duplicated by SERVER-36571 cursor disappears before cursorTimeou... Closed
is duplicated by SERVER-36821 Cursor id not found even though it di... Closed
Related
related to PYTHON-1626 Server closes live cursors Closed
related to SERVER-34810 Session cache refresh can erroneously... Closed
Operating System: ALL
Sprint: Sharding 2018-09-10
Participants:

 Description   

In PYTHON-1626, multiple users report that the server is closing cursors that are still in use resulting in pymongo.errors.CursorNotFound: cursor id 56626714158 not found errors when iterating a cursor. This appears to be the same issue as SERVER-34810 but upgrading to 3.6.6 and 3.6.7 did not resolve the issue.

I can reproduce this issue simply by iterating a cursor on a secondary (3 member replica set with auth enabled running 3.6.7) and waiting 30 seconds between each getMore command. The cursor is usually closed by the server within a few (<5) minutes.

I've attached a repro script that prints the commands that pymongo is running: python-1626.py

The script fails after a few minutes (presumably when the session cache is synced to the sessions collection):

Command getMore with request id 1137623865 failed in 1098 microseconds, failure:
{u'code': 43, u'ok': 0.0, u'$clusterTime': {u'clusterTime': Timestamp(1534962428, 1), u'signature': {u'keyId': 6592612058754187265L, u'hash': Binary('\xa5mr\xe9\xe1)\x95\xae"[\x05\xa7\xe6\x93\xa2\x98\xb9M\x88c', 0)}}, u'codeName': u'CursorNotFound', u'operationTime': Timestamp(1534962428, 1), u'errmsg': u'cursor id 56626714158 not found'}
 
Traceback (most recent call last):
  File "python-1626.py", line 57, in <module>
    for doc in coll.find(batch_size=10, limit=1000):
  File "/Users/shane/git/mongo-python-driver/pymongo/cursor.py", line 1189, in next
    if len(self.__data) or self._refresh():
  File "/Users/shane/git/mongo-python-driver/pymongo/cursor.py", line 1126, in _refresh
    self.__send_message(g)
  File "/Users/shane/git/mongo-python-driver/pymongo/cursor.py", line 982, in __send_message
    helpers._check_command_response(first)
  File "/Users/shane/git/mongo-python-driver/pymongo/helpers.py", line 152, in _check_command_response
    raise CursorNotFound(errmsg, code, response)
pymongo.errors.CursorNotFound: cursor id 56626714158 not found

The full output is: python-1626-output.log

Server version: 3.6.7

$ mongodb-osx-x86_64-enterprise-3.6.7/bin/mongod --version
db version v3.6.7
git version: 2628472127e9f1826e02c665c1d93880a204075e
OpenSSL version: OpenSSL 1.1.0h  27 Mar 2018
allocator: system
modules: enterprise
build environment:
    distarch: x86_64
    target_arch: x86_64

Python version: 2.7
PyMongo version: 3.7.1



 Comments   
Comment by Misha Tyulenev [ 31/Aug/18 ]

The fix for SERVER-36332 resolved it.
Before the fix the secondary issues refresh was not adding any data to the config.system.sessions collection so it was possible to kill this cursor on the next refresh on secondary.

Comment by Kelsey Schubert [ 23/Aug/18 ]

I've linked SERVER-36821, which contains another repro.

Generated at Thu Feb 08 04:44:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.