[SERVER-36571] cursor disappears before cursorTimeoutMillis (and its not a timeout) Created: 10/Aug/18  Updated: 11/Oct/18  Resolved: 29/Aug/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.0, 4.0.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Remi Jolin Assignee: Nick Brewer
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test_cursor_expiration.log     File test_cursor_expiration.py    
Issue Links:
Duplicate
duplicates SERVER-36808 Server closes cursors that are still ... Closed
Operating System: ALL
Steps To Reproduce:

set cursorTimeoutMillis = 7200000 (2 hours) on the test db

in python:  (tested with pymongo 3.7.1)

open a cursor (with enough data so mongo will have to do getMore requests to get all the data); get a few records, then wait for 60mn, then try to get the remainng records from the cursor. 

Cursor crashes: CursorNotFound

 

You'll find attached a program to reproduce this and its log

 

Participants:

 Description   

Context:
We have batchs that get data in db (cursor count > 5M records) and, for each record, compute various data. This computation can be quite long and the delta time between 2 request to getMore can be much more than 10mn (cursorTimeout). So we run our database with cursorTimeoutMillis = 7200000 (2 hours) and everything run well on our current db (mongoDB 3.0.x).

When trying MongoDb 4.0 (currently 4.0.1) we run into an issue : some programs crash with CursorNotFound (all our programs are in python).

 

On specific tests program we can see that the cursor vanishes 30 to 40mn after opening the cursor

 

So I tried to reproduce the problem with a small script that first get a few records from a cursor, than wait for some
(long) time and then get the remaing records. While waiting, I'm getting sessions list and serverStatus().metrics
.cursor

cursor total == 1 (our cursor)

Within 5mn, we see sessions 2 appearing (the cache seems to be updated every 5mn)
one of these sessions (which seems to be related to the cursor as we can see in mongodb.log (verbosity=5)) has its lastUse that does not change over time (seems coherent).

About 30mn later we have only one session left. The session related to the cursor is not there anymore. I guess its because of liidTTLIndex on config.system.sessions

{
 "v" : 2,
 "key" : {
 "lastUse" : 1
 },
 "name" : "lsidTTLIndex",
 "ns" : "config.system.sessions",
 "expireAfterSeconds" : 1800
 }

5mn later (session cache refresh ?) serverStatus.metrics.cursor has a total of 0 and if we stop waiting, our for loop
on the cursor ends with a CursorNotFound error.

I guess that if the index had a expireAfterSeconds greater than cursorTimeoutMillis / 1000 (to get seconds), the issue would disappear.

 

This issue is critical for us. We cannot upgrade until it is solved



 Comments   
Comment by Remi Jolin [ 11/Oct/18 ]

nick.brewer :

New title should be something like : session never expire if parameter localLogicalSessionTimeoutMinutes is changed.

Comment by Remi Jolin [ 11/Oct/18 ]

nick.brewer

I've just tested with version 4.0.3 and the issue is still here.

 

2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten] MongoDB starting : pid=15496 port=27017 dbpath=/data/test-standalone 64-bit host=mmovh6.milanamos.net
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten] db version v4.0.3
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten] git version: 7ea530946fa7880364d88c8d8b6026bbc9ffa48c
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.1.0g  2 Nov 2017
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten] modules: none
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten] build environment:
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten]     distmod: ubuntu1804
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten]     distarch: x86_64
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2018-10-11T13:33:34.566+0000 I CONTROL  [initandlisten] options: { config: "/data/test-standalone/automation-mongod.conf", net: { bindIp: "0.0.0.0", bindIpAll: true, port: 27017 }, op
erationProfiling: { mode: "slowOp", slowOpThresholdMs: 100 }, processManagement: { fork: true }, security: { authorization: "enabled", keyFile: "/var/lib/mongodb-mms-automation/keyfil
e" }, setParameter: { authenticationMechanisms: "SCRAM-SHA-1,SCRAM-SHA-256", cursorTimeoutMillis: "7200000", localLogicalSessionTimeoutMinutes: "130" }, storage: { dbPath: "/data/test
-standalone", engine: "wiredTiger" }, systemLog: { destination: "file", path: "/data/test-standalone/mongodb.log" } }
 
[...]
 
2018-10-11T15:53:35.906+0000 D COMMAND  [LogicalSessionCacheRefresh] run command config.$cmd { createIndexes: "system.sessions", indexes: [ { key: { lastUse: 1 }, name: "lsidTTLIndex", expireAfterSeconds: 7800 } ], $db: "config" }
2018-10-11T15:53:35.906+0000 D STORAGE  [LogicalSessionCacheRefresh] NamespaceUUIDCache: registered namespace config.system.sessions with UUID 8536711e-8cae-4309-9de1-e73aa77c7ae0
2018-10-11T15:53:35.906+0000 D -        [LogicalSessionCacheRefresh] User Assertion: IndexOptionsConflict: Index with name: lsidTTLIndex already exists with different options src/mongo/db/commands/create_indexes.cpp 230
2018-10-11T15:53:35.906+0000 D REPL     [LogicalSessionCacheRefresh] Waiting for write concern. OpTime: { ts: Timestamp(0, 0), t: -1 }, write concern: { w: 1, wtimeout: 0 }
2018-10-11T15:53:35.906+0000 D COMMAND  [LogicalSessionCacheRefresh] assertion while executing command 'createIndexes' on database 'config' with arguments '{ createIndexes: "system.sessions", indexes: [ { key: { lastUse: 1 }, name: "lsidTTLIndex", expireAfterSeconds: 7800 } ], $db: "config" }': IndexOptionsConflict: Index with name: lsidTTLIndex already exists with different options
2018-10-11T15:53:35.906+0000 I COMMAND  [LogicalSessionCacheRefresh] command config.$cmd command: createIndexes { createIndexes: "system.sessions", indexes: [ { key: { lastUse: 1 }, name: "lsidTTLIndex", expireAfterSeconds: 7800 } ], $db: "config" } numYields:0 ok:0 errMsg:"Index with name: lsidTTLIndex already exists with different options" errName:IndexOptionsConflict errCode:85 reslen:163 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } } } protocol:op_msg 0ms
2018-10-11T15:53:35.906+0000 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Index with name: lsidTTLIndex already exists with different options

So please re-open this ticket.

Comment by Remi Jolin [ 31/Aug/18 ]

Sorry, but I can't see any mention of sessions staying forever when you increase  localLogicalSessionTimeoutMinutes...

The issue exists on a simple DB (no sharding, no replication).

Comment by Nick Brewer [ 29/Aug/18 ]

rj-10gen@arsynet.com This appears to reproduce an issue we're currently tracking here: SERVER-36808

You can follow along with that ticket for updates.

-Nick

Comment by Remi Jolin [ 22/Aug/18 ]

I've checked this morrnig. The local sesions created yesterday for tests are still there...

> new Date()
ISODate("2018-08-22T08:09:05.214Z")
> db.aggregate([{$listLocalSessions: {}}, {$sort: {lastUse: 1}}])
{ "_id" : { "id" : UUID("2a59d779-8a4d-40aa-a981-f97af6ce631a"), "uid" : BinData(0,"gvI6E3rwpLEMfFQo/pac02FLjxJ9+Zfrt+aIKnzCZeE=") }, "lastUse" : ISODate("2018-08-21T13:43:57.982Z"), "user" : "adminMongo@admin" }
{ "_id" : { "id" : UUID("b09f7f33-cbc2-4e9a-9cdf-ec86621a07b5"), "uid" : BinData(0,"gvI6E3rwpLEMfFQo/pac02FLjxJ9+Zfrt+aIKnzCZeE=") }, "lastUse" : ISODate("2018-08-21T13:44:59.584Z"), "user" : "adminMongo@admin" }
{ "_id" : { "id" : UUID("083ad931-5045-4f83-9b8f-e221f6650d1c"), "uid" : BinData(0,"gvI6E3rwpLEMfFQo/pac02FLjxJ9+Zfrt+aIKnzCZeE=") }, "lastUse" : ISODate("2018-08-21T13:51:08.394Z"), "user" : "adminMongo@admin" }
{ "_id" : { "id" : UUID("cece5fbe-e5b5-46fd-b6d5-c17ba48e8d4f"), "uid" : BinData(0,"gvI6E3rwpLEMfFQo/pac02FLjxJ9+Zfrt+aIKnzCZeE=") }, "lastUse" : ISODate("2018-08-21T14:29:32.022Z"), "user" : "adminMongo@admin" }
{ "_id" : { "id" : UUID("185dbaf5-fb3d-493a-b91f-a2ee9372da02"), "uid" : BinData(0,"gvI6E3rwpLEMfFQo/pac02FLjxJ9+Zfrt+aIKnzCZeE=") }, "lastUse" : ISODate("2018-08-21T14:30:33.715Z"), "user" : "adminMongo@admin" }

the index on system.sessions did not change (coherent with log messages):

> db.system.sessions.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "config.system.sessions"
        },
        {
                "v" : 2,
                "key" : {
                        "lastUse" : 1
                },
                "name" : "lsidTTLIndex",
                "ns" : "config.system.sessions",
                "expireAfterSeconds" : 1800
        }
]

I suppose that if I could drop it it would be replaced by the new one with ttl = 7800 and the sessions would get in this collection and finally disappear...

Comment by Remi Jolin [ 21/Aug/18 ]

Hi,

 

I tried to increase prameter localLogicalSessionTimeoutMinutes to 130 to be coherent with the cursor timeout (10mn more than cursor timeout). It seems to fix the cursor issue but there are errors in mongodb.log :

2018-08-21T14:37:15.960+0000 D COMMAND  [LogicalSessionCacheRefresh] run command config.$cmd { createIndexes: "system.sessions", indexes: [ { key: { lastUse: 1 }, name: "lsidTTLIndex", expireAfterSeconds: 7800 } ], $db: "config" }
2018-08-21T14:37:15.960+0000 D -        [LogicalSessionCacheRefresh] User Assertion: IndexOptionsConflict: Index with name: lsidTTLIndex already exists with different options src/mongo/db/commands/create_indexes.cpp 330
2018-08-21T14:37:15.960+0000 D COMMAND  [LogicalSessionCacheRefresh] assertion while executing command 'createIndexes' on database 'config' with arguments '{ createIndexes: "system.sessions", indexes: [ { key: { lastUse: 1 }, name: "lsidTTLIndex", expireAfterSeconds: 7800 } ], $db: "config" }': IndexOptionsConflict: Index with name: lsidTTLIndex already exists with different options
2018-08-21T14:37:15.960+0000 I COMMAND  [LogicalSessionCacheRefresh] command config.$cmd command: createIndexes { createIndexes: "system.sessions", indexes: [ { key: { lastUse: 1 }, name: "lsidTTLIndex", expireAfterSeconds: 7800 } ], $db: "config" } numYields:0 ok:0 errMsg:"Index with name: lsidTTLIndex already exists with different options" errName:IndexOptionsConflict errCode:85 reslen:163 locks:{ Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { W: 1 } } } protocol:op_msg 0ms
2018-08-21T14:37:15.960+0000 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Index with name: lsidTTLIndex already exists with different options
2018-08-21T14:37:15.987+0000 D STORAGE  [TTLMonitor]  fetched CCE metadata: { md: { ns: "config.system.sessions", options: { uuid: UUID("8536711e-8cae-4309-9de1-e73aa77c7ae0") }, indexes: [ { spec: { v: 2, key: { _id: 1 }, name: "_id_", ns: "config.system.sessions" }, ready: true, multikey: false, multikeyPaths: { _id: BinData(0, 00) }, head: 0, prefix: -1, backgroundSecondary: false }, { spec: { v: 2, key: { lastUse: 1 }, name: "lsidTTLIndex", ns: "config.system.sessions", expireAfterSeconds: 1800 }, ready: true, multikey: false, multikeyPaths: { lastUse: BinData(0, 00) }, head: 0, prefix: -1, backgroundSecondary: false } ], prefix: -1 }, idxIdent: { _id_: "index-5-1117377452801273835", lsidTTLIndex: "index-6-1117377452801273835" }, ns: "config.system.sessions", ident: "collection-4-1117377452801273835" }
2018-08-21T14:37:15.987+0000 D STORAGE  [TTLMonitor] returning metadata: md: { ns: "config.system.sessions", options: { uuid: UUID("8536711e-8cae-4309-9de1-e73aa77c7ae0") }, indexes: [ { spec: { v: 2, key: { _id: 1 }, name: "_id_", ns: "config.system.sessions" }, ready: true, multikey: false, multikeyPaths: { _id: BinData(0, 00) }, head: 0, prefix: -1, backgroundSecondary: false }, { spec: { v: 2, key: { lastUse: 1 }, name: "lsidTTLIndex", ns: "config.system.sessions", expireAfterSeconds: 1800 }, ready: true, multikey: false, multikeyPaths: { lastUse: BinData(0, 00) }, head: 0, prefix: -1, backgroundSecondary: false } ], prefix: -1 }

and $listCursor returns no session. never.

Comment by Remi Jolin [ 10/Aug/18 ]

@Nick Brewer

I re-run my test program because I could not find the previous logs.I uploaded  both mongodb log and the script log to the secure portal. Both are running on the same server so timestamps are the same

Tell me if you want more information.

 

Thanks.

Remi.

Comment by Nick Brewer [ 10/Aug/18 ]

rj-10gen@arsynet.com Could you upload the mongod logs from the time the cursor is opened, to the time you receive the error? You can upload this information to our secure portal

Thanks,
Nick

Generated at Thu Feb 08 04:43:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.