[SERVER-46797] session refresh does not use localLogicalSessionTimeoutMinutes to delete old sessions Created: 11/Mar/20  Updated: 27/Oct/23  Resolved: 07/Jul/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Remi Jolin Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongod.log.1    
Operating System: ALL
Steps To Reproduce:

Set localLogicalSessionTimeoutMinutes: 120 and cursorTimeoutMillis: 7200000.

Restart MongoDB.

Start a cursor, get a few records, wait for 40 minutes, then issue a getMore on the cursor (try to retrieve all the cursor's values).

You'll get a cursorNotFound error

Participants:

 Description   

If you change the localLogicalSessionTimeoutMinutes parameter to a value greater than 30, for example 120, sessions are still deleted after 30 minutes. And the attached cursors are deleted too.

 

2020-03-11T14:42:57.041+0000 I QUERY [LogicalSessionCacheRefresh] killing cursor: 4547660581473595061 as part of killing session(s)

The program was started at 14:07.

 



 Comments   
Comment by Dmitry Agranat [ 07/Jul/20 ]

Hi rj-10gen@arsynet.com,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Carl Champain (Inactive) [ 23/Mar/20 ]

rj-10gen@arsynet.com,

Thanks for getting back to us.
We are also seeing that mongos doesn't have the lsidTTLIndex. However, if you connect directly to the mongod nodes in your cluster, then you should be able to see the lsidTTLIndex in the system.sessions collection, in the config database. The solution must be applied to each of these nodes individually.

Comment by Remi Jolin [ 23/Mar/20 ]

Hi,

I'll try the dropIndex / wait / verify process before next week, but as I said before, there is no lsidTTLIndex when you issue db.system.sessions.getIndexes() on mongos. Only the index on _id.

And regarding not changing localLogicalSessionTimeoutMinutes, it's OK for me now. I added an "auto-refresh" session to my ODM and use this session in the find/aggregate calls below.

Syntax is something like (python):

 

with Collection.start_session() as session:
    for rec in Collection.find({}, session=session):
        ....
 

When running start_session(), there is a thread in the background that does a refreshSessions every 10 minutes.
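The keep-alive pattern described above can be sketched as follows. This is a minimal illustration, not the ODM's actual code: `start_session_keepalive` is a hypothetical name, and the stub passed as `refresh` stands in for the real refreshSessions command (with pymongo that would be something like client.admin.command("refreshSessions", [session.session_id])).

```python
import threading
import time

# Sketch of a background keep-alive thread: call `refresh` every
# `interval_seconds` until `stop_event` is set, so the server-side
# session is never idle long enough to expire.
def start_session_keepalive(refresh, interval_seconds, stop_event):
    def loop():
        # Event.wait() returns False on timeout, True once stop is set.
        while not stop_event.wait(interval_seconds):
            refresh()
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread

# Demo with a tiny interval and a counting stub instead of a real command.
calls = []
stop = threading.Event()
start_session_keepalive(lambda: calls.append(1), 0.01, stop)
time.sleep(0.1)
stop.set()
print(len(calls) >= 2)  # True: the stub ran repeatedly in the background
```

In a real client the interval would be well under the session timeout (the reporter uses 10 minutes against a 30-minute timeout), and the thread would be stopped when the session's `with` block exits.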

 

Comment by Carl Champain (Inactive) [ 19/Mar/20 ]

Hi rj-10gen@arsynet.com,

Thank you for sending us the information.
Our hypothesis is that your production cluster is hitting SERVER-43218. This bug happens if you restart all cluster nodes with a new localLogicalSessionTimeoutMinutes, which leads the TTL indexes on the shards to not be updated to the new value. A solution is to:

  1. Drop the lsidTTLIndex by running db.system.sessions.dropIndex("lsidTTLIndex") on each node
  2. Wait for the next logical session refresh to update the value.
  3. Run db.system.sessions.getIndexes() to confirm the changes.
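The three steps above can be sketched in Python. This uses an in-memory stand-in for the config.system.sessions collection rather than a live MongoDB connection, so the drop → refresh → verify flow is visible and checkable; `FakeSessionsCollection` and `simulate_logical_session_refresh` are illustrative names, not pymongo APIs.

```python
# In-memory stand-in for config.system.sessions, whose TTL index is
# still carrying the old 30-minute value (1800 s) after the restart.
class FakeSessionsCollection:
    def __init__(self):
        self.indexes = {
            "_id_": {"key": {"_id": 1}},
            "lsidTTLIndex": {"key": {"lastUse": 1}, "expireAfterSeconds": 1800},
        }

    def drop_index(self, name):
        # Step 1: drop the stale lsidTTLIndex.
        self.indexes.pop(name, None)

    def simulate_logical_session_refresh(self, timeout_minutes):
        # Step 2: the next logical session refresh re-creates the index
        # from the current localLogicalSessionTimeoutMinutes value.
        self.indexes["lsidTTLIndex"] = {
            "key": {"lastUse": 1},
            "expireAfterSeconds": timeout_minutes * 60,
        }

    def get_indexes(self):
        # Step 3: inspect the indexes to confirm the change.
        return self.indexes


sessions = FakeSessionsCollection()
sessions.drop_index("lsidTTLIndex")
sessions.simulate_logical_session_refresh(timeout_minutes=120)
print(sessions.get_indexes()["lsidTTLIndex"]["expireAfterSeconds"])  # 7200
```

Against a real cluster, step 1 is db.system.sessions.dropIndex("lsidTTLIndex") on each node, and step 2 simply means waiting for the next LogicalSessionCacheRefresh cycle.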

Note that our documentation mentions the following regarding the use of localLogicalSessionTimeoutMinutes:

This parameter is intended for testing purposes only and not for production use.

This means that we maintain commands for testing and commands for production differently.

 

We were also able to successfully recreate the single node behavior. Consequently, we opened SERVER-46982 to fix this bug—you can watch it for updates. However, if this behavior is a problem for you right now, then it can also be fixed with the solution above.

 

On the other hand, on the production system, there are sessions in system.sessions, but there is no "expireAfter" index. Only the index on _id.

It is difficult for us to diagnose this particular problem without the logs.

 

Finally, the following log line you provided from your production cluster

2020-03-13T14:02:36.321+0000 I COMMAND [conn1876495] command config.$cmd command: createIndexes { createIndexes: "system.sessions", indexes: [ { key: { lastUse: 1 }, name: "lsidTTLIndex", expireAfterSeconds: 1800 } ], allowImplicitCollectionCreation: false, $clusterTime: { clusterTime: Timestamp(1584108155, 2), signature: { hash: BinData(0, 58CF2D4173FF3E7EBA760392AE730C679EED2DF9), keyId: 6790649384410284063 } }, $configServerState: { opTime: { ts: Timestamp(1584108155, 2), t: 2 } }, $db: "config" } numYields:0 ok:0 errMsg:"request doesn't allow collection to be created implicitly" errName:CannotImplicitlyCreateCollection errCode:227 reslen:491 locks:{ ParallelBatchWriterMode: { acquireCount: { r: 3 } }, ReplicationStateTransition: { acquireCount: { w: 3 } }, Global: { acquireCount: { r: 1, w: 2 } }, Database: { acquireCount: { r: 1, w: 2 } }, Collection: { acquireCount: { r: 3, w: 1, W: 1 } }, Mutex: { acquireCount: { r: 2 } } } flowControl:{ acquireCount: 1 } protocol:op_msg 258ms

appears to be an issue which was worked on in SERVER-44720. These log lines appear because createIndexes is being sent to shards that don't own chunks for the sessions collection. Luckily, it doesn't indicate anything wrong on your side, and the extra log noise should go away in v4.4.

Kind regards,
Carl

Comment by Remi Jolin [ 13/Mar/20 ]

Done. The zip file contains the DB files + conf file.

Enjoy

Comment by Carl Champain (Inactive) [ 13/Mar/20 ]

rj-10gen@arsynet.com,

I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Can you upload the test DB and the production log file?

Thank you,
Carl

Comment by Remi Jolin [ 13/Mar/20 ]

if you have a private repo, I can upload the test DB. 

Comment by Remi Jolin [ 13/Mar/20 ]

As it was difficult to send you the production log file (too much private information in it), I tried to reproduce the issue on a completely new MongoDB installation.

So the test system is MongoDB 4.2.3, no sharding, no replication. And... I could not reproduce the issue, but I found some differences between the two.

Production is MongoDB 4.2.3, with sharding and replication.

On the test system, it seems that the sessions never time out. Here you can see local sessions whose lastUse is more than 2 hours old:

 

> new Date()
ISODate("2020-03-13T13:20:55.090Z")
> db.aggregate([{$listLocalSessions: {allUsers: true}}])
{ "_id" : { "id" : UUID("722816e7-c229-4c06-b68c-55b93a01b049"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2020-03-13T09:30:50.980Z") }
{ "_id" : { "id" : UUID("8442dbc0-3166-493a-8712-a96b92e9afa3"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2020-03-13T10:54:14.669Z") }
{ "_id" : { "id" : UUID("7bf51574-3a56-4846-8490-b63a581f483d"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2020-03-13T13:03:39.356Z") }
{ "_id" : { "id" : UUID("bcce711f-9fe0-4a24-aa2c-896c4c433612"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2020-03-13T09:25:28.419Z") }
{ "_id" : { "id" : UUID("3b8dcd2f-76a3-4f1e-b214-a7621979fa40"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2020-03-13T13:20:56.670Z") }
{ "_id" : { "id" : UUID("cda8c868-a27a-419f-a7d2-fb12325dda3b"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2020-03-13T10:34:08.080Z") }
{ "_id" : { "id" : UUID("38e7e83a-b977-4e3a-a90a-d97d061a8abf"), "uid" : BinData(0,"47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=") }, "lastUse" : ISODate("2020-03-13T09:40:34.902Z") }
> db.system.sessions.aggregate([{$listSessions: {allUsers: true}}])
> new Date()
ISODate("2020-03-13T13:21:02.139Z")
>

 

And no sessions in system.sessions...

the index on system.sessions are :

 

 
> db.system.sessions.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "config.system.sessions" }, 
  { "v" : 2, 
    "key" : { "lastUse" : 1 }, 
    "name" : "lsidTTLIndex", 
    "ns" : "config.system.sessions", 
    "expireAfterSeconds" : NumberLong(7200) }
]> 

(seems ok)

Many I CONTROL messages regarding sessions in mongodb.log (see the complete log attached):

2020-03-13T11:14:47.636+0100 I CONTROL [LogicalSessionCacheReap] Failed to reap transaction table: Location13111: wrong type for field (expireAfterSeconds) long != int
2020-03-13T11:14:47.636+0100 I CONTROL [LogicalSessionCacheRefresh] Failed to refresh session cache: Location13111: wrong type for field (expireAfterSeconds) long != int

 

Repeated every 5 minutes.

 

 

On the other hand, on the production system, there are sessions in system.sessions, but there is no "expireAfter" index. Only the index on _id.

 

mongos> db.system.sessions.find().count()
36
mongos> db.system.sessions.getIndexes()
[ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "config.system.sessions" }]
mongos> 
 

And I found these messages in mongodb.log (every 5 minutes):

2020-03-13T14:02:36.321+0000 I COMMAND [conn1876495] command config.$cmd command: createIndexes { createIndexes: "system.sessions", indexes: [ { key: { lastUse: 1 }, name: "lsidTTLIndex", expireAfterSeconds: 1800 } ], allowImplicitCollectionCreation: false, $clusterTime: { clusterTime: Timestamp(1584108155, 2), signature: { hash: BinData(0, 58CF2D4173FF3E7EBA760392AE730C679EED2DF9), keyId: 6790649384410284063 } }, $configServerState: { opTime: { ts: Timestamp(1584108155, 2), t: 2 } }, $db: "config" } numYields:0 ok:0 errMsg:"request doesn't allow collection to be created implicitly" errName:CannotImplicitlyCreateCollection errCode:227 reslen:491 locks:{ ParallelBatchWriterMode: { acquireCount: { r: 3 } }, ReplicationStateTransition: { acquireCount: { w: 3 } }, Global: { acquireCount: { r: 1, w: 2 } }, Database: { acquireCount: { r: 1, w: 2 } }, Collection: { acquireCount: { r: 3, w: 1, W: 1 } }, Mutex: { acquireCount: { r: 2 } } } flowControl:{ acquireCount: 1 } protocol:op_msg 258ms

It seems to be trying to create the "expireAfter" index and cannot do it. But why is it trying to set the timeout to 1800 seconds and not 7200? (In the conf file and via the getParameter command: localLogicalSessionTimeoutMinutes: 120.)
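The mismatch is simple arithmetic: 1800 seconds is the server's 30-minute default for localLogicalSessionTimeoutMinutes, while the configured 120 minutes would translate to 7200 seconds, which is exactly the value seen on the test system's lsidTTLIndex.

```python
# 1800 s in the failing createIndexes call corresponds to the 30-minute
# default for localLogicalSessionTimeoutMinutes; the configured 120
# minutes would give 7200 s (the value the test system's index has).
default_timeout_minutes = 30
configured_timeout_minutes = 120
print(default_timeout_minutes * 60)     # 1800, what the production log shows
print(configured_timeout_minutes * 60)  # 7200, what was expected
```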

Well... in test, sessions never expire, and in production, they expire before the timeout...

Comment by Carl Champain (Inactive) [ 12/Mar/20 ]

Hi rj-10gen@arsynet.com,

Thank you for the report.
To help us diagnose this issue, can you please provide the mongod.log file for this node?

Kind regards,
Carl
 

Generated at Thu Feb 08 05:12:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.