[SERVER-39044] Cannot add session into the cache error code 261 Created: 16/Jan/19  Updated: 06/Nov/20  Resolved: 22/Feb/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 4.0.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jaume Valls Vila Assignee: Misha Tyulenev
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongod_PRIMARY.txt     Text File mongod_SECONDARY.txt    
Issue Links:
Duplicate
duplicates SERVER-39420 Remove in-memory boolean to indicate ... Closed
Operating System: ALL
Sprint: Sharding 2019-02-25
Participants:
Case:

 Description   

We have a MongoDB cluster with the following configuration:

  • 3 config servers (replica set replicasetconfig)
  • 2 shards (each a replica set)
  • 3 mongos

After high-load testing we found that queries start to fail with the following error:

{
  "ok" : 0,
  "errmsg" : "cannot add session into the cache",
  "code" : 261,
  "codeName" : "TooManyLogicalSessions",
  "operationTime" : Timestamp(1547653301, 1),
  "$clusterTime" : {
    "clusterTime" : Timestamp(1547653301, 1),
    "signature" : {
      "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      "keyId" : NumberLong(0)
    }
  }
}

In the mongod logs we found this message:

CONTROL [LogicalSessionCacheRefresh] Failed to refresh session cache: NoProgressMade: no progress was made executing batch write op in config.system.sessions after 5 rounds (0 ops completed in 6 rounds total)

After some research we could not find any documentation related to this error. What does this error mean, and how can we fix it?



 Comments   
Comment by sudheer reddy bholla [ 06/Nov/20 ]

Please check whether you have the sharding option enabled in the configuration file.

If so, remove it and restart the MongoDB service.

Comment by Joydip Datta [ 08/Jul/20 ]

I am getting the same error in a standalone mongodb setup.

MongoDB Enterprise node01:PRIMARY> show dbs
2020-07-08T12:56:07.934+0000 E QUERY    [js] Error: listDatabases failed:{
	"ok" : 0,
	"errmsg" : "cannot add session into the cache",
	"code" : 261,
	"codeName" : "TooManyLogicalSessions"
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
Mongo.prototype.getDBs@src/mongo/shell/mongo.js:139:1
shellHelper.show@src/mongo/shell/utils.js:882:13
shellHelper@src/mongo/shell/utils.js:766:15
@(shellhelp2):1:1

MongoDB version is 4.0.19

MongoDB Enterprise node01:PRIMARY> version()
4.0.19

Following are excerpts from mongod log

2020-07-08T12:56:03.123+0000 I NETWORK  [listener] connection accepted from XX.YY.20.30:47572 #2050276 (5 connections now open)
2020-07-08T12:56:03.124+0000 I NETWORK  [conn2050276] received client metadata from XX.YY.20.30:47572 conn2050276: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.0.19" }, os: { type: "Linux", name: "CentOS Linux release 7.6.1810 (Core) ", architecture: "x86_64", version: "Kernel 3.10.0-957.el7.x86_64" } }
2020-07-08T12:56:03.138+0000 I ACCESS   [conn2050276] Successfully authenticated as principal rootuser on admin from client XX.YY.20.30:47572
2020-07-08T12:56:22.557+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 589ms
2020-07-08T12:57:23.176+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 618ms
2020-07-08T12:58:23.770+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 593ms
2020-07-08T12:59:24.472+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 701ms
2020-07-08T12:59:59.041+0000 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: sharding state is not yet initialized
2020-07-08T12:59:59.041+0000 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: sharding state is not yet initialized
2020-07-08T13:00:25.044+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 572ms
2020-07-08T13:01:25.624+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 579ms
2020-07-08T13:02:26.222+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 597ms
2020-07-08T13:03:26.738+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 515ms
2020-07-08T13:04:27.236+0000 I COMMAND  [PeriodicTaskRunner] task: UnusedLockCleaner took: 498ms
2020-07-08T13:04:38.185+0000 I NETWORK  [conn2050276] end connection XX.YY.20.30:47572 (4 connections now open)
2020-07-08T13:04:59.041+0000 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: sharding state is not yet initialized
2020-07-08T13:04:59.041+0000 I CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: sharding state is not yet initialized

Any help would be useful. The cluster is down and this is affecting our workloads.

Also, is there a way to reopen this JIRA or should I open a new one?

Comment by Eric Yum [ 25/Dec/19 ]

I have submitted a new issue in the Python Driver project: https://jira.mongodb.org/browse/PYTHON-2090

 

Comment by Eric Yum [ 25/Dec/19 ]

Hi Misha/Daniel,

I have encountered this bug too! I tried the new MongoDB 4.0.13, but it does not go away.

Fortunately, I have a clue: the bug appeared after I upgraded pymongo from 3.7 to 3.10, and it disappeared when I downgraded from 3.10 back to 3.7.

Obviously, the root cause is in pymongo 3.10!

Comment by Misha Tyulenev [ 22/Feb/19 ]

greg.mckeon yes, it's a dup.

Comment by Jaume Valls Vila [ 21/Feb/19 ]

Hello Misha/Daniel,

In the end we downgraded to version 3.6, which we had already worked with and had no problems on. The situation was costing us a lot and we had to make a decision to move forward with the production system.

Thanks,

Comment by Danny Hatcher (Inactive) [ 20/Feb/19 ]

Hello emarketingsolutions,

I believe that the simplest solution mentioned by Misha is to perform a rolling restart of your config servers (restart each of the Secondaries in turn and then restart the Primary). That should autogenerate the collection and you should not experience further errors. If you do still see the problems after cycling the config servers, please let us know.
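If useful, a minimal sketch for confirming the recreation after the restart, reusing the two checks quoted elsewhere in this ticket (run against a mongos):

// Should return true once the sessions collection has been sharded again:
db.getSiblingDB('config').system.sessions.stats()['sharded']

// Should show the collection with "dropped" : false and a { "_id" : 1 } shard key:
db.getSiblingDB('config').collections.find({_id: 'config.system.sessions'})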

Thanks,

Danny

Comment by Misha Tyulenev [ 12/Feb/19 ]

emarketingsolutions this is a bug we are working on. To force recreation of the sessions collection, the config server needs to be bounced. Another option is to manually create the sessions collection and indexes; use whichever approach is better suited. CC kelsey.schubert
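As a rough sketch only, the manual route might look like the following. It assumes the { _id : 1 } shard key and the lsidTTLIndex spec (lastUse ascending, 1800-second TTL) quoted elsewhere in this ticket, and assumes the user is authorized to run these commands against the config database:

// Sketch: shard the sessions collection the way the automatic setup would (shard key taken
// from the config.collections document quoted elsewhere in this ticket).
sh.shardCollection("config.system.sessions", { _id: 1 })

// Sketch: recreate the TTL index used to expire old sessions (spec taken from the
// createIndexes attempt quoted elsewhere in this ticket).
db.getSiblingDB("config").system.sessions.createIndex(
    { lastUse: 1 },
    { name: "lsidTTLIndex", expireAfterSeconds: 1800 }
)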

Comment by Jaume Valls Vila [ 11/Feb/19 ]

Hello Misha,

system.sessions collection has not been recreated. Currently it doesn't exist.
But everything seems to work. Do we have to do something to have that collection recreated?

Thanks

Comment by Jaume Valls Vila [ 07/Feb/19 ]

Hello Misha,

Yesterday when I dropped the sessions collection, the whole system.sessions structure was removed (not only "sessions" but also "system").
It is not being recreated; do I have to run some command or restart something?

Thanks

UPDATE
We have restarted all the mongos, config, and shard servers. It looks like it is working now; we have done a high-load test and it worked correctly.
However, the system.sessions collection still does not exist.

Comment by Jaume Valls Vila [ 06/Feb/19 ]

Hello Misha,

I confirm that all mongos are upgraded to 4.0.5 (the three config servers and the two shards are also upgraded to 4.0.5).
I have dropped the sessions collection and am waiting to see if it is recreated correctly.

Thank you

Comment by Misha Tyulenev [ 06/Feb/19 ]

emarketingsolutions correct, if the config.system.sessions collection is not sharded, the system will not be able to expire used sessions, and once it reaches the maximum it will produce the error you see.
The fix is to drop the sessions collection and wait until it is recreated.
I don't have an explanation for why that did not happen; can you please confirm that all mongos are also correctly upgraded to 4.0.5?

Comment by Julien Durillon [ 06/Feb/19 ]

Hi,
I'm following this issue and checking my own sharded cluster. I have the same unsharded collection, but the

db.getSiblingDB('config').collections.find({_id: 'config.system.sessions'})

command returns nothing. I can access the collection, though. I found in the code that there should be an lsidTTLIndex index, but it's not there. I tried creating it and got:

not authorized on config to execute command { createIndexes: \"system.sessions\", indexes: [ { key: { lastUse: 1.0 }, name: \"lsidTTLIndex\", expireAfterSeconds: 1800.0, background: true } ], lsid: { id: UUID(\"422b9897-11e5-4b26-a553-91ad3de197d4\") },
$clusterTime: { clusterTime: Timestamp(1549455997, 1), signature: { hash: BinData(0, D5269B44D2608876A8068850665DE07E2DBA53BE), keyId: 6594319973974278147 } }, $db: \"config\" }

Comment by Jaume Valls Vila [ 06/Feb/19 ]

Hello Daniel/Misha,

a) Misha, this is the error message we see in the mongos logs:

[LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Collection config.system.sessions is not sharded.

Command results:

db.getSiblingDB('config').system.sessions.stats()['sharded'] --> FALSE
db.getSiblingDB('config').collections.find({_id: 'config.system.sessions'}) --> { "_id" : "config.system.sessions", "lastmodEpoch" : ObjectId("000000000000000000000000"), "lastmod" : ISODate("2019-02-01T13:18:10.537Z"), "dropped" : true }

b) mongos logs: all the errors we see are related to config.system.sessions not being sharded
c) config server primary: no errors, only messages about accepted connections, and sometimes messages like these:
[conn1201] command config.lockpings command: findAndModify....
[conn1534] command config.$cmd command: update ...

Should config.system.sessions be sharded? Does the whole problem come from this collection not being sharded?

Comment by Danny Hatcher (Inactive) [ 05/Feb/19 ]

Hello emarketingsolutions,

To clarify Misha's point A), please run the following commands against a mongos and provide the outputs:

db.getSiblingDB('config').system.sessions.stats()['sharded']
db.getSiblingDB('config').collections.find({_id: 'config.system.sessions'})

Additionally, please provide the logs he mentioned in points B) and C).

Thank you,

Danny

Comment by Misha Tyulenev [ 05/Feb/19 ]

Hello, it's apparently the same issue.
To look into this I will need:
a) a check for "[LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Collection config.system.sessions is not sharded." messages in the mongos logs; please also manually verify that config.system.sessions is correctly sharded
b) the mongos logs
c) the config server primary log

Comment by Jaume Valls Vila [ 05/Feb/19 ]

Hello,

We upgraded to version 4.0.5 and did some high-load testing. Yesterday the test went OK (1 hour -> 1,000,000 inserts). Today we are repeating the same test but MongoDB stopped working.

We are able to connect to MongoDB from one mongos, but when we run "show dbs" we get the same error message again.

{
        "ok" : 0,
        "errmsg" : "cannot add session into the cache",
        "code" : 261,
        "codeName" : "TooManyLogicalSessions",
        "operationTime" : Timestamp(1549378009, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1549378009, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
} 

If we try to connect via the other two mongos we cannot even connect. We get the following message:

MongoDB shell version v4.0.5
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
2019-02-05T14:52:30.501+0000 E QUERY    [js] Error: network error while attempting to run command 'isMaster' on host '127.0.0.1:27017'  :
connect@src/mongo/shell/mongo.js:328:13
@(connect):1:6
exception: connect failed

If we reboot the mongos machines everything starts to work again.

Can you help us with this?

Thanks,
Jaume
 

Comment by Jaume Valls Vila [ 04/Feb/19 ]

Hello,

We updated the version and the issue has been solved.

Thank you very much,
Jaume

Comment by Danny Hatcher (Inactive) [ 31/Jan/19 ]

Hello thom@scatterblogs.com,

Can you please upload the log files from your node to our Secure Upload Portal?

Thank you,

Danny

Comment by Danny Hatcher (Inactive) [ 31/Jan/19 ]

Hello emarketingsolutions,

Could you please upgrade to 4.0.4, drop the system.sessions collection as mentioned in a previous comment, and let us know if you still see this issue?

Thank you,

Danny

Comment by Misha Tyulenev [ 31/Jan/19 ]

daniel.hatcher possibly the issue is fixed by SERVER-36964? The fix went into 4.0.4. It adds a separate check for whether the config.system.sessions collection exists before setting it up, which significantly reduces lock contention.
Note: after the upgrade, please drop the collection to ensure it is properly sharded and indexed. There is no harm in doing so while the system is idle, as the collection will be recreated within about 5 minutes.
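As a sketch, the post-upgrade step amounts to the drop already used earlier in this ticket, followed by a wait of roughly one refresh interval; checking the interval assumes the logicalSessionRefreshMillis parameter is readable via getParameter on this build:

// Drop the sessions collection (run against the config server primary, as done earlier in this ticket):
db.getSiblingDB('config').system.sessions.drop()

// The refresh interval that drives the ~5 minute recreation (default 300000 ms);
// assumed to be exposed as a server parameter on this build:
db.adminCommand({ getParameter: 1, logicalSessionRefreshMillis: 1 })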

Comment by Dennis Thom [ 31/Jan/19 ]

Hi, we just had the exact same error while hitting our machine heavily. However, we're using a standalone server (no sharding or replication) and access it with a Java client. We never saw this error with earlier MongoDB versions.

Comment by Danny Hatcher (Inactive) [ 28/Jan/19 ]

Hello Jaume,

Our apologies, we are still looking into this. We should have an update later this week.

Thank you for your patience,

Danny

Comment by Jaume Valls Vila [ 28/Jan/19 ]

Hello Daniel,

Do we have any update here?

Thanks in advance.
Jaume

Comment by Danny Hatcher (Inactive) [ 23/Jan/19 ]

Hello Jaume,

Thank you for the requested files. We are looking into them now.

Danny

Comment by Jaume Valls Vila [ 21/Jan/19 ]

Hello Danny,

The data has been uploaded in the portal you provided.

Please let me know if you need more information.

Regards,
Jaume

Comment by Danny Hatcher (Inactive) [ 18/Jan/19 ]

Hello emarketingsolutions,

We think the issue may still live within the "system.sessions" collection. Would it be possible to provide us a mongodump of your config database? I've generated a Secure Upload Portal for you to use; only MongoDB Engineers will be able to see any files uploaded. In addition to the mongodump, please provide updated logs from the Primary config and from the Primary of one of your shards.

Thank you,

Danny

Comment by Jaume Valls Vila [ 18/Jan/19 ]

Hello Danny,

Executing the following command from Robomongo on app0, I got this output:

 db.getSiblingDB('config').system.sessions.stats()

/* 1 */
{
    "ns" : "config.system.sessions",
    "size" : 0,
    "count" : 0,
    "storageSize" : 4096,
    "capped" : false,
    "wiredTiger" : {
        "metadata" : {
            "formatVersion" : 1
        },
        "creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,read_timestamp=none),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=false),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",
        "type" : "file",
        "uri" : "statistics:table:collection-61--3049931131650357300",
        "LSM" : {
            "bloom filter false positives" : 0,
            "bloom filter hits" : 0,
            "bloom filter misses" : 0,
            "bloom filter pages evicted from cache" : 0,
            "bloom filter pages read into cache" : 0,
            "bloom filters in the LSM tree" : 0,
            "chunks in the LSM tree" : 0,
            "highest merge generation in the LSM tree" : 0,
            "queries that could have benefited from a Bloom filter that did not exist" : 0,
            "sleep for LSM checkpoint throttle" : 0,
            "sleep for LSM merge throttle" : 0,
            "total size of bloom filters" : 0
        },
        "block-manager" : {
            "allocations requiring file extension" : 0,
            "blocks allocated" : 0,
            "blocks freed" : 0,
            "checkpoint size" : 0,
            "file allocation unit size" : 4096,
            "file bytes available for reuse" : 0,
            "file magic number" : 120897,
            "file major version number" : 1,
            "file size in bytes" : 4096,
            "minor version number" : 0
        },
        "btree" : {
            "btree checkpoint generation" : 94958,
            "column-store fixed-size leaf pages" : 0,
            "column-store internal pages" : 0,
            "column-store variable-size RLE encoded values" : 0,
            "column-store variable-size deleted values" : 0,
            "column-store variable-size leaf pages" : 0,
            "fixed-record size" : 0,
            "maximum internal page key size" : 368,
            "maximum internal page size" : 4096,
            "maximum leaf page key size" : 2867,
            "maximum leaf page size" : 32768,
            "maximum leaf page value size" : 67108864,
            "maximum tree depth" : 0,
            "number of key/value pairs" : 0,
            "overflow pages" : 0,
            "pages rewritten by compaction" : 0,
            "row-store internal pages" : 0,
            "row-store leaf pages" : 0
        },
        "cache" : {
            "bytes currently in the cache" : 182,
            "bytes read into cache" : 0,
            "bytes written from cache" : 0,
            "checkpoint blocked page eviction" : 0,
            "data source pages selected for eviction unable to be evicted" : 0,
            "eviction walk passes of a file" : 0,
            "eviction walk target pages histogram - 0-9" : 0,
            "eviction walk target pages histogram - 10-31" : 0,
            "eviction walk target pages histogram - 128 and higher" : 0,
            "eviction walk target pages histogram - 32-63" : 0,
            "eviction walk target pages histogram - 64-128" : 0,
            "eviction walks abandoned" : 0,
            "eviction walks gave up because they restarted their walk twice" : 0,
            "eviction walks gave up because they saw too many pages and found no candidates" : 0,
            "eviction walks gave up because they saw too many pages and found too few candidates" : 0,
            "eviction walks reached end of tree" : 0,
            "eviction walks started from root of tree" : 0,
            "eviction walks started from saved location in tree" : 0,
            "hazard pointer blocked page eviction" : 0,
            "in-memory page passed criteria to be split" : 0,
            "in-memory page splits" : 0,
            "internal pages evicted" : 0,
            "internal pages split during eviction" : 0,
            "leaf pages split during eviction" : 0,
            "modified pages evicted" : 0,
            "overflow pages read into cache" : 0,
            "page split during eviction deepened the tree" : 0,
            "page written requiring cache overflow records" : 0,
            "pages read into cache" : 0,
            "pages read into cache after truncate" : 0,
            "pages read into cache after truncate in prepare state" : 0,
            "pages read into cache requiring cache overflow entries" : 0,
            "pages requested from the cache" : 0,
            "pages seen by eviction walk" : 0,
            "pages written from cache" : 0,
            "pages written requiring in-memory restoration" : 0,
            "tracked dirty bytes in the cache" : 0,
            "unmodified pages evicted" : 0
        },
        "cache_walk" : {
            "Average difference between current eviction generation when the page was last considered" : 0,
            "Average on-disk page image size seen" : 0,
            "Average time in cache for pages that have been visited by the eviction server" : 0,
            "Average time in cache for pages that have not been visited by the eviction server" : 0,
            "Clean pages currently in cache" : 0,
            "Current eviction generation" : 0,
            "Dirty pages currently in cache" : 0,
            "Entries in the root page" : 0,
            "Internal pages currently in cache" : 0,
            "Leaf pages currently in cache" : 0,
            "Maximum difference between current eviction generation when the page was last considered" : 0,
            "Maximum page size seen" : 0,
            "Minimum on-disk page image size seen" : 0,
            "Number of pages never visited by eviction server" : 0,
            "On-disk page image sizes smaller than a single allocation unit" : 0,
            "Pages created in memory and never written" : 0,
            "Pages currently queued for eviction" : 0,
            "Pages that could not be queued for eviction" : 0,
            "Refs skipped during cache traversal" : 0,
            "Size of the root page" : 0,
            "Total number of pages currently in cache" : 0
        },
        "compression" : {
            "compressed pages read" : 0,
            "compressed pages written" : 0,
            "page written failed to compress" : 0,
            "page written was too small to compress" : 0,
            "raw compression call failed, additional data available" : 0,
            "raw compression call failed, no additional data available" : 0,
            "raw compression call succeeded" : 0
        },
        "cursor" : {
            "bulk-loaded cursor-insert calls" : 0,
            "create calls" : 1,
            "cursor operation restarted" : 0,
            "cursor-insert key and value bytes inserted" : 0,
            "cursor-remove key bytes removed" : 0,
            "cursor-update value bytes updated" : 0,
            "cursors cached on close" : 0,
            "cursors reused from cache" : 0,
            "insert calls" : 0,
            "modify calls" : 0,
            "next calls" : 0,
            "prev calls" : 1,
            "remove calls" : 0,
            "reserve calls" : 0,
            "reset calls" : 2,
            "search calls" : 0,
            "search near calls" : 0,
            "truncate calls" : 0,
            "update calls" : 0
        },
        "reconciliation" : {
            "dictionary matches" : 0,
            "fast-path pages deleted" : 0,
            "internal page key bytes discarded using suffix compression" : 0,
            "internal page multi-block writes" : 0,
            "internal-page overflow keys" : 0,
            "leaf page key bytes discarded using prefix compression" : 0,
            "leaf page multi-block writes" : 0,
            "leaf-page overflow keys" : 0,
            "maximum blocks required for a page" : 0,
            "overflow values written" : 0,
            "page checksum matches" : 0,
            "page reconciliation calls" : 0,
            "page reconciliation calls for eviction" : 0,
            "pages deleted" : 0
        },
        "session" : {
            "cached cursor count" : 1,
            "object compaction" : 0,
            "open cursor count" : 0
        },
        "transaction" : {
            "update conflicts" : 0
        }
    },
    "nindexes" : 1,
    "totalIndexSize" : 4096,
    "indexSizes" : {
        "_id_" : 4096
    },
    "ok" : 1.0,
    "operationTime" : Timestamp(1547816798, 3)
}

Maybe this can help to find the issue.

From another angle, I've checked the function that throws the error:

Status LogicalSessionCacheImpl::_addToCache(LogicalSessionRecord record) {
    stdx::lock_guard<stdx::mutex> lk(_cacheMutex);
    if (_activeSessions.size() >= static_cast<size_t>(maxSessions)) {
        return {ErrorCodes::TooManyLogicalSessions, "cannot add session into the cache"};
    }
    _activeSessions.insert(std::make_pair(record.getId(), record));
    return Status::OK();
}

Is there a way to check the values of _activeSessions.size() and maxSessions? Maybe that could point to the solution.
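For what it's worth, a sketch of inspecting both values from the shell, assuming the maxSessions server parameter and the logicalSessionRecordCache section of serverStatus are available on this build:

// Approximate current size of the in-memory session cache (_activeSessions):
db.serverStatus().logicalSessionRecordCache.activeSessionsCount

// The configured ceiling (maxSessions is a startup-time server parameter):
db.adminCommand({ getParameter: 1, maxSessions: 1 })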

Cheers,
Jaume

Comment by Jaume Valls Vila [ 18/Jan/19 ]

Hello Danny,

Until this morning all 3 mongos gave the same error when running the "show dbs" command. We connected to one of the mongos (app0) via Robomongo (SSH tunnel + connection to mongo) and it works fine (show dbs, db.getCollection('clk').find({}), db.getCollection('clk').count({}), ...).

We have also connected to app0 via the command line; at first no commands worked, then they suddenly started working again, but the other two mongos (app1, app2) continue giving the same error. Through app0, if we try our Java application that uses the Java MongoDB driver (3.8.2), we get the same error.

 

Thanks,

Jaume 

Comment by Julien Durillon [ 17/Jan/19 ]

Hello,

I have had a similar issue since I moved my sharded cluster from 3.6 to 4.0, except that when I try to drop the sessions collection I get an "Unauthorized" error. My user has the root role, though.

Also, db.getSiblingDB('config').collections.find({_id: 'config.system.sessions'}) does not return anything.

What I do get in the logs is this recurring message:

[LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Collection config.system.sessions is not sharded.

Comment by Jaume Valls Vila [ 17/Jan/19 ]

Hello Danny,

Attached you can find logs from the primary and secondary config servers.

The CONTROL message always appears on the secondaries (in the mongos and in the shards) and sometimes on the primary config server.

Let me know if you need more information.

Regards,

Jaume

 

 

Comment by Danny Hatcher (Inactive) [ 17/Jan/19 ]

Hello Jaume,

Can you please provide the mongod log from the Primary config server covering the timeframe from the drop until now?

Thank you,

Danny

Comment by Jaume Valls Vila [ 17/Jan/19 ]

Hello Daniel,

The first command returned true.

replicasetconfig:PRIMARY> db.getSiblingDB('config').system.sessions.drop()
true

After waiting 5 minutes, I executed the second command against the mongos servers, with the following output:

mongos> db.getSiblingDB('config').system.sessions.stats()
{
	"ok" : 0,
	"errmsg" : "cannot add session into the cache",
	"code" : 261,
	"codeName" : "TooManyLogicalSessions",
	"operationTime" : Timestamp(1547719583, 1),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1547719583, 2),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

 
Thanks,
Jaume

Comment by Danny Hatcher (Inactive) [ 16/Jan/19 ]

Hello,

Can you please connect to the config server primary and run the following command?

db.getSiblingDB('config').system.sessions.drop()

Please wait at least 5 minutes after running the drop command and then run the following against the mongos and provide the results:

db.getSiblingDB('config').system.sessions.stats()

Thank you,

Danny

Comment by Jaume Valls Vila [ 16/Jan/19 ]

Hello Daniel,

I have the same output as described in the issue.

 

{
  "ok" : 0,
  "errmsg" : "cannot add session into the cache",
  "code" : 261,
  "codeName" : "TooManyLogicalSessions",
  "operationTime" : Timestamp(1547662891, 1),
  "$clusterTime" : {
    "clusterTime" : Timestamp(1547662899, 1),
    "signature" : {
      "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      "keyId" : NumberLong(0)
    }
  }
}

 

Thank you

Jaume 

Comment by Danny Hatcher (Inactive) [ 16/Jan/19 ]

Hello,

Can you run the following against the mongos as well?

db.getSiblingDB('config').system.sessions.stats()

Thank you,

Danny

Comment by Jaume Valls Vila [ 16/Jan/19 ]

Hello Danny,

This cluster was set up recently, around November.

When the cluster was first set up, the replica set name for the config servers and the replica set name for the shards were the same, which generated errors. We had to reconfigure the replica set name for the config servers. With this change the cluster worked fine, and we ran some load tests that completed correctly.

We didn't do any more tests until yesterday, when we needed the system in production. Everything worked OK until it suddenly stopped working. We ran the "show dbs" command from the command line and saw that error message. Then we found the CONTROL message in the logs; it has been appearing there for about a month.

Output for command 1 is:

No result

Output for command 2 is:

{ "_id" : "config.system.sessions", "lastmodEpoch" : ObjectId("5be2f12d8e86d2e82d8120f9"), "lastmod" : ISODate("1970-02-19T17:02:47.296Z"), "dropped" : false, "key" : { "_id" : 1 }, "unique" : false, "uuid" : UUID("6f90cc1a-2644-478b-a923-24b37f7a0eb3") }

 

Thanks

Comment by Danny Hatcher (Inactive) [ 16/Jan/19 ]

Hello,

Was this cluster either recently set up or recently upgraded? If so, can you please provide a brief timeline of those activities?

Regardless, please connect to one of your mongos processes, run the following two commands, and please provide the output:

db.getSiblingDB('config').system.sessions.stats()['sharded']
db.getSiblingDB('config').collections.find({_id: 'config.system.sessions'})

Thank you,

Danny
