[SERVER-37511] Logical session reaper and refresh threads should set up sessions collection immediately Created: 08/Oct/18  Updated: 29/Oct/23  Resolved: 23/Oct/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.6.10, 4.0.5, 4.1.5

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Blake Oler
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-36921 Transaction lock timeout errors when ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0, v3.6
Sprint: Sharding 2018-11-05
Participants:
Linked BF Score: 18

 Description   

The logical session cache reaper thread performs the initial setup of the sessions collection. As part of this, it runs createIndexes, which takes a database exclusive lock. Currently, if the thread runs before the replication system has been set up (and its config initialized), it waits an entire refresh interval before setting up the collection. As a result, the setup can occur at some later point while the database is up and running, which can cause transaction aborts due to lock timeouts.
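The timing problem described above can be illustrated with a small model. This is only a sketch, not the server's implementation: the function name `setup_time`, its parameters, and the numbers in the usage note are hypothetical, and it assumes the periodic job simply fires at fixed multiples of its interval starting at time 0.

```python
import math


def setup_time(ready_at, refresh_interval, retry_interval=None):
    """Return the simulated time at which the sessions collection gets set up.

    Hypothetical model: the periodic job fires at t = 0, interval, 2 * interval, ...
    and can only set up the collection once replication is ready (t >= ready_at).

    - retry_interval=None models the behavior described in the ticket: a failed
      attempt waits a full refresh interval before trying again.
    - A small retry_interval models retrying promptly until setup succeeds.
    """
    interval = refresh_interval if retry_interval is None else retry_interval
    if ready_at <= 0:
        # Replication was already ready on the very first run.
        return 0
    # First firing of the job at or after the moment replication became ready.
    return math.ceil(ready_at / interval) * interval
```

For example, with replication ready 2 seconds after startup and a 300-second refresh interval, `setup_time(2, 300)` gives 300, so the exclusive lock is taken long after the node has started serving transactions. With a 1-second retry, `setup_time(2, 300, retry_interval=1)` gives 2.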



 Comments   
Comment by Githook User [ 28/Dec/18 ]

Author: Blake Oler <blake.oler@mongodb.com> (BlakeIsBlake)

Message: SERVER-37511 Ensure sessions collection is created in replica set fixture

(cherry picked from commit 6c5d1761688ea0c8e13fe62afb3574b5326ae9e6)
Branch: v3.6
https://github.com/mongodb/mongo/commit/309a5706ad18c1026cdf4e8b2ae2c4c7b00a4868

Comment by Githook User [ 07/Dec/18 ]

Author: Blake Oler <blake.oler@mongodb.com> (BlakeIsBlake)

Message: SERVER-37511 Ensure sessions collection is created in replica set fixture

(cherry picked from commit 6c5d1761688ea0c8e13fe62afb3574b5326ae9e6)
Branch: v4.0
https://github.com/mongodb/mongo/commit/4aecec9cc78209b12ba12bdec9a22fa3daad1777

Comment by Githook User [ 23/Oct/18 ]

Author: Blake Oler <blake.oler@mongodb.com> (BlakeIsBlake)

Message: SERVER-37511 Ensure sessions collection is created in replica set fixture
Branch: master
https://github.com/mongodb/mongo/commit/6c5d1761688ea0c8e13fe62afb3574b5326ae9e6

Comment by Misha Tyulenev [ 19/Oct/18 ]

As discussed offline, the fix will be made in the replication test, which will wait for the config.system.sessions collection to be created before starting the test.
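A wait of this kind could look roughly like the following. This is a minimal sketch and not the code from the linked commits: the helper name `wait_for_collection`, its parameters, and the default timeout are assumptions, and the existence check is passed in as a callable so the sketch does not depend on any particular driver.

```python
import time


def wait_for_collection(collection_exists, timeout_secs=60.0, poll_interval_secs=0.1):
    """Block until collection_exists() returns True, or raise TimeoutError.

    collection_exists is a hypothetical zero-argument callable supplied by the
    caller; it should return True once config.system.sessions has been created.
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        if collection_exists():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "config.system.sessions was not created within %.1f seconds" % timeout_secs
            )
        # Sleep briefly between checks to avoid busy-waiting.
        time.sleep(poll_interval_secs)
```

With PyMongo, for instance, the callable could be `lambda: "system.sessions" in client["config"].list_collection_names()`, assuming `client` is a connected `MongoClient`. A test fixture would call the helper once after the replica set is initiated and before any transaction tests start.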

Comment by Misha Tyulenev [ 08/Oct/18 ]

william.schultz, the statement "My understanding is that the config.system.sessions collection only needs to be created once, and from then on it is only necessary to do document level operations on the collection to maintain its correct state." is not correct.
It should be possible to build the config.system.sessions collection at any time. We have help tickets, e.g. https://jira.mongodb.org/browse/HELP-7707?focusedCommentId=2018462&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-2018462, that advise customers to use the auto-healing property of the sessions collection.
This is due to bugs in the original implementation.

Comment by William Schultz (Inactive) [ 08/Oct/18 ]

As an additional note, there was previously an issue with the reaper thread periodically aborting running transactions, due to the strength of the locks acquired when running the createIndexes command. blake.oler pointed out that this should now be fixed in 4.0 and 4.2 by SERVER-36961, i.e. the reaper now acquires strong locks only during its initial run.

Comment by William Schultz (Inactive) [ 08/Oct/18 ]

There is nothing test-specific about the root issue described here. The issue happens to cause spurious failures in our tests that expect transactions to run and commit successfully. If a transaction encounters a failure due to a lock timeout, then it fails the test. In a real system, I am less sure whether we would consider this a significant "bug". It is already the case that running DDL operations of any kind increases the possibility of lock conflicts with concurrent transactions. That is a fundamental aspect of the system that is known, and is not considered a bug. It seems that the worst that would happen here is that a user who is running transactions encounters a bunch of lock conflicts when the reaper thread chooses to run for the first time. Every subsequent run of the reaper thread, though, shouldn't produce any issues. My understanding is that the config.system.sessions collection only needs to be created once, and from then on it is only necessary to do document level operations on the collection to maintain its correct state. It seems sensible that the reaper thread would be forced to create the collection and indexes at startup, instead of waiting a refresh interval, but I don't think it's necessarily a serious problem. I'll leave that decision up to misha.tyulenev and blake.oler since you have been working on this recently.

To be clear, I do not think that blacklisting our transactions tests is the way to fix the test failures. We should want coverage of the interaction between the logical session reaper and transactions.

Comment by Misha Tyulenev [ 08/Oct/18 ]

william.schultz, please clarify whether this scenario occurs only during testing.

Comment by William Schultz (Inactive) [ 08/Oct/18 ]

This has shown up as an issue in our test infrastructure, but may also be a general issue worth fixing.
