[SERVER-37511] Logical session reaper and refresh threads should set up sessions collection immediately Created: 08/Oct/18 Updated: 29/Oct/23 Resolved: 23/Oct/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.10, 4.0.5, 4.1.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Blake Oler |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Backport Requested: |
v4.0, v3.6
|
||||||||||||||||
| Sprint: | Sharding 2018-11-05 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 18 | ||||||||||||||||
| Description |
|
The logical session cache reaper thread is responsible for the initial setup of the sessions collection. As part of this, it runs createIndexes, which takes a database exclusive lock. Currently, if the thread runs before the replication system has been set up (and its config initialized), it waits an entire refresh interval before setting up the collection. Setup can therefore occur at some later point, while the database is up and running, and has the potential to cause transaction aborts due to lock timeouts. |
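The timing problem described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the server's implementation: `first_setup_time`, its parameters, and the eager-retry policy (`retry_interval`) are hypothetical names introduced here, and the numbers are illustrative.

```python
def first_setup_time(repl_ready_at, refresh_interval, retry_interval=None):
    """Return the simulated time (seconds) at which setup first succeeds.

    The thread attempts setup at t=0. Setup can only succeed once the
    replication system is ready (at t=repl_ready_at). After a failed
    attempt the thread sleeps before trying again:
      - retry_interval=None: sleep a full refresh_interval (the behaviour
        described in this ticket);
      - otherwise: sleep retry_interval (a hypothetical eager-retry policy).
    """
    sleep = refresh_interval if retry_interval is None else retry_interval
    now = 0.0
    while now < repl_ready_at:
        # Replication is not ready yet, so this attempt fails and we wait.
        now += sleep
    return now


# Replication becomes ready 2 s after startup; refresh interval is 300 s.
delayed = first_setup_time(repl_ready_at=2, refresh_interval=300)
eager = first_setup_time(repl_ready_at=2, refresh_interval=300, retry_interval=1)
```

With these illustrative inputs, the delayed variant performs setup a full refresh interval after startup, when user transactions may already be running, while the eager variant performs it as soon as replication is ready.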
| Comments |
| Comment by Githook User [ 28/Dec/18 ] |
|
Author: Blake Oler (username: BlakeIsBlake, email: blake.oler@mongodb.com). Message: (cherry picked from commit 6c5d1761688ea0c8e13fe62afb3574b5326ae9e6) |
| Comment by Githook User [ 07/Dec/18 ] |
|
Author: Blake Oler (username: BlakeIsBlake, email: blake.oler@mongodb.com). Message: (cherry picked from commit 6c5d1761688ea0c8e13fe62afb3574b5326ae9e6) |
| Comment by Githook User [ 23/Oct/18 ] |
|
Author: Blake Oler (username: BlakeIsBlake, email: blake.oler@mongodb.com). Message: |
| Comment by Misha Tyulenev [ 19/Oct/18 ] |
|
As discussed offline, the fix will be made in the replication test: it will wait for the config.system.sessions collection to be created before starting the test. |
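The "wait for the collection before starting the test" approach can be sketched as a generic polling helper. This is a minimal illustration, not the change that was committed: `wait_for` and its parameters are hypothetical names, and the real test harness may use a different mechanism and language.

```python
import time


def wait_for(predicate, timeout_s=30.0, poll_s=0.1,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the predicate became true, False on timeout.
    The clock and sleep parameters are injectable to ease testing.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Stand-in predicate: the "collection" appears on the third check.
checks = {"count": 0}


def sessions_collection_exists():
    checks["count"] += 1
    return checks["count"] >= 3


ready = wait_for(sessions_collection_exists, timeout_s=1.0, poll_s=0.0)
```

In a real test, the predicate would query the server instead. With PyMongo, for example, one could assume something like `"system.sessions" in client["config"].list_collection_names()`, where `client` is a connected `MongoClient`.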
| Comment by Misha Tyulenev [ 08/Oct/18 ] |
|
william.schultz The statement "My understanding is that the config.system.sessions collection only needs to be created once, and from then on it is only necessary to do document level operations on the collection to maintain its correct state." is not correct. |
| Comment by William Schultz (Inactive) [ 08/Oct/18 ] |
|
As an additional note, there was previously an issue with the reaper thread periodically aborting running transactions, due to the strength of locks acquired when running the createIndexes command. blake.oler pointed out that this should be fixed now, in 4.0 and 4.2 by |
| Comment by William Schultz (Inactive) [ 08/Oct/18 ] |
|
There is nothing test-specific about the root issue described here. The issue happens to cause spurious failures in our tests that expect transactions to run and commit successfully. If a transaction fails due to a lock timeout, the test fails. In a real system, I am less sure whether we would consider this a significant "bug". Running DDL operations of any kind already increases the possibility of lock conflicts with concurrent transactions. That is a known, fundamental aspect of the system and is not considered a bug. It seems that the worst that would happen here is that a user who is running transactions encounters a bunch of lock conflicts when the reaper thread runs for the first time. Subsequent runs of the reaper thread, though, shouldn't produce any issues. My understanding is that the config.system.sessions collection only needs to be created once, and from then on it is only necessary to do document level operations on the collection to maintain its correct state. It seems sensible that the reaper thread would be forced to create the collection and indexes at startup, instead of waiting a refresh interval, but I don't think it's necessarily a serious problem. I'll leave that decision up to misha.tyulenev and blake.oler since you have been working on this recently. To be clear, I do not think that blacklisting our transactions tests is the way to fix the test failures. We should want coverage of the interaction between the logical session reaper and transactions. |
| Comment by Misha Tyulenev [ 08/Oct/18 ] |
|
william.schultz please clarify whether this scenario occurs only during testing. |
| Comment by William Schultz (Inactive) [ 08/Oct/18 ] |
|
This has shown up as an issue in our test infrastructure, but may also be a general issue worth fixing. |