[DRIVERS-1577] Workarounds for killAllSessions in unified test runner Created: 22/Feb/21  Updated: 06/May/22

Status: Implementing
Project: Drivers
Component/s: Atlas Testing, Unified Test Runner
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Oleg Pudeyev (Inactive) Assignee: Jeremy Mikola
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Issue split
split to CDRIVER-3950 Workarounds for killAllSessions in un... Backlog
split to CXX-2222 Workarounds for killAllSessions in un... Backlog
split to CSHARP-3534 Workarounds for killAllSessions in un... Closed
split to GODRIVER-1949 Workarounds for killAllSessions in un... Closed
split to JAVA-4097 Workarounds for killAllSessions in un... Closed
split to MOTOR-704 Workarounds for killAllSessions in un... Closed
split to NODE-3189 Workarounds for killAllSessions in un... Closed
split to PHPLIB-640 Workarounds for killAllSessions in un... Closed
split to PYTHON-2636 Workarounds for killAllSessions in un... Closed
split to RUBY-2574 Ignore Unauthorized error when execut... Closed
split to RUST-733 Workarounds for killAllSessions in un... Closed
Related
related to SERVER-54216 killAllSessions produces authorizatio... Closed
Epic Link: DRIVERS-828
Driver Changes: Needed
Quarter: FY22Q3
Driver Compliance:
Key Status/Resolution FixVersion
CDRIVER-3950 Backlog
CXX-2222 Backlog
CSHARP-3534 Fixed 2.13.0
GODRIVER-1949 Fixed 1.6.0
JAVA-4097 Won't Fix
NODE-3189 Gone away
MOTOR-704 Duplicate
PYTHON-2636 Done
PHPLIB-640 Fixed 1.9.0
RUBY-2574 Fixed 2.15.0
RUST-733 Duplicate
SWIFT-1176 Won't Do

 Description   

Action for Drivers

The unified test format spec clarifies additional error codes that may be ignored when executing killAllSessions to work around SERVER-54216 and Atlas prohibiting usage of the command. Additionally, it suggests several approaches to disable killAllSessions behavior for Atlas (e.g. detecting "mongodb.net" hostnames, implementing a configuration option for the test runner).

Original Description

Unified test format currently contains the recommendation to kill all sessions before every test:

Terminating Open Transactions

Open transactions can cause tests to block indiscriminately. Test runners SHOULD terminate all open transactions at the start of a test suite and after each failed test by killing all sessions in the cluster. Using the internal MongoClient, execute the killAllSessions command on either the primary or, if connected to a sharded cluster, all mongos servers.

killAllSessions command does not work in Atlas (HELP-22193). The closest workaround is killing the user's own sessions, but this currently doesn't work due to server bug (SERVER-54216).

Until SERVER-54216 is fixed, unified test runners should provide an option to not run killAllSessions at all, and the workload executor should request this option when instantiating the test runner.

When SERVER-54216 is fixed, unified test runners may switch to killing only sessions of their user, at which point the option to not kill sessions may be removed, OR outside of Atlas the test runners could continue killing all sessions but for Atlas testing the test runners would only kill their own sessions.

Note that killing sessions is not strictly necessary in the test runner unless transactions are actually being used, and currently Atlas planned maintenance tests do not use transactions.



 Comments   
Comment by Githook User [ 09/Apr/21 ]

Author:

{'name': 'Jeremy Mikola', 'email': 'jmikola@gmail.com', 'username': 'jmikola'}

Message: DRIVERS-1577 Note caveats for killAllSessions (#952)
Branch: master
https://github.com/mongodb/specifications/commit/1b4ce24071065b9c00263939dbb7e9524147c068

Comment by Jeremy Mikola [ 08/Apr/21 ]

https://github.com/mongodb/specifications/pull/952

Comment by Jeremy Mikola [ 06/Apr/21 ]

Follow-up: andreas.braun and I spoke about this earlier today and agreed to proceed with amending the unified test runner spec to detect an Atlas connection string and disable killAllSessions accordingly.

Comment by Jeremy Mikola [ 02/Apr/21 ]

oleg.pudeyev, andreas.braun: If I understand correctly, this isn't just an issue with Comprehensive Atlas Testing (DRIVERS-828), but also applies to any testing against an Atlas cluster. For DRIVERS-828, the workload executor is a logical place to configure the test runner (e.g. invoke a disableKillAllSessions() method or otherwise pass an option when instantiating the test runner). But I'd worry a driver's general test suite might not have a place to do so if the test suite is simply invoked with an arbitrary connection string.

Would it make more sense to just require the unified test runner to inspect the connection string before it decides to invoke killAllSessions and NOP if it sees an Atlas hosts? I think that could be done by checking for "mongodb.net" in the hostname.

Comment by Jeremy Mikola [ 30/Mar/21 ]

Just spoke with andreas.braun about this today. We agreed that test runners would do well to implement some internal API that allows for disabling killAllSessions behavior. This was done for PHPLIB-613 and presumably similar to what Ruby and Java had to do in their workload executors.

Beyond that, we can explore the other two ideas Andreas had above about lazily calling killAllSessions for tests that utilize sessions (before and after the tests run) and allowing killAllSessions to fail and ignore errors. Note that the first suggestion might not work for all cases since dropCollection may block on a previous lingering transaction – and in that case, the current test that would block might not utilize sessions at all so the "lazy" check might not trigger.

Comment by Jeremy Mikola [ 05/Mar/21 ]

To confirm, the current language in the unified test format spec was lifted from the transactions test README. I believe that the reason the transaction spec test runner instructed calling killAllSessions before the test suite was because we don't know what state the server is in before starting the test suite, and killing all sessions ensures we have a blank slate.

That may not be relevant in the case of Atlas driver testing where we're running workloads (which presently don't even use transactions) on new clusters.

One idea andreas.braun and I discussed was shifting the responsibility of killing sessions to the end of each test when the entity map and its resources are destroyed; however, that doesn't address the issue of inheriting dirty state if the unified tests are run as part of a larger test suite. I asked in SERVER-54216 if that bug also applies to killAllSessionsByPattern when using LSID patterns (not user criteria). If not, I would propose that the test runner collects LSIDs of all sessions in the entity map at the end of each test and execute a single killAllSessionsByPattern command to kill them individually.

Andreas had a few other suggestions:

  • Have the test runner lazily call killAllSessions when the first session entity in a test is created.
  • Allow killAllSessions to fail and ignore any error (covers the case of Atlas prohibiting the command and the bug in SERVER-54216). This is not mutually exclusive with the first suggestion.

Lastly, I opened mongodb/docs#5062 and reported the issue with the current server docs conflicting with the bug in SERVER-54216.

Generated at Thu Feb 08 08:23:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.