[DRIVERS-2218] Clarify intended bounding of implicit session allocation and rework flaky prose test Created: 25/Feb/22 Updated: 25/Jul/22 |
|
| Status: | Implementing |
| Project: | Drivers |
| Component/s: | Sessions |
| Fix Version/s: | None |
| Type: | Spec Change | Priority: | Unknown |
| Reporter: | Shane Harvey | Assignee: | Neal Beeken |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | prose-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Driver Changes: | Needed | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Downstream Changes Summary: | Review the updated clarification for spec test 13 and ensure your driver is aligned with the test's assertion requirements. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Driver Compliance: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
The prose test added in
The issue is that the session is checked back into the pool after the connection is checked back in. So there's always the risk that this test uses >1 server session. The only way to guarantee this test is to also optimize implicit session deallocation as well. However, that is potentially trickier since we would need to ensure that retryable reads/writes, cursors, and other pinned operations don't deallocate the session too early. Acceptance Criteria
|
| Comments |
| Comment by Githook User [ 24/Mar/22 ] |
|
Author: {'name': 'Neal Beeken', 'email': 'neal.beeken@mongodb.com', 'username': 'nbbeeken'}Message: DRIVERS-2218: Clarify intended bounding of implicit session allocation and rework flaky prose test (#1157) |
| Comment by Neal Beeken [ 02/Mar/22 ] |
|
I still think there's no issue and test value with the prose test as it is the test does confirm what is asked: "This limits the number of implicit sessions to never exceed the maximum connection pool size." If we want to allow drivers to adopt the current requirement of ensuring the order of operations is always: checkout, acquire (operation), release, checkin, then that is what asserting the same LSID for every operation in a pool size of 1 should accomplish. Although there is a gap, where it says you should run a find, but doesn't clarify that it should be a small query. I haven't seen much precedent for that in other tests, but nonetheless should be called out. To Daria's summary (thanks for that!) I think #5 is what we really want to get at, we need to at least assert that drivers have changed their session acquire logic to permit operations to run instead of running into a TooManyLogicalSessions error before proceeding. Unless you are suggesting we should revisit the spec to explicitly disallow attempting to release the session before checkin completes? We could consider that to make this change more uniform. (In reference to Shane's statement "I would advise drivers not try to optimize implicit session deallocation") |
| Comment by Shane Harvey [ 02/Mar/22 ] |
|
Sure we can do that in this ticket as long as we also fix the issue in the prose test. |
| Comment by Daria Pardue [ 02/Mar/22 ] |
|
shane.harvey All the more reason to update the language. Would you be opposed to repurposing this ticket for that update to the language? We can then send it to leads triage since we probably want the clarification sooner rather than later, given the teams are in the process of implementing this. |
| Comment by Shane Harvey [ 01/Mar/22 ] |
|
Yeah that's true. It's also worth mentioning that sentence is incorrect regardless since there are other factors that will create more than maxPoolSize sessions, multi-batch cursors for example. Even ignoring cursors, clients that run operations on multiple nodes (eg sharded clusters or non-primary read preferences) can create maxPoolSize * # hosts connections. |
| Comment by Shane Harvey [ 01/Mar/22 ] |
|
Great summary daria.pardue. However, I would advise drivers not try to optimize implicit session deallocation without a benchmark showing that such behavior is required. I believe it would be too risky given the complexity around retryable reads/writes, cursors, and other pinned operations. My understanding is that simply delaying the allocation provides nearly all of the benefit we need to solve the original issue in |
| Comment by Shane Harvey [ 28/Feb/22 ] |
|
One solution for this might be to acknowledge the race in the test description and update the test to be more like:
Another solution might be to repeat the existing test X times until the assertion that "all commands contain the same lsid" holds true. |
| Comment by Shane Harvey [ 28/Feb/22 ] |
|
The test needs to be concurrent because
|
| Comment by Robert Stam [ 28/Feb/22 ] |
|
Why does this test have to be concurrent? Wouldn't testing the operations sequentially (which is what I first assumed it would be doing until I read more carefully) be sufficient to test that all operations use the same `lsid`? |
| Comment by Robert Stam [ 28/Feb/22 ] |
|
I just realized that the test says the operations should be concurrent. Are you referring to a concurrency race condition? |
| Comment by Robert Stam [ 28/Feb/22 ] |
|
> The issue is that the session is checked back into the pool after the connection is checked back in I don't understand the issue. Yes the session might be checked back in after the connection, but as long as it is checked in before the next implicit session is created we should be fine. |