[DRIVERS-2194] Clarify the order between server monitor events and connection pool events Created: 07/Feb/22  Updated: 31/Mar/22

Status: Backlog
Project: Drivers
Component/s: SDAM
Fix Version/s: None

Type: Spec Change Priority: Unknown
Reporter: Valentin Kavalenka Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by JAVA-4484 Enable ServerDiscoveryAndMonitoringPr... Blocked
Problem/Incident
Related
is related to JAVA-4483 Fix and disable ServerDiscoveryAndMon... Closed
Driver Changes: Needed

 Description   

This SDAM prose test implies the following order of events:

  1. step 2: ServerHeartbeatSucceededEvent happens before ConnectionPoolReadyEvent caused by it
  2. steps 4 and 5: ServerHeartbeatFailedEvent happens before ConnectionPoolReadyEvent caused by a successful check that happened after the check that caused ServerHeartbeatFailedEvent.

However, neither SDAM more CMAP specs require any such ordering. Java driver recently recently started delivering server monitor events asynchronously (in a dedicated thread, together with server and cluster events). This change affected the aforementioned SDAM prose test: it started failing sporadically, and we disabled it. If the order implied by the test is indeed what the specifications require, the order should be explicitly and unambiguously specified either by SDAM or by CMAP specs.



 Comments   
Comment by Shane Harvey [ 24/Feb/22 ]

I don't think we should mandate async/sync event emission or ordering just for this spec test. I'm fine with tweaking the test to suggest that the order should be checked only if possible to allow Java to implement it strictly.

Comment by Valentin Kavalenka [ 08/Feb/22 ]

it seems like a limitation of the Java driver that events aren't guaranteed to be emitted in the order that they occur

The reason this happened is that neither SDAM nor CMAP specs specify any order relations between SDAM and CMAP events. I.e., the recent change in the Java driver did not violate driver specs, yet it made the aforementioned prose test flaky, and we discovered that it fails, and found the reason only after some time since the change was made. To reduce the chances of situations like this one happening, we need to express the order relation requirements in the specs, instead of letting implementors to derive the implied requirements from a single test that fails occasionally if the requirements are not met.

Are there objections to the idea of explicitly expressing in the driver spec the implied requirements?

can you think of a way to rewrite the prose test so that it doesn't rely on undefined behavior?

If it is decided that there are no order relations between SDAM and CMAP events, then the test can be rephrased to check the order of SDAM events (ServerHeartbeatSucceededEvent -> ServerHeartbeatFailedEvent -> ServerHeartbeatSucceededEvent) and the order of CMAP events (ConnectionPoolReadyEvent -> ConnectionPoolClearedEvent -> ConnectionPoolReadyEvent) without checking order across the SDAM/CMAP boundary. This is stricter than just checking that events occurred.

Comment by Patrick Freed [ 08/Feb/22 ]

The test's goal is to verify the causal relationship between successful heartbeat checks and pools being marked as ready, so if we remove the ordering requirement it loses some of its purpose. I also think that, from a debugging user's perspective, the distinction between families of events probably isn't really significant (e.g. CMAP vs SDAM events), but the ordering of them is, especially if there's a causal relationship between two of them.

All that being said, it's not really that important, so if it would be a lot of work for Java to meet this requirement, it may suffice to just assert that the events were seen. But again, in my opinion, it seems like a limitation of the Java driver that events aren't guaranteed to be emitted in the order that they occur, and that it may present further challenges testing or debugging other features down the road.

Comment by Shane Harvey [ 08/Feb/22 ]

Thanks Valentin. My preference would be to avoid mandating an order and instead change or remove the prose test if possible.

patrick.freed, can you think of a way to rewrite the prose test so that it doesn't rely on undefined behavior?

Comment by Valentin Kavalenka [ 08/Feb/22 ]

Currently, my understanding is as follows. The most straightforward way is to start delivering CMAP events in the same thread that delivers SDAM events. However, depending on implementations of user-supplied ConnectionPoolListeners, this may become a performance bottleneck as the rate of ConnectionCheckedOutEvent/ConnectionCheckedInEvent events for the whole MongoClient is not lower than the rate of user operations, which may be quite high. This means that we may decide to deliver a subset of CMAP events in the same thread that delivers SDAM events, e.g., only ConnectionPoolReadyEvent and ConnectionPoolClearedEvent, while delivering the rest of events in the same thread that raised them, i.e., synchronously (this is how all CMAP events are delivered currently). If we choose this path, the subset of CMAP events delivered in the same thread as SDAM events will depend on the order relations between CMAP and SDAM events.

It is also possible to not specify any order relations between CMAP and SDAM events, but then we need to update the prose test mentioned in the description (this test appears to be the only place that depends on such order relations).

Comment by Shane Harvey [ 08/Feb/22 ]

If we changed the spec to mandate this order of events, how would Java implement that requirement? The answer could impact our decision.

Comment by Valentin Kavalenka [ 08/Feb/22 ]

expose the creation time of each event? Then the events could be sorted to the correct order regardless of async or sync publishing

Non-logical timestamps cannot be used to impose/derive order between actions under the Java memory model, even if they come from a monotonic clock. The same must be true for all other driver languages/runtimes except maybe (I have no idea) PHP, Node.js, Ruby, Python.

Anyway, what you are talking, Shane, is an implementation detail, while this ticket is about the specification requirements. For example, SDAM tries to express ordering constraints on its events here, but there is no place that expresses order relations between SDAM and CMAP events.

Comment by Shane Harvey [ 08/Feb/22 ]

It would surely be unexpected for me to see a ConnectionPoolReadyEvent before a ServerHeartbeatSucceededEvent since the former requires the later. Could the Java driver expose the creation time of each event? Then the events could be sorted to the correct order regardless of async or sync publishing.

Generated at Thu Feb 08 08:24:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.