[DRIVERS-2329] Require monitors to signal that the topology changed after every check Created: 16/May/22  Updated: 27/Jun/22  Resolved: 27/Jun/22

Status: Closed
Project: Drivers
Component/s: SDAM
Fix Version/s: None

Type: Spec Change Priority: Unknown
Reporter: Patrick Freed Assignee: Neal Beeken
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Driver Changes: Not Needed

 Description   

Summary

The server monitoring spec currently allows monitors to leave operations blocked in server selection after a check if the check did not change the topology. If drivers actually implement this, however, it can lead to undesirable behavior:

  • Server A goes down with an error
  • Operation comes in, fails server selection, requests a new check
  • Monitor for server A performs a check
  • Check fails with the same error, so the monitor doesn't signal a topology change
  • Operation remains blocked in server selection for heartbeatFrequencyMS or until the server selection timeout

If monitors were to signal that the topology changed after every check, the operation would be unblocked, fail server selection again, and request a new check, allowing the topology to be rediscovered more quickly once the server comes back up.
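
The unconditional wake-up described above can be sketched roughly as follows. This is a minimal illustration only, not the spec's pseudocode or any driver's implementation: the Topology class, on_check_completed, wait_for_next_check, and the checks_completed counter are all hypothetical names, and a plain condition variable stands in for whatever mechanism a driver uses to block operations in server selection.

```python
import threading


class Topology:
    """Hypothetical stand-in for a driver's topology state (not a real driver API)."""

    def __init__(self, description="Unknown"):
        self._cond = threading.Condition()
        self.description = description
        self.checks_completed = 0  # bumped after every check, changed or not

    def on_check_completed(self, new_description):
        """Called by a monitor after each check, whether it succeeded or failed."""
        with self._cond:
            self.description = new_description
            self.checks_completed += 1
            # Wake waiters unconditionally, even if the description is unchanged.
            self._cond.notify_all()

    def wait_for_next_check(self, seen, timeout):
        """Block until a check newer than `seen` completes; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self.checks_completed > seen, timeout=timeout
            )


def demo():
    """A blocked operation is woken by a check that reports the same state."""
    topology = Topology(description="Unknown")
    seen = topology.checks_completed
    woke = []

    def blocked_operation():
        woke.append(topology.wait_for_next_check(seen, timeout=5.0))

    waiter = threading.Thread(target=blocked_operation)
    waiter.start()
    # The check fails with the same error, so the description does not change.
    topology.on_check_completed("Unknown")
    waiter.join()
    return woke[0]
```

In this sketch the waiter compares a check counter rather than the description itself, so a check that reproduces the same error still releases the operation, which can then retry selection and request another check.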

Note that the requested requirement is already implicitly required to pass the test in DRIVERS-1251.

Motivation

Who is the affected end user?

End users.

How does this affect the end user?

Operations can be blocked for a long time in the event of failover.

How likely is it that this problem or use case will occur?

If a driver does not adhere to the above requirement, it will happen any time failover occurs.

If the problem does occur, what are the consequences and how severe are they?

In actual production cases where lots of operations are being fired off, it probably doesn't affect the user in practice. For use cases where only a few operations are being executed, it could cause blocking for tens of seconds.

Is this issue urgent?

It's possible that no drivers are affected by this, since otherwise users would likely complain. It's still good to update the spec so drivers don't accidentally have this issue.

Is this ticket required by a downstream team?

No

Is this ticket only for tests?

No



 Comments   
Comment by Githook User [ 27/Jun/22 ]

Author:

{'name': 'Neal Beeken', 'email': 'neal.beeken@mongodb.com', 'username': 'nbbeeken'}

Message: DRIVERS-2329: remove optimization mention that no longer applies (#1263)
Branch: master
https://github.com/mongodb/specifications/commit/3fea6cb177f20ce8426a1f00c852f297b20a96b1

Comment by Shane Harvey [ 19/May/22 ]

Is this essentially the same issue as DRIVERS-830? The spec already requires drivers to update the TopologyDescription on every check even if the ServerDescriptions are equal.

Comment by Patrick Freed [ 17/May/22 ]

Yeah, in that case there wouldn't be any need--I mostly had the polling protocol in mind with this ticket. That said, notifying that the topology has changed when it actually hasn't should be low-cost, whereas not waking up sleeping operations for longer than necessary can be high-cost, so maybe it's worth the simplification.

This ticket does still affect the streaming protocol, though only checks that create new monitoring connections, since those only happen every heartbeatFrequencyMS unless another check is requested.

Comment by Jeffrey Yemin [ 17/May/22 ]

patrick.freed@mongodb.com, what if the check succeeds but the topology version is stale? Currently the spec says that these sorts of responses should be discarded without notifying.

Generated at Thu Feb 08 08:25:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.