Drivers / DRIVERS-2798

Gossiping the cluster time from monitoring connections can result in loss of availability

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Sessions
    • Needed

    Description

      Summary

      In unusual situations, gossiping the cluster time received on monitoring connections results in complete loss of availability and requires an application restart. The problem was traced to a temporary state during which the driver attempts to connect to a member of the wrong replica set running on the same pod. Because cluster times from different deployments are not compatible, all operations fail until the application is restarted.

      Motivation

      Who is the affected end user?

      We have only one report of this, in JAVA-5256. Please see that ticket for the details, which are quite involved.

      How does this affect the end user?

      Availability is completely compromised and an application restart is required.

      How likely is it that this problem or use case will occur?

      It's certainly unusual, as we have not heard other reports of this from people using our Kubernetes operator. On the other hand, the fix is likely simple for most drivers, though testing is an issue (there are probably no tests of the existing behavior).

      If the problem does occur, what are the consequences and how severe are they?

      Complete loss of availability to the desired cluster.

      Is this issue urgent?

      There is no simple workaround for the user, although a workaround is possible.

      Is this ticket required by a downstream team?

      No

      Is this ticket only for tests?

      No

      Acceptance Criteria

      The requirement is a clarification to the sessions specification stating that cluster time gossiping should be limited to pooled connections and should not include monitoring connections. It is unclear, though, how a test could be written. In a POC in the Java driver, this was achieved by a simple design change that made it impossible to gossip the cluster time on monitoring connections, but a future design change could reverse that and reintroduce the issue.
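
The kind of design change described above can be sketched as follows. This is only an illustration under stated assumptions, not the Java driver's actual POC: the class names `ClusterClock`, `MonitoringConnection`, and `PooledConnection` are hypothetical, and a plain integer stands in for the real `$clusterTime` document. The idea is that only pooled connections hold a reference to the shared clock, so a monitoring connection has no way to gossip.

```python
from typing import Optional


class ClusterClock:
    """Hypothetical holder for the highest cluster time seen so far.

    An int stands in for the real $clusterTime document (assumption).
    """

    def __init__(self) -> None:
        self.current: Optional[int] = None

    def advance(self, received: Optional[int]) -> None:
        # Only ever move forward; ignore missing or older values.
        if received is not None and (self.current is None or received > self.current):
            self.current = received


class MonitoringConnection:
    """Holds no reference to the clock, so it cannot gossip by construction."""

    def build_command(self, command: dict) -> dict:
        return dict(command)  # never attaches $clusterTime

    def handle_reply(self, reply: dict) -> dict:
        return reply  # any $clusterTime in the reply is ignored


class PooledConnection:
    """Application connection: sends and receives the cluster time."""

    def __init__(self, clock: ClusterClock) -> None:
        self._clock = clock

    def build_command(self, command: dict) -> dict:
        out = dict(command)
        if self._clock.current is not None:
            out["$clusterTime"] = self._clock.current
        return out

    def handle_reply(self, reply: dict) -> dict:
        self._clock.advance(reply.get("$clusterTime"))
        return reply
```

With this split, a reply that arrives on a monitoring connection from a member of the wrong replica set cannot poison the clock that application operations use. A later refactoring that passes the clock to `MonitoringConnection` would silently undo the protection, which is the regression risk noted above.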

      Additional Notes

      Gossiping of cluster time has been a bit of a mystery to many driver engineers, as the specification contains no rationale for it. Recent discussions with server engineers have revealed the following justification:

      • In a sharded cluster, each shard has an independent, monotonically increasing logical clock.
      • Every write on a shard includes the shard's current logical clock time.
      • Gossiping pushes the receiving shard's logical clock forward to just past the gossiped time.
      • As a result, when a client thread writes to shard A and then writes to shard B, the second write gets a later time than the first.
      • This in turn means that the first write precedes the second in operations that create a total ordering of writes. A change stream is the primary example.

      Since monitoring connections are never used for writes, there is no benefit to gossiping cluster times from those connections.
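
The ordering argument in the list above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the server's implementation: the `Shard` and `Client` classes are hypothetical, logical times are plain integers, and "just past the gossiped time" is modelled as the gossiped time plus one.

```python
from typing import Optional


class Shard:
    """Toy shard with an independent, monotonically increasing logical clock."""

    def __init__(self, name: str, clock: int = 0) -> None:
        self.name = name
        self.clock = clock

    def write(self, gossiped: Optional[int] = None) -> int:
        # Gossip pushes the clock forward if the gossiped time is ahead.
        if gossiped is not None and gossiped > self.clock:
            self.clock = gossiped
        self.clock += 1  # the write is stamped just past the current time
        return self.clock


class Client:
    """Toy client that optionally gossips the highest time it has seen."""

    def __init__(self, gossip: bool) -> None:
        self.gossip = gossip
        self.cluster_time: Optional[int] = None

    def write(self, shard: Shard) -> int:
        stamped = shard.write(self.cluster_time if self.gossip else None)
        if self.cluster_time is None or stamped > self.cluster_time:
            self.cluster_time = stamped
        return stamped


# Shard A's clock starts far ahead of shard B's.
with_gossip = Client(gossip=True)
t1 = with_gossip.write(Shard("A", clock=100))  # 101
t2 = with_gossip.write(Shard("B", clock=10))   # 102, ordered after t1

without_gossip = Client(gossip=False)
u1 = without_gossip.write(Shard("A", clock=100))  # 101
u2 = without_gossip.write(Shard("B", clock=10))   # 11, appears before u1
```

Without gossip, the second write is stamped earlier than the first, so anything that merges the shards by time would see the two writes in the wrong order. A monitoring connection never issues a write, so it contributes nothing to this ordering.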

      People

      Assignee: Unassigned
      Reporter: Jeffrey Yemin (jeff.yemin@mongodb.com)