Java Driver / JAVA-5125

Some write concern errors are not retried according to spec

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Retryability

      JAVA-5124 was opened because the legacy retryable writes test suite was not being run against sharded clusters. After that issue was addressed, it was observed that the write concern error retryability tests fail on 4.2 and 4.4 sharded clusters. Most of the failing tests have descriptions ending with "succeeds after WriteConcernError ShutdownInProgress", although there are a few other failures as well.

      While attempting to debug these test failures, the following observations were made:

      1. Starting with 5.0, clusters include a RetryableWriteError error label, which is likely what causes the driver to take a different code path and succeed on newer server releases.
      2. Although the failpoint in the tests specifies that error code 91 be returned, 4.2 and 4.4 sharded clusters actually return error code 6. Moreover, the driver contains branching code specific to error code 91.
      3. A 4.2 replica set also does not include the RetryableWriteError error label, but it does return the expected error code of 91. The test succeeds on a 4.2 replica set.
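      The observations above can be checked against the retryable writes specification's rule for write concern errors. The sketch below is a simplified, hypothetical helper (not the driver's code): it treats an error as retryable if it carries the RetryableWriteError label or if its code is in the specification's list of retryable codes. It ignores the server-version and topology conditions the specification attaches to driver-added labels, and the code list should be verified against the current spec revision. The 5.0+ entry assumes the failpoint's code 91, since the label alone decides that case.

```java
import java.util.List;
import java.util.Set;

public class RetryableWriteConcernCheck {
    // Error codes listed as retryable in the retryable writes specification
    // (verify against the current spec revision before relying on this list).
    static final Set<Integer> RETRYABLE_CODES = Set.of(
            11600, // InterruptedAtShutdown
            11602, // InterruptedDueToReplStateChange
            10107, // NotWritablePrimary
            13435, // NotPrimaryNoSecondaryOk
            13436, // NotPrimaryOrSecondary
            189,   // PrimarySteppedDown
            91,    // ShutdownInProgress
            7,     // HostNotFound
            6,     // HostUnreachable
            89,    // NetworkTimeout
            9001,  // SocketException
            262);  // ExceededTimeLimit

    // Hypothetical helper: retryable if the server labeled the error, or if the
    // code is one for which the driver is expected to add the label itself.
    static boolean isRetryable(int code, Set<String> errorLabels) {
        return errorLabels.contains("RetryableWriteError") || RETRYABLE_CODES.contains(code);
    }

    // One observed write concern error response, as described in the issue.
    record Observed(String deployment, int code, Set<String> labels) {}

    public static void main(String[] args) {
        List<Observed> observed = List.of(
                // Code assumed to be the failpoint's 91; the label is what matters here.
                new Observed("5.0+ cluster", 91, Set.of("RetryableWriteError")),
                new Observed("4.2 replica set", 91, Set.of()),
                new Observed("4.2/4.4 sharded cluster", 6, Set.of()));
        for (Observed o : observed) {
            System.out.println(o.deployment() + ": retryable = " + isRetryable(o.code(), o.labels()));
        }
    }
}
```

      Under this rule all three observed responses should lead to a retry, which is why the sharded 4.2/4.4 failures point at the driver rather than the tests.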

      This looks like a bug. The tests pass in the other variants because of the specific error code used in the test: 91. For that code, ProtocolHelper#createSpecialException returns a MongoNodeIsRecoveringException, which is then thrown. With error code 6, however, ProtocolHelper#createSpecialException returns null, no exception is thrown, and the retry logic never executes. So the bug is exposed only because 4.2 and 4.4 sharded clusters, for some reason, incorrectly return error code 6 instead of the 91 requested by the failpoint. The retryable writes spec clearly intends that error code 6 (HostUnreachable) in a write concern error should cause a RetryableWriteError label to be added, yet it is not. This is easy to demonstrate: change 91 to 6 in the tests, and they fail in all configurations.
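      The mechanism described above can be modeled in a few lines. This is a hypothetical simplification, not the driver's source: classify stands in for ProtocolHelper#createSpecialException and, in this model, only recognizes code 91, the one code the issue confirms maps to MongoNodeIsRecoveringException (the real driver recognizes more codes, omitted here). The point is that the retry decision is gated on whether an exception was produced, not on the specification's list of retryable codes.

```java
import java.util.Optional;

public class SpecialExceptionGateModel {
    // Hypothetical stand-in for ProtocolHelper#createSpecialException.
    // Only code 91 is modeled; other codes the driver may recognize are omitted.
    static Optional<String> classify(int writeConcernErrorCode) {
        if (writeConcernErrorCode == 91) {
            return Optional.of("MongoNodeIsRecoveringException");
        }
        return Optional.empty(); // models createSpecialException returning null
    }

    // Behavior described in the issue: a retry happens only if an exception is thrown.
    static boolean retriedWithCurrentLogic(int code) {
        return classify(code).isPresent();
    }

    public static void main(String[] args) {
        for (int code : new int[] {91, 6}) {
            System.out.println("code " + code + " -> retried: " + retriedWithCurrentLogic(code));
        }
    }
}
```

      Running the model prints that code 91 is retried and code 6 is not, mirroring the failure seen when the tests' 91 is changed to 6.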

      The plan for JAVA-5124 is to enable these tests for sharded clusters, but modify the test files to exclude them for server releases below 5.0. Once this issue is addressed, the tests can be restored to match the original specification.

            Assignee: Unassigned
            Reporter: Jeffrey Yemin (jeff.yemin@mongodb.com)
            Votes: 0
            Watchers: 2
