kNotIdempotent retry criteria may be too permissive


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Catalog and Routing
    • ALL
    • 2
    • None
    • 3
    • TBD
    • 🟩 Routing and Topology

      The RetryPolicy for non-idempotent operations (kNotIdempotent) retries on any NotPrimary error, on the assumption that if the remote was not a primary, no write could have occurred. This concerns me for two reasons:

      • The NotPrimaryError category is documented as insufficient to determine whether a write has actually occurred; individual codes must be inspected to determine that.
      • Not all non-idempotent operations are writes. For example, if a getMore receives an error like "InterruptedDueToReplStateChange" (assuming it can), are we sure the server-side cursor metadata wasn't modified? This case seems unlikely to cause an issue in practice, but it is worth considering.

       
      In light of this, we should consider making the default kNotIdempotent retry policy less permissive, so that it retries only in cases that are known to be safe.
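
      To make the proposal concrete, here is a minimal sketch contrasting a category-wide retry check with a per-code allow-list. Everything in it is hypothetical: the enum, the function names, and especially the choice of which codes count as "rejected before execution" are assumptions for illustration, not the server's actual types or a vetted list. Deciding which codes really belong on the allow-list is the open question of this ticket.

```cpp
// Hypothetical sketch only: these names do not come from the server code base.
enum class ErrorCode {
    NotWritablePrimary,
    NotPrimaryNoSecondaryOk,
    PrimarySteppedDown,
    InterruptedDueToReplStateChange,
    NetworkTimeout,  // stands in for any code outside the NotPrimary category
};

// Assumed membership of the NotPrimary category, for illustration.
constexpr bool isNotPrimaryCategory(ErrorCode c) {
    return c == ErrorCode::NotWritablePrimary ||
           c == ErrorCode::NotPrimaryNoSecondaryOk ||
           c == ErrorCode::PrimarySteppedDown ||
           c == ErrorCode::InterruptedDueToReplStateChange;
}

// Placeholder allow-list: codes ASSUMED to be returned before the operation
// has had any effect. Each entry would need to be verified individually.
constexpr bool rejectedBeforeExecution(ErrorCode c) {
    return c == ErrorCode::NotWritablePrimary ||
           c == ErrorCode::NotPrimaryNoSecondaryOk;
}

// Behavior as described in this ticket: retry on the whole category.
constexpr bool shouldRetryPermissive(ErrorCode c) {
    return isNotPrimaryCategory(c);
}

// Stricter alternative: retry only on individually inspected codes.
constexpr bool shouldRetryStrict(ErrorCode c) {
    return rejectedBeforeExecution(c);
}
```

      Under these assumptions, an interruption such as InterruptedDueToReplStateChange would be retried by the permissive check but not by the strict one, since it may arrive after the operation has already started.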

              Assignee:
              Unassigned
              Reporter:
              Patrick Freed
