Uploaded image for project: 'Go Driver'
  1. Go Driver
  2. GODRIVER-1966

Strange delays with failCommand in v1.4.0+

      We see delays when we upgrade our mongo driver from `v1.3.2` to `v1.4.0`.

      Attached is the `main.go` used to generate the following output.

      It will connect a client, prepare the test space, then perform 3 read (`find`) queries and 3 write (`insert`) queries with and without the failCommand enabled. Below is a snippet of the `find` with failCommand enabled.

       

      Output with `v1.3.2`:

      ```

      == Testing Find queries failpoint ENABLED ==

      • Time elapsed to do: connect client: 38.833µs
      • Time elapsed to do: clear failpoint: 7.497043ms
      • Time elapsed to do: populate table: 995.285µs
      • Time elapsed to do: set failpoint: 890.543µs
      • Time elapsed to do: find query 0: 502.632805ms
        Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to 'failCommand' failpoint
      • Time elapsed to do: find query 1: 998.847787ms
        Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to 'failCommand' failpoint
      • Time elapsed to do: find query 2: 999.679731ms
        Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to 'failCommand' failpoint
      • Time elapsed to do: clear failpoint: 500.759202ms
      • Time elapsed to do: clear failpoint: 1.258652ms
      • Time elapsed to do: clean table post test: 1.381183ms
      • Time elapsed to do: disconnect client: 1.103236ms

      ```

      First: ~500m

      Second/third: ~1s

       

      Output with `v1.4.0`:

      ```

      == Testing Find queries failpoint ENABLED ==

      • Time elapsed to do: connect client: 41.041µs
      • Time elapsed to do: clear failpoint: 5.948902ms
      • Time elapsed to do: populate table: 946.695µs
      • Time elapsed to do: set failpoint: 849.93µs
      • Time elapsed to do: find query 0: 9.996635035s
        Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to 'failCommand' failpoint
      • Time elapsed to do: find query 1: 20.015427043s
        Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to 'failCommand' failpoint
      • Time elapsed to do: find query 2: 19.987305077s
        Encountered expected error: (mongo.CommandError): (NotWritablePrimary) Failing command due to 'failCommand' failpoint
      • Time elapsed to do: clear failpoint: 10.022557671s
      • Time elapsed to do: clear failpoint: 1.460051ms
      • Time elapsed to do: clean table post test: 1.456988ms
      • Time elapsed to do: disconnect client: 1.322098ms

      ```

      First: ~10s

      Second/third: ~20s

       

      The query latency jumps x10. The time to clear the failpoint does as well.

      1. Why does this happen?
      2. Why does it appear to double for every query after the first?

      The same pattern appears for the write query.

      Impact: As a result we cannot depend on our unit tests as all queries with failCommand have this delay.

            Assignee:
            benji.rewis@mongodb.com Benji Rewis (Inactive)
            Reporter:
            eliesrs@gmail.com Ethan Lie
            Votes:
            1 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: