predictive_connpool.js is slightly flaky on Windows

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Networking & Observability
    • ALL
    • Run predictive_connpool.js under burn-in a lot.

      I've seen this test fail in burn-in on Windows in two ways:

      1. (more rarely) Socket reuse? Here's an example:

      [js_test:predictive_connpool] @jstests\noPassthrough\network\predictive_connpool.js:167:4
      [js_test:predictive_connpool] couldn't connect to server EC2AMAZ-TIEA7HP:21545, connection attempt failed: SocketException: Error connecting to EC2AMAZ-TIEA7HP:21545 (10.128.168.40:21545) :: caused by :: syncConnect connect error :: caused by :: Only one usage of each socket address (protocol/network address/port) is normally permitted. 

      2. (more commonly) One of the checks fails because the expected number of in-use connections doesn't rise as high as expected, as though the mongos has deadlocked. Here's an example:

      [js_test:predictive_connpool] uncaught exception: Error: assert.soon failed (timeout 600000ms), msg : Check #4 failed The hang analyzer is automatically called in assert.soon functions. If you are *expecting* assert.soon to possibly fail, call assert.soon with {runHangAnalyzer: false} as the fifth argument (you can fill unused arguments with `undefined`). :
      [js_test:predictive_connpool] doassert@src/mongo/shell/assert.js:20:14
      [js_test:predictive_connpool] _doassert@src/mongo/shell/assert.js:124:13
      [js_test:predictive_connpool] assert.soon@src/mongo/shell/assert.js:431:22
      [js_test:predictive_connpool] hasConnPoolStats@jstests\noPassthrough\network\predictive_connpool.js:97:12
      [js_test:predictive_connpool] walkThroughBehavior@jstests\noPassthrough\network\predictive_connpool.js:132:21
      [js_test:predictive_connpool] @jstests\noPassthrough\network\predictive_connpool.js:153:20
      [js_test:predictive_connpool] Error: assert.soon failed (timeout 600000ms), msg : Check #4 failed The hang analyzer is automatically called in assert.soon functions. If you are *expecting* assert.soon to possibly fail, call assert.soon with {runHangAnalyzer: false} as the fifth argument (you can fill unused arguments with `undefined`).
      [js_test:predictive_connpool] failed to load: jstests\noPassthrough\network\predictive_connpool.js
      

      Note that the linked examples come from a patch in which I modified the test to log the full connPoolStats output (instead of only part of it) and modified burn-in to run the test 10x more often than it normally would. Here's another patch where I only added a newline to the end of the test file so that burn-in would pick it up. I had to restart burn-in a few times before the failures occurred.
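
For illustration only, the extra logging described above could look roughly like the sketch below. It is not the code from the patch: `pollWithFullStats`, `getStats`, and the fabricated stats object are hypothetical names, and `getStats` stands in for whatever the test uses to fetch connPoolStats so that the sketch runs outside the mongo shell. It polls without sleeping, which a real test would not do.

```javascript
// Hypothetical helper: poll a stats source until `check` passes, logging the
// FULL stats document on every failed attempt instead of a subset of fields.
// `getStats` is an injected stand-in for the real connPoolStats call.
function pollWithFullStats(getStats, check, {maxAttempts = 5, log = console.log} = {}) {
    let lastStats = null;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        lastStats = getStats();
        if (check(lastStats)) {
            return {ok: true, attempts: attempt, stats: lastStats};
        }
        // Dump everything so an unexpected field is visible in the burn-in log.
        log(`attempt ${attempt} failed, full stats: ${JSON.stringify(lastStats)}`);
    }
    return {ok: false, attempts: maxAttempts, stats: lastStats};
}

// Usage with fabricated stats whose in-use count rises by one per poll.
// The host name and field layout are illustrative assumptions.
let inUse = 0;
const fakeGetStats = () => ({hosts: {"example-host:20000": {inUse: ++inUse, available: 0}}});
const logged = [];
const result = pollWithFullStats(
    fakeGetStats,
    (stats) => stats.hosts["example-host:20000"].inUse >= 3,
    {log: (line) => logged.push(line)});
```

Logging the whole document on each failed check makes it easier to tell, after the fact, whether the in-use count stalled or some other field changed unexpectedly.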

      What I'm not sure about is whether this is an issue where only some specific builds fail: can you start a patch and get a "good" build that will never fail, or will any build eventually fail? On some patches I could restart burn-in 12 times with no failures, and on others it failed pretty readily. I thought our builds were deterministic, but that may not apply to Windows.

            Assignee:
            Unassigned
            Reporter:
            Ryan Berryhill