Uploaded image for project: 'C Driver'
  1. C Driver
  2. CDRIVER-3519

Artificially large ismaster may timeout when sending in Windows

      The test /Async/large_ismaster was frequently failing on Windows due to timing out:

      https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_test_latest_server_auth_sspi_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
      https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_test_latest_server_auth_sasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_2
      https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_auth_nosasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
      https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_noauth_nosasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
      https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_auth_sasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28

      It tests sending an artificially large (>= 1MB) isMaster command through the async monitoring logic. On failure, it takes 10 seconds (the specified timeout of the async command), on passing it takes <1 second.

      I investigated by running only this test on repeat, adding printfs, and making mongod logs more verbose. This was the patch with those results:
      https://evergreen.mongodb.com/version/5e42c53b32f41774e352dcba

      The printfs are interesting. The first call to writev succeeds, but then libmongoc appears to repeatedly attempt to write the rest without making progress until it times out:

      [2020/02/11 13:39:17.907] pre writev, writing 1048618 bytes
      [2020/02/11 13:39:17.907] post writev
      [2020/02/11 13:39:17.907] pre-poll
      [2020/02/11 13:39:17.907] post-poll
      [2020/02/11 13:39:17.907] Have written 561152 bytes, writing more.
      [2020/02/11 13:39:17.907] pre writev, writing 487466 bytes
      [2020/02/11 13:39:17.907] post writev
      [2020/02/11 13:39:17.907] pre-poll
      [2020/02/11 13:39:17.907] post-poll
      [2020/02/11 13:39:17.907] Have written 561152 bytes, writing more.
      [2020/02/11 13:39:17.907] pre writev, writing 487466 bytes
      [2020/02/11 13:39:17.907] post writev
      [2020/02/11 13:39:17.907] pre-poll
      [2020/02/11 13:39:17.907] post-poll
      [2020/02/11 13:39:17.907] Have written 561152 bytes, writing more.
      [2020/02/11 13:39:17.907] pre writev, writing 487466 bytes
      [2020/02/11 13:39:17.907] post writev
      ...
      

      Here are the relevant verbose logs of mongod (slightly edited for readability):

      2020-02-11T18:39:17.482+0000 D2 NETWORK  [conn9] Accepted TLS connection from peer: C=US,ST=New York,L=New York City,O=MDB,OU=Drivers,CN=client
      2020-02-11T18:39:17.551+0000 D2 COMMAND  [conn9] warning: log line attempted (960kB) over max size (10kB), printing beginning and end ... run command admin.$cmd { isMaster: 1, key_000000: 0, ... key_065534: 0, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" }
      2020-02-11T18:39:17.609+0000 D3 NETWORK  [conn9] Starting server-side compression negotiation
      2020-02-11T18:39:17.610+0000 D3 NETWORK  [conn9] Compression negotiation not requested by client
      2020-02-11T18:39:17.618+0000 I  COMMAND  [conn9] warning: log line attempted (960kB) over max size (10kB), printing beginning and end ... command admin.$cmd command: isMaster { isMaster: 1, key_000000: 0, ... key_065534: 0, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" } numYields:0 reslen:257 locks:{} protocol:op_query 10ms
      2020-02-11T18:39:17.621+0000 D2 NETWORK  [conn9] Session from 127.0.0.1:49797 encountered a network error during SourceMessage: HostUnreachable: Connection closed by peer
      2020-02-11T18:39:17.621+0000 I  NETWORK  [conn9] end connection 127.0.0.1:49797 (0 connections now open)
      2020-02-11T18:39:17.852+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:49798 #10 (1 connection now open)
      2020-02-11T18:39:17.897+0000 D2 NETWORK  [conn10] Accepted TLS connection from peer: C=US,ST=New York,L=New York City,O=MDB,OU=Drivers,CN=client
      2020-02-11T18:39:17.905+0000 D2 NETWORK  [conn10] Session from 127.0.0.1:49798 encountered a network error during SourceMessage: SocketException: The specified data could not be decrypted.
      2020-02-11T18:39:17.905+0000 I  NETWORK  [conn10] end connection 127.0.0.1:49798 (0 connections now open)
      2020-02-11T18:39:28.714+0000 D1 EXECUTOR [IndexBuildsCoordinatorMongod-0] Reaping this thread; next thread reaped no earlier than 2020-02-11T18:39:58.714+0000
      2020-02-11T18:39:28.714+0000 D1 EXECUTOR [IndexBuildsCoordinatorMongod-0] shutting down thread in pool IndexBuildsCoordinatorMongod
      

      It appears to send the full ismaster, but then prints Connection closed by peer followed by SocketException: The specified data could not be decrypted..

            Assignee:
            Unassigned Unassigned
            Reporter:
            kevin.albertson@mongodb.com Kevin Albertson
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: