-
Type: Task
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Testing
The test /Async/large_ismaster was frequently failing on Windows due to timing out:
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_test_latest_server_auth_sspi_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_test_latest_server_auth_sasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_2
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_auth_nosasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_noauth_nosasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_auth_sasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
It tests sending an artificially large (>= 1MB) isMaster command through the async monitoring logic. On failure, it takes 10 seconds (the specified timeout of the async command), on passing it takes <1 second.
I investigated by running only this test on repeat, adding printfs, and making mongod logs more verbose. This was the patch with those results:
https://evergreen.mongodb.com/version/5e42c53b32f41774e352dcba
The printfs are interesting. The first call to writev succeeds, but then libmongoc appears to repeatedly attempt to write the rest without making progress until it times out:
[2020/02/11 13:39:17.907] pre writev, writing 1048618 bytes [2020/02/11 13:39:17.907] post writev [2020/02/11 13:39:17.907] pre-poll [2020/02/11 13:39:17.907] post-poll [2020/02/11 13:39:17.907] Have written 561152 bytes, writing more. [2020/02/11 13:39:17.907] pre writev, writing 487466 bytes [2020/02/11 13:39:17.907] post writev [2020/02/11 13:39:17.907] pre-poll [2020/02/11 13:39:17.907] post-poll [2020/02/11 13:39:17.907] Have written 561152 bytes, writing more. [2020/02/11 13:39:17.907] pre writev, writing 487466 bytes [2020/02/11 13:39:17.907] post writev [2020/02/11 13:39:17.907] pre-poll [2020/02/11 13:39:17.907] post-poll [2020/02/11 13:39:17.907] Have written 561152 bytes, writing more. [2020/02/11 13:39:17.907] pre writev, writing 487466 bytes [2020/02/11 13:39:17.907] post writev ...
Here are the relevant verbose logs of mongod (slightly edited for readability):
2020-02-11T18:39:17.482+0000 D2 NETWORK [conn9] Accepted TLS connection from peer: C=US,ST=New York,L=New York City,O=MDB,OU=Drivers,CN=client 2020-02-11T18:39:17.551+0000 D2 COMMAND [conn9] warning: log line attempted (960kB) over max size (10kB), printing beginning and end ... run command admin.$cmd { isMaster: 1, key_000000: 0, ... key_065534: 0, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" } 2020-02-11T18:39:17.609+0000 D3 NETWORK [conn9] Starting server-side compression negotiation 2020-02-11T18:39:17.610+0000 D3 NETWORK [conn9] Compression negotiation not requested by client 2020-02-11T18:39:17.618+0000 I COMMAND [conn9] warning: log line attempted (960kB) over max size (10kB), printing beginning and end ... command admin.$cmd command: isMaster { isMaster: 1, key_000000: 0, ... key_065534: 0, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" } numYields:0 reslen:257 locks:{} protocol:op_query 10ms 2020-02-11T18:39:17.621+0000 D2 NETWORK [conn9] Session from 127.0.0.1:49797 encountered a network error during SourceMessage: HostUnreachable: Connection closed by peer 2020-02-11T18:39:17.621+0000 I NETWORK [conn9] end connection 127.0.0.1:49797 (0 connections now open) 2020-02-11T18:39:17.852+0000 I NETWORK [listener] connection accepted from 127.0.0.1:49798 #10 (1 connection now open) 2020-02-11T18:39:17.897+0000 D2 NETWORK [conn10] Accepted TLS connection from peer: C=US,ST=New York,L=New York City,O=MDB,OU=Drivers,CN=client 2020-02-11T18:39:17.905+0000 D2 NETWORK [conn10] Session from 127.0.0.1:49798 encountered a network error during SourceMessage: SocketException: The specified data could not be decrypted. 2020-02-11T18:39:17.905+0000 I NETWORK [conn10] end connection 127.0.0.1:49798 (0 connections now open) 2020-02-11T18:39:28.714+0000 D1 EXECUTOR [IndexBuildsCoordinatorMongod-0] Reaping this thread; next thread reaped no earlier than 2020-02-11T18:39:58.714+0000 2020-02-11T18:39:28.714+0000 D1 EXECUTOR [IndexBuildsCoordinatorMongod-0] shutting down thread in pool IndexBuildsCoordinatorMongod
It appears to send the full ismaster, but then prints Connection closed by peer followed by SocketException: The specified data could not be decrypted..