[CDRIVER-3519] Artificially large ismaster may timeout when sending in Windows Created: 03/Feb/20  Updated: 25/Sep/23

Status: Backlog
Project: C Driver
Component/s: tests
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Kevin Albertson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: flaky-tests, platform-problems
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Epic Link: Stabilize Evergreen

 Description   

The test /Async/large_ismaster was frequently failing on Windows due to timing out:

https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_test_latest_server_auth_sspi_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_test_latest_server_auth_sasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_2
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_auth_nosasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_noauth_nosasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28
https://evergreen.mongodb.com/task/mongo_c_driver_windows_2017_32_test_latest_server_auth_sasl_winssl_b12e2a44dd2a3321aecd0222dcde173b8c5b9457_20_02_03_16_22_28

It tests sending an artificially large (>= 1MB) isMaster command through the async monitoring logic. On failure, it takes 10 seconds (the specified timeout of the async command), on passing it takes <1 second.

I investigated by running only this test on repeat, adding printfs, and making mongod logs more verbose. This was the patch with those results:
https://evergreen.mongodb.com/version/5e42c53b32f41774e352dcba

The printfs are interesting. The first call to writev succeeds, but then libmongoc appears to repeatedly attempt to write the rest without making progress until it times out:

[2020/02/11 13:39:17.907] pre writev, writing 1048618 bytes
[2020/02/11 13:39:17.907] post writev
[2020/02/11 13:39:17.907] pre-poll
[2020/02/11 13:39:17.907] post-poll
[2020/02/11 13:39:17.907] Have written 561152 bytes, writing more.
[2020/02/11 13:39:17.907] pre writev, writing 487466 bytes
[2020/02/11 13:39:17.907] post writev
[2020/02/11 13:39:17.907] pre-poll
[2020/02/11 13:39:17.907] post-poll
[2020/02/11 13:39:17.907] Have written 561152 bytes, writing more.
[2020/02/11 13:39:17.907] pre writev, writing 487466 bytes
[2020/02/11 13:39:17.907] post writev
[2020/02/11 13:39:17.907] pre-poll
[2020/02/11 13:39:17.907] post-poll
[2020/02/11 13:39:17.907] Have written 561152 bytes, writing more.
[2020/02/11 13:39:17.907] pre writev, writing 487466 bytes
[2020/02/11 13:39:17.907] post writev
...

Here are the relevant verbose logs of mongod (slightly edited for readability):

2020-02-11T18:39:17.482+0000 D2 NETWORK  [conn9] Accepted TLS connection from peer: C=US,ST=New York,L=New York City,O=MDB,OU=Drivers,CN=client
2020-02-11T18:39:17.551+0000 D2 COMMAND  [conn9] warning: log line attempted (960kB) over max size (10kB), printing beginning and end ... run command admin.$cmd { isMaster: 1, key_000000: 0, ... key_065534: 0, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" }
2020-02-11T18:39:17.609+0000 D3 NETWORK  [conn9] Starting server-side compression negotiation
2020-02-11T18:39:17.610+0000 D3 NETWORK  [conn9] Compression negotiation not requested by client
2020-02-11T18:39:17.618+0000 I  COMMAND  [conn9] warning: log line attempted (960kB) over max size (10kB), printing beginning and end ... command admin.$cmd command: isMaster { isMaster: 1, key_000000: 0, ... key_065534: 0, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" } numYields:0 reslen:257 locks:{} protocol:op_query 10ms
2020-02-11T18:39:17.621+0000 D2 NETWORK  [conn9] Session from 127.0.0.1:49797 encountered a network error during SourceMessage: HostUnreachable: Connection closed by peer
2020-02-11T18:39:17.621+0000 I  NETWORK  [conn9] end connection 127.0.0.1:49797 (0 connections now open)
2020-02-11T18:39:17.852+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:49798 #10 (1 connection now open)
2020-02-11T18:39:17.897+0000 D2 NETWORK  [conn10] Accepted TLS connection from peer: C=US,ST=New York,L=New York City,O=MDB,OU=Drivers,CN=client
2020-02-11T18:39:17.905+0000 D2 NETWORK  [conn10] Session from 127.0.0.1:49798 encountered a network error during SourceMessage: SocketException: The specified data could not be decrypted.
2020-02-11T18:39:17.905+0000 I  NETWORK  [conn10] end connection 127.0.0.1:49798 (0 connections now open)
2020-02-11T18:39:28.714+0000 D1 EXECUTOR [IndexBuildsCoordinatorMongod-0] Reaping this thread; next thread reaped no earlier than 2020-02-11T18:39:58.714+0000
2020-02-11T18:39:28.714+0000 D1 EXECUTOR [IndexBuildsCoordinatorMongod-0] shutting down thread in pool IndexBuildsCoordinatorMongod

It appears to send the full ismaster, but then prints Connection closed by peer followed by SocketException: The specified data could not be decrypted..



 Comments   
Comment by Kevin Albertson [ 25/Sep/23 ]

Test is currently skipped: https://github.com/mongodb/mongo-c-driver/blob/015c4497c2e98469d31eedf0213e0e7500105f31/src/libmongoc/tests/test-mongoc-async.c#L387

Test was unskipped and run in a patch build: https://spruce.mongodb.com/version/6511c0353627e09fd1d599f0/. Resulted in failure.

Comment by Githook User [ 12/Feb/20 ]

Author:

{'name': 'Kevin Albertson', 'username': 'kevinAlbs', 'email': 'kevin.albertson@mongodb.com'}

Message: CDRIVER-3519 add back skip if not single
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/50a7cdc51911f43ce5d79ceaf8235b763b29bf4a

Comment by Kevin Albertson [ 12/Feb/20 ]

Moving this back to "Open" to be further investigated at a later time.

Comment by Githook User [ 12/Feb/20 ]

Author:

{'name': 'Kevin Albertson', 'username': 'kevinAlbs', 'email': 'kevin.albertson@mongodb.com'}

Message: CDRIVER-3519 skip /Async/large_ismaster on Windows
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/6b8ee2c563eaac3595af031f53f75170187ef7d7

Generated at Wed Feb 07 21:18:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.