[CDRIVER-2783] test-valgrind-latest-sharded-auth-openssl cannot initialize MongoDB Created: 01/Aug/18  Updated: 28/Oct/23  Resolved: 06/Aug/18

Status: Closed
Project: C Driver
Component/s: tests
Affects Version/s: 1.12.0
Fix Version/s: 1.13.0

Type: Bug
Priority: Major - P3
Reporter: A. Jesse Jiryu Davis
Assignee: A. Jesse Jiryu Davis
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
is caused by SERVER-36459 --keyFile now required to start shard... Closed
is caused by SERVER-36460 TLS certificate "purpose" requirement... Closed

 Description   

Seen here:

https://evergreen.mongodb.com/task/mongo_c_driver_valgrind_ubuntu_test_valgrind_latest_sharded_auth_openssl_patch_bb34c6f2fdf52a2c91ab0fcb67e8aed1e7a5a5b4_5b61185dc9ec444d01677620_18_08_01_02_18_05

Mongo Orchestration tries to start a sharded cluster of replica sets with SSL and auth. According to the log file in mongo-agxOY7/mongod.log, the replica on port 27219 seems to reject connection attempts from the other replicas:

2018-08-01T22:36:37.579+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:56037 #12 (3 connections now open)
2018-08-01T22:36:37.584+0000 W NETWORK  [conn12] SSL peer certificate validation failed: unsupported certificate purpose
2018-08-01T22:36:37.584+0000 I NETWORK  [conn12] end connection 127.0.0.1:56037 (2 connections now open)

The replica seems to accept connections from Mongo Orchestration itself, which uses PyMongo to connect.

Later, it logs errors like:

2018-08-01T22:40:39.605+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:57142 #75 (6 connections now open)
2018-08-01T22:40:39.609+0000 W NETWORK  [conn75] SSL peer certificate validation failed: unsupported certificate purpose
2018-08-01T22:40:39.609+0000 I NETWORK  [conn75] received client metadata from 127.0.0.1:57142 conn75: { driver: { name: "MongoDB Internal Client", version: "4.1.1-175-g075d7fe" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "14.04" } }
2018-08-01T22:40:39.610+0000 I ACCESS   [conn75] SASL SCRAM-SHA-1 authentication failed for __system on local from client 127.0.0.1:57142 ; AuthenticationFailed: It is not possible to authenticate as the __system user on servers started without a --keyFile parameter
2018-08-01T22:40:39.610+0000 I NETWORK  [conn75] end connection 127.0.0.1:57142 (5 connections now open)

I don't know whether the SSL error is the root cause, or a symptom, or doesn't matter. The AuthenticationFailed error seems crucial.
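
One way to check how the two errors relate is to group them by connection. The sketch below is only an illustration, not part of the driver or of Mongo Orchestration: `classify_connections` and the marker names are hypothetical, and the sample lines are copied from the excerpts above.

```python
import re
from collections import defaultdict

# Matches the connection tag (e.g. "[conn75]") in a mongod log line.
CONN_RE = re.compile(r"\[(conn\d+)\]")

# Substrings copied from the log excerpts in this ticket; the keys are
# hypothetical labels chosen for this sketch.
MARKERS = {
    "tls_purpose": "SSL peer certificate validation failed: unsupported certificate purpose",
    "keyfile_auth": "servers started without a --keyFile parameter",
}


def classify_connections(lines):
    """Map each connection tag to the set of error markers seen on it."""
    seen = defaultdict(set)
    for line in lines:
        match = CONN_RE.search(line)
        if match is None:
            continue  # "[listener]" lines carry no connection tag
        for name, needle in MARKERS.items():
            if needle in line:
                seen[match.group(1)].add(name)
    return dict(seen)


sample = [
    "2018-08-01T22:36:37.579+0000 I NETWORK  [listener] connection accepted from 127.0.0.1:56037 #12 (3 connections now open)",
    "2018-08-01T22:36:37.584+0000 W NETWORK  [conn12] SSL peer certificate validation failed: unsupported certificate purpose",
    "2018-08-01T22:36:37.584+0000 I NETWORK  [conn12] end connection 127.0.0.1:56037 (2 connections now open)",
    "2018-08-01T22:40:39.609+0000 W NETWORK  [conn75] SSL peer certificate validation failed: unsupported certificate purpose",
    "2018-08-01T22:40:39.610+0000 I ACCESS   [conn75] SASL SCRAM-SHA-1 authentication failed for __system on local from client 127.0.0.1:57142 ; AuthenticationFailed: It is not possible to authenticate as the __system user on servers started without a --keyFile parameter",
]

for conn, errors in sorted(classify_connections(sample).items()):
    print(conn, sorted(errors))
```

Running this over a full mongod.log would show whether every connection with the AuthenticationFailed error also logged the certificate warning, or whether the two occur independently.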

The other replicas log similarly. mongos logs:

2018-08-01T22:37:04.241+0000 I NETWORK  [ReplicaSetMonitor-TaskExecutor] can't authenticate to localhost:27218 as internal user, error: Authentication failed.



 Comments   
Comment by Githook User [ 06/Aug/18 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: CDRIVER-2783 use one-node replica sets as shards

Work around this Mongo Orchestration issue by using one-node replica
sets as shard servers with TLS and auth:

https://github.com/10gen/mongo-orchestration/issues/251
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/e1f5405d8861d7e70cd447b28d88d40acf89fa73

Comment by Githook User [ 06/Aug/18 ]

Author: A. Jesse Jiryu Davis <jesse@mongodb.com> (ajdavis)

Message: CDRIVER-2783 update test certificates
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/6a16e7bacaef0a5755b537447cf1a6c4718749db

Comment by Shane Harvey [ 06/Aug/18 ]

This looks like it's caused by the same issue described in HELP-7061. MO attempts to shut down the server on port 27218 (log file "/data/mci/4098235018245251bf48b09ce9d836b8/mongoc/MO/db/mongo-IegT5e/mongod.log") and the shutdown command fails on the server with:

2018-08-01T22:37:17.970+0000 I COMMAND  [conn50] command admin.$cmd command: shutdown { shutdown: 1, force: true, $readPreference: { mode: "secondaryPreferred" }, $db: "admin" } numYields:0 ok:0 errMsg:"operation was interrupted" errName:InterruptedDueToStepDown errCode:11602 reslen:415 locks:{ Global: { acquireCount: { r: 2, W: 2 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 221 } } } protocol:op_query 6374ms
2018-08-01T22:37:17.970+0000 I NETWORK  [conn50] Error sending response to client: SocketException: Broken pipe. Ending connection from 127.0.0.1:60715 (connection id: 50)

PyMongo never gets a network/socket error and is stuck waiting for a response from the mongod. TCP keepalive should eventually cause a socket error, but the curl request times out first:

[2018/08/01 15:41:12.522] curl: (28) Operation timed out after 300000 milliseconds with 0 bytes received
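
To illustrate the hang described above, here is a minimal sketch of a client-side read timeout using only Python's standard `socket` module. It is not how PyMongo or Mongo Orchestration handles this case; `read_with_timeout` is a hypothetical helper, and it assumes a timeout shorter than curl's 300-second limit would let the caller see the stall first.

```python
import socket


def read_with_timeout(host, port, timeout_s):
    """Return the bytes read, or None if no response arrives in timeout_s.

    Hypothetical helper for illustration only: it connects, waits for a
    reply, and gives up once the read timeout expires instead of blocking
    until TCP keepalive notices the dead peer.
    """
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            # The peer accepted the connection but never answered.
            return None
```

For example, `read_with_timeout("127.0.0.1", 27218, 30)` would return `None` after 30 seconds if the server never replied, rather than waiting indefinitely.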

Comment by A. Jesse Jiryu Davis [ 05/Aug/18 ]

Part of the problem is that MongoDB requires shard servers to start with --keyFile if there are multiple servers per shard and auth is enabled, but Mongo Orchestration doesn't do that correctly:

https://github.com/10gen/mongo-orchestration/issues/251
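
The rule stated above can be sketched as a small argument builder. This is an assumption-laden illustration, not Mongo Orchestration's code: `mongod_shard_args` and its parameters are hypothetical, the paths in the usage note are placeholders, and only the `--keyFile` condition comes from this comment. Other auth-related options are deliberately left out.

```python
def mongod_shard_args(port, dbpath, repl_set, member_count, auth, key_file=None):
    """Build a mongod argv list for one shard member (hypothetical helper).

    Encodes only the rule described in this ticket: with auth enabled and
    more than one server per shard, the member must start with --keyFile.
    """
    args = [
        "mongod",
        "--shardsvr",
        "--port", str(port),
        "--dbpath", dbpath,
        "--replSet", repl_set,
    ]
    if auth and member_count > 1:
        if key_file is None:
            raise ValueError(
                "auth with multiple servers per shard requires a key file"
            )
        args += ["--keyFile", key_file]
    # Any other auth options are out of scope for this sketch.
    return args
```

Calling `mongod_shard_args(27219, "/path/to/db", "rs0", member_count=3, auth=True)` without a `key_file` raises `ValueError`, which is the misconfiguration this ticket ran into.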

Until recently the C Driver's tests of a sharded cluster with OpenSSL and auth weren't running at all. Adding ASAN and Coverity tests happened to add new variants that do test a sharded cluster with OpenSSL and auth, and now we're seeing that our tests are misconfigured.
