[DRIVERS-2648] Add driver tests for Serverless Proxy incremental rollout Created: 12/Jun/23  Updated: 25/Jan/24

Status: Implementing
Project: Drivers
Component/s: Serverless
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Steve Silvester
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by DRIVERS-2812 Address failing tests in Serverless P... Investigating
Issue split
split to CDRIVER-4770 Add driver tests for Serverless Proxy... Backlog
split to CXX-2782 Add driver tests for Serverless Proxy... Backlog
split to JAVA-5234 Add driver tests for Serverless Proxy... Backlog
split to NODE-5731 Add driver tests for Serverless Proxy... Backlog
split to CSHARP-4836 Add driver tests for Serverless Proxy... Closed
split to GODRIVER-3040 Add driver tests for Serverless Proxy... Closed
split to MOTOR-1207 Add driver tests for Serverless Proxy... Closed
split to PHPLIB-1303 Add driver tests for Serverless Proxy... Closed
split to PYTHON-4031 Add driver tests for Serverless Proxy... Closed
split to RUBY-3347 Add driver tests for Serverless Proxy... Closed
split to RUST-1796 Add driver tests for Serverless Proxy... Closed
Related
Tested
Driver Changes: Needed - No Spec Changes
Quarter: FY24Q4
Downstream Changes Summary:

Cloud is implementing a new serverless proxy that will be incrementally rolled out to new and existing clusters over the coming months. The team is leveraging pymongo's test suite to verify compliance with driver specs. In order to help ensure compatibility with all of our drivers - especially where they differ subtly and unintentionally from our official specs - we have agreed to add tests against the new serverless proxy to each driver's CI suite.  There are known failures that are being tracked in DRIVERS-2812.  The Serverless Proxy tests should not be run on PR commits until those are addressed.

  1. DRIVERS-2812

Each driver must continue running their serverless test suite against the existing serverless project (which will remain configured for the current serverless proxy). Each driver must run that same serverless test suite against the new serverless project which has been configured to use the new serverless proxy. Please let james.kovacs@mongodb.com (DBX) and siyuan.zhou@mongodb.com (Serverless) know of any reproducible failures against the new serverless proxy.

Note for implementers: there is a new SERVERLESS_REGION variable that must be set to US_EAST_1.  It is recommended that drivers follow the pattern used in https://github.com/mongodb/mongo-python-driver/commit/41a131ea1c15ffa969a14ce1334ce19837dc226b, as described in the Serverless README.
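
As an illustration of the note above, here is a minimal sketch of passing the region to a setup script from Python. Only the variable name SERVERLESS_REGION and the value US_EAST_1 come from this ticket; the helper name and the way the script is launched are assumptions, and the linked pymongo commit remains the reference pattern.

```python
import os

# From the ticket: SERVERLESS_REGION must be set to US_EAST_1.
REQUIRED_REGION = "US_EAST_1"


def build_serverless_env(base_env=None):
    """Return a copy of the environment with SERVERLESS_REGION set.

    Hypothetical helper: copies the given mapping (or os.environ) so the
    caller's environment is not mutated.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SERVERLESS_REGION"] = REQUIRED_REGION
    return env


# Assumed usage (not executed here): hand the environment to the setup script.
# import subprocess
# subprocess.run(["bash", "create-instance.sh"], env=build_serverless_env(), check=True)
```

Building a copy rather than editing os.environ keeps the region setting scoped to the one subprocess that needs it.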

Start date:
Driver Compliance:
Key Status/Resolution FixVersion
CDRIVER-4770 Backlog
CXX-2782 Backlog
CSHARP-4836 Fixed 2.23.0
GODRIVER-3040 Fixed 2.0.0
JAVA-5234 Backlog
NODE-5731 Backlog
MOTOR-1207 Duplicate
PYTHON-4031 Fixed 4.7
PHPLIB-1303 Done
RUBY-3347 Fixed 2.20.0
RUST-1796 Fixed 2.8.0

 Description   

Summary

The Atlas Serverless team plans to roll out the Serverless Proxy incrementally in CLOUDP-145502. The Serverless Proxy will be deployed between the Cloud Load Balancers and the Atlas Proxy, replacing vanilla Envoy. It will terminate client TLS connections and transcode requests to gRPC when forwarding them to the Atlas Proxy. It also handles hello commands and processes authentication commands specially.

During the incremental rollout, we will only enable the new behavior for some tenants using SCRAM authentication, while keeping the current behavior for others, e.g. those that haven't been included in the rollout and those using X.509. Since the Serverless Proxy essentially handles connection management and authentication for Serverless, with dramatic changes such as using gRPC, it's critical to test that it's compatible with drivers.

I propose duplicating the driver serverless tests for the Serverless Proxy's new behavior on 1-2 popular drivers, ideally Node.js or Python. As far as I know, we need to

  • add a new test in the Evergreen config.
  • update create-instance.sh to run against the pre-defined serverless cluster in rollout or call Atlas test-only API to enable the rollout on Dev.
  • make sure only SCRAM is used in the tests, otherwise, disable other authentication methods.
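
The SCRAM-only requirement in the last bullet could be guarded with a small check on the connection string. This is a sketch, not part of any driver's test suite: the helper name is hypothetical, and it assumes that a URI without an authMechanism option and with username/password credentials results in the driver negotiating a SCRAM mechanism.

```python
from urllib.parse import parse_qs, urlsplit

# SCRAM mechanism names accepted in the authMechanism URI option.
SCRAM_MECHANISMS = {"SCRAM-SHA-1", "SCRAM-SHA-256"}


def uses_scram_only(uri: str) -> bool:
    """Return True if the URI requests no non-SCRAM auth mechanism.

    Hypothetical helper. A URI with no authMechanism option is treated as
    acceptable (assumption: the driver then negotiates SCRAM by default).
    """
    query = urlsplit(uri).query
    # Connection string option names are case-insensitive, so normalize keys.
    options = {key.lower(): values for key, values in parse_qs(query).items()}
    mechanisms = options.get("authmechanism", [])
    return all(m.upper() in SCRAM_MECHANISMS for m in mechanisms)
```

A test runner could call this on the serverless URI before starting and skip or fail fast when it returns False, for example when a variant is configured for X.509.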

I wonder whether the plan makes sense to the Drivers team and whether you have bandwidth later this quarter. Alternatively, the Atlas Serverless team is happy to add the tests with the Drivers team's guidance and code review.

Motivation

Who is the affected end user?

Atlas Serverless team.

How does this affect the end user?

This is to add correctness test coverage for an internal architectural change.

Is this issue urgent?

Atlas Serverless team plans to start testing in a month.

Is this ticket required by a downstream team?

Atlas Serverless

Is this ticket only for tests?

Only for tests.

Acceptance Criteria

Add driver tests that run against serverless clusters that are included in the Serverless Proxy incremental rollout.



 Comments   
Comment by Githook User [ 22/Jan/24 ]

Author:

{'name': 'Steven Silvester', 'email': 'steven.silvester@ieee.org', 'username': 'blink1073'}

Message: DRIVERS-2648 Add AWS Secrets handling for serverless (#371)

  • add setup file
  • fix path
  • add vault name parameter
  • fix vault name handling
  • try us-east-1
  • update path handling
  • try us-east-1
  • try us-east-1
  • make region configurable
  • add readme
Comment by Githook User [ 19/Jan/24 ]

Author:

{'name': 'Steven Silvester', 'email': 'steven.silvester@ieee.org', 'username': 'blink1073'}

Message: DRIVERS-2648 Add AWS Secrets handling for serverless (#371)

  • add setup file
  • fix path
  • add vault name parameter
  • fix vault name handling
  • try us-east-1
  • update path handling
  • try us-east-1
  • try us-east-1
  • make region configurable
  • add readme
Comment by Celina Tala [ 19/Jan/24 ]

Final Update: All but 2 driver tests are successfully running under gRPC. 

test_errors_during_the_initial_connection_hello_are_ignored: This is to be expected as we currently handle the hello command in the serverless proxy instead of the atlas proxy, while failpoints are counted in the atlas proxy. It will be resolved once we move failpoint configuration to the serverless proxy. 

 

test_pinned_connection_is_released_after_a_transient_network_commit_error: This error is also expected until PYTHON-4074 is resolved.

Comment by Celina Tala [ 18/Jan/24 ]

Filed PYTHON-4152 for the Python team to dig into the remaining failures.

Comment by Steve Silvester [ 12/Jan/24 ]

Running in US_EAST_1 we're now down to three failures. I'm going to consider this unblocked from the drivers end as the serverless team works on the remaining errors.

Comment by Celina Tala [ 11/Jan/24 ]

steve.silvester@mongodb.com We realized that the drivers test creates serverless instances in region US_EAST_2 (here) while our gRPC MTM is created in region US_EAST_1. Would it be easy to change this script? We can also make a new MTM in the correct region on our side.

We also had to turn on the Single Target Serverless Deployment and Atlas Enable Test Commands feature flags for our gRPC MTM Group.

Comment by Celina Tala [ 10/Jan/24 ]

steve.silvester@mongodb.com Hi Steve! I looked into the failing tests in the patch you linked, and all of them succeeded in our nightly test runs (the specific tests are bookmarked). The test_errors_during_the_initial_connection_hello_are_ignored test is expected to fail, as the hello command is handled in the serverless proxy while failpoints are still handled in the Atlas Proxy. So I wonder if the failures come from somewhere else? Our test currently runs Python Driver version 4.6.0 using Python 3.11. I see you mentioned that multiple drivers are also experiencing test failures. Are the failing tests the same across all drivers?

Comment by Steve Silvester [ 10/Jan/24 ]

Hi tristan.wedderburn@mongodb.com, we don't have anything merged yet, since it isn't fully working in any of the drivers. Here is the open PR for Python: https://github.com/mongodb/mongo-python-driver/pull/1428.

And the latest manual build, with the same set of failures as when I last tried in December:
https://spruce.mongodb.com/task/mongo_python_driver_serverless_next__platform~rhel8_auth_ssl~auth_ssl_python_version~3.10_serverless~enabled_test_serverless_patch_578024e16af92dcdb4f4a871e257d46483aff58b_659e9ce4d6d80a84c0823e53_24_01_10_13_34_42?execution=0&sortBy=STATUS&sortDir=ASC

Comment by Tristan Wedderburn [ 10/Jan/24 ]

Hi steve.silvester@mongodb.com, Siyuan has released a newer version to cloud-dev as of today. Is there a nightly evergreen patch that we can watch?

Comment by Steve Silvester [ 08/Jan/24 ]

Hi tristan.wedderburn@mongodb.com, I'm back in office. I started watching CLOUDP-219103, is the plan to release it this week?

Comment by Steve Silvester [ 21/Dec/23 ]

Thanks tristan.wedderburn@mongodb.com! Today is my last day in office for the next two weeks, we can sync up when I get back.

Comment by Tristan Wedderburn [ 21/Dec/23 ]

Hi Team, after investigating some of the recent driver test failures, it appears that we need to release a newer version of the serverless proxy to cloud-dev. For example, we need to release functionality such as support for C# driver auth over OP_QUERY, which was added in CLOUDP-201831. Since this is a short week, we will proceed with the upgrade early next week. CLOUDP-219103 will track that upgrade. Once that is released, we should revisit the remaining failing tests across the drivers.

Comment by Siyuan Zhou [ 20/Dec/23 ]

steve.silvester@mongodb.com, thank you! tristan.wedderburn@mongodb.com can help you debug the failures this week.

Comment by Steve Silvester [ 18/Dec/23 ]

siyuan.zhou@mongodb.com, celina.tala@mongodb.com, given that we are failing the same tests against multiple drivers with the new proxy version, what steps would you like to take next?

Comment by Celina Tala [ 16/Nov/23 ]

Also, FYI, we have not turned on gRPC for these tests yet. We will let you know once we do.

Generated at Thu Feb 08 08:26:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.