[DRIVERS-910] Allow MongoClient to automatically transition from replica set to sharded endpoint without restarts Created: 19/Feb/20  Updated: 21/Dec/23

Status: Backlog
Project: Drivers
Component/s: SDAM
Fix Version/s: None

Type: Epic Priority: Major - P3
Reporter: Shane Harvey Assignee: Matt Dale
Resolution: Unresolved Votes: 1
Labels: invisiblesharding-fy24q2, invisiblesharding-m1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Issue split
split to CDRIVER-4683 Allow MongoClient to automatically tr... Blocked
split to CSHARP-4717 Allow MongoClient to automatically tr... Blocked
split to CXX-2714 Allow MongoClient to automatically tr... Blocked
split to GODRIVER-2905 Allow MongoClient to automatically tr... Blocked
split to JAVA-5071 Allow MongoClient to automatically tr... Blocked
split to MOTOR-1153 Allow MongoClient to automatically tr... Blocked
split to NODE-5453 Allow MongoClient to automatically tr... Blocked
split to PHPLIB-1195 Allow MongoClient to automatically tr... Blocked
split to PYTHON-3836 Allow MongoClient to automatically tr... Blocked
split to RUBY-3297 Allow MongoClient to automatically tr... Blocked
split to RUST-1702 Allow MongoClient to automatically tr... Blocked
Related
related to PYTHON-2131 Driver connect replica set failed whe... Closed
related to DRIVERS-2740 Add support for polling SRV records f... Backlog
related to DRIVERS-2622 Add SDAM tests for standalone restart... Backlog
Driver Changes: Needed
Quarter: FY25Q2
Case:
Engineering Lead: Matt Dale
Product Manager: Rachelle Palmer
Program Manager: Esha Bhargava
Driver Compliance:
Key Status/Resolution FixVersion
CDRIVER-4683 Blocked
CXX-2714 Blocked
CSHARP-4717 Blocked
GODRIVER-2905 Blocked
JAVA-5071 Blocked
NODE-5453 Blocked
MOTOR-1153 Blocked
PYTHON-3836 Blocked
PHPLIB-1195 Blocked
RUBY-3297 Blocked
RUST-1702 Blocked

 Description   

Summary

Definitions:

  • router endpoint - server endpoint that routes requests to shards (i.e. mongos)
  • replica set endpoint - server endpoint that emulates a replica set or standalone (i.e. mongod)

Starting in 8.0, sharded clusters will be the only supported topology in MongoDB. To allow customers currently using a MongoDB replica set to migrate to MongoDB 8.0 without needing to reconfigure their applications, the server team will build a "replica set endpoint" that makes a single-shard cluster appear to be a replica set (PM-2965). However, there are limitations to the replica set endpoint:

  • It can only be used with single-shard clusters, so customers have to switch to the router endpoint if they want to add more shards (e.g. to scale horizontally).
    • Note that there have been discussions about supporting multi-shard clusters with the replica set endpoint, but it's currently not in the scope of PM-2965. In any case, horizontal scalability will likely still be limited for customers using the replica set endpoint.
  • It will likely be removed in a future version of MongoDB (9.0 or later), so it probably can't be used indefinitely.
    • It has not been decided when the replica set endpoint feature will be removed.

Because of these limitations, customers may eventually be motivated or required to stop using the replica set endpoint.

To switch to the router endpoint, customers using existing drivers will have to update their connection string and restart their applications. We want to offer customers a better migration experience that doesn't require reconfiguring and restarting their application to switch to the router endpoint.

Updated from the original description:

It might be useful to allow a MongoClient to survive cluster topology changes from a replica set to a sharded cluster (or vice versa). For example, if a client is connected to mongoses A and B and an admin restarts A and B as a replica set, the client could rediscover A and B as replica set members.

Currently the SDAM spec does not allow this. When a MongoClient is connected to a replica set, it will remove servers that are discovered to be mongos nodes. When a MongoClient is connected to a sharded cluster (a set of mongos nodes), it will remove servers that are not mongos nodes.

I imagine the following addition to the SDAM spec would allow it:

If all nodes are removed from the Topology, clients MUST reset the TopologyType to Unknown and rediscover the original seed addresses.
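The proposed rule can be illustrated with a heavily simplified model of SDAM. This is a minimal sketch under stated assumptions, not the SDAM spec's actual transition tables: topology types are collapsed to Unknown/ReplicaSet/Sharded, only four server types are modeled, and the names `Topology` and `on_server_description` are hypothetical. It shows the current "remove servers of the wrong kind" behavior plus the proposed reset to Unknown (re-seeding from the original addresses) once the last server is removed.

```python
from dataclasses import dataclass, field
from enum import Enum


class TopologyType(Enum):
    UNKNOWN = "Unknown"
    REPLICA_SET = "ReplicaSet"  # simplification: collapses ReplicaSetNoPrimary/WithPrimary
    SHARDED = "Sharded"


class ServerType(Enum):
    UNKNOWN = "Unknown"
    MONGOS = "Mongos"
    RS_PRIMARY = "RSPrimary"
    RS_SECONDARY = "RSSecondary"


RS_MEMBER_TYPES = {ServerType.RS_PRIMARY, ServerType.RS_SECONDARY}


@dataclass
class Topology:
    """Hypothetical, simplified topology model (not the real SDAM tables)."""
    seeds: tuple
    topology_type: TopologyType = TopologyType.UNKNOWN
    servers: dict = field(init=False, default_factory=dict)

    def __post_init__(self):
        self._reseed()

    def _reseed(self):
        # Every original seed address starts out as an Unknown server.
        self.servers = {addr: ServerType.UNKNOWN for addr in self.seeds}

    def on_server_description(self, address, server_type):
        """Apply one monitoring result for `address`."""
        if address not in self.servers:
            return  # stale result for a server that was already removed
        # Unknown topologies adopt the type implied by the first real reply.
        if self.topology_type is TopologyType.UNKNOWN:
            if server_type is ServerType.MONGOS:
                self.topology_type = TopologyType.SHARDED
            elif server_type in RS_MEMBER_TYPES:
                self.topology_type = TopologyType.REPLICA_SET
        # Current SDAM behavior: drop servers of the "wrong" kind.
        if self.topology_type is TopologyType.SHARDED and server_type in RS_MEMBER_TYPES:
            del self.servers[address]
        elif self.topology_type is TopologyType.REPLICA_SET and server_type is ServerType.MONGOS:
            del self.servers[address]
        else:
            self.servers[address] = server_type
        # Proposed rule: an empty topology is no longer terminal.
        if not self.servers:
            self.topology_type = TopologyType.UNKNOWN
            self._reseed()
```

Walking through the example from the description: with seeds `a:27017` and `b:27017` both reporting `MONGOS`, the topology becomes `SHARDED`. After an admin restarts both as replica set members, each is removed as it reports an RS type; when `b:27017` (the last one) is removed, the topology resets to `UNKNOWN` with both seeds re-added, and the next `RS_PRIMARY` reply moves it to `REPLICA_SET`. Under today's spec the empty topology would instead be terminal.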

Originally requested in: PYTHON-2131

Motivation

Who is the affected end user?

Customers who upgrade existing replica sets to MongoDB 8.0 or who choose to use the new replica set endpoint for a new MongoDB 8.0 cluster (see PM-2965).

How does this affect the end user?

Customers may delay migrating off the replica set endpoint, reducing their motivation to horizontally scale their clusters or preventing them from migrating to a future MongoDB version that doesn't have the replica set endpoint feature.

How likely is it that this problem or use case will occur?

All customers using the MongoDB 8.0 replica set endpoint will eventually need to switch to the router endpoint.

If the problem does occur, what are the consequences and how severe are they?

Customers who want or need to switch to the router endpoint must change their connection string and restart their applications to re-initialize the MongoClient.

Is this issue urgent?

This feature will be useful when MongoDB 8.0 is released. It will become even more useful when a future MongoDB version is released that removes the replica set endpoint feature.

Is this ticket required by a downstream team?

No.

Is this ticket only for tests?

No.

Acceptance Criteria

To allow migrating from the replica set endpoint to the router endpoint, drivers must be able to:

  1. Discover the router endpoint(s) somehow.
  2. Give customers some way to configure whether or not they want their applications to automatically switch to the router endpoint(s), if available.
  3. Update the current topology from replica set or standalone to sharded cluster.
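One way to picture how the three criteria fit together is a small decision function. This is a hypothetical sketch, not an agreed design: the `auto_switch` opt-in flag, the `ClientState` type, and the assumption that router hosts arrive from some separate discovery step (for example, the SRV polling discussed in the comments) are all illustrative. Topology type strings loosely follow SDAM's `Single`, `ReplicaSet`, and `Sharded`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ClientState:
    topology_type: str  # "Single", "ReplicaSet", or "Sharded" (simplified)
    hosts: tuple


def maybe_switch_to_router(
    state: ClientState,
    router_hosts: Optional[Sequence[str]],
    auto_switch: bool,
) -> ClientState:
    """Return the state a client should move to (hypothetical policy)."""
    # Criterion 2: the application must have opted in.
    if not auto_switch:
        return state
    # Criterion 1: nothing to do until router endpoints have been discovered.
    if not router_hosts:
        return state
    # Criterion 3: only replica set / standalone topologies transition.
    if state.topology_type not in ("Single", "ReplicaSet"):
        return state
    # A real driver would also close existing pools and monitors before
    # re-seeding; this sketch only computes the new target state.
    return ClientState(topology_type="Sharded", hosts=tuple(router_hosts))
```

Keeping the decision a pure function separates "should we switch?" from the mechanics of tearing down and rebuilding connections, which is where most of the driver-specific work would likely live.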

Open questions:

  1. Should we try to make this work for pre-8.0 MongoDB versions?
  2. Should we try to make this work for self-hosted databases or Atlas only?
  3. Are there other topology transitions that may become important in the future?
    1. Future customers may switch from sharded to load-balanced clusters. Should we try to support no-restart transitions from sharded to load-balanced? (suggested by tyler.brock@mongodb.com)


 Comments   
Comment by Garaudy Etienne [ 25/Jan/23 ]

rachelle.palmer@mongodb.com jeff.yemin@mongodb.com I don't think this is strictly required for 7.0. But it should definitely ship before 8.0, because the sooner this is out, the more likely folks are to have a driver with this before switching to sharding in 8.0+ clusters. Can it be done immediately post-7.0?

Comment by Garaudy Etienne [ 12/Jan/23 ]

For INIT-319, we're going to make any customer who upgrades to 8.0 switch to sharding, so it will greatly increase the number of folks who run into this issue. But if we do this before then, we'll also help the increasing number of folks who voluntarily switch to sharding today. So it's both.

Comment by Garaudy Etienne [ 12/Jan/23 ]

james.kovacs@mongodb.com The emulation layer is strictly for apps that need the exact same single-threaded op latency and can't create enough concurrency in their system. We expect less than 5% of users to use the emulation layer. 

jeff.yemin@mongodb.com this is still required for all customers upgrading to sharding, no? So the sooner the better. 

Comment by Bernie Hackett [ 09/Dec/21 ]

We can't poll SRV for replica set changes; that would mean we have two sources of "truth", when in reality the replica set itself is the only source of truth. I suppose we could always poll SRV waiting to see if a mongos ever magically appears in the result, but it would be better if we were given a signal about what we should do, perhaps through TXT records?

Comment by Alex Bevilacqua [ 09/Dec/21 ]

behackett I wonder if limiting the scope here to polling SRV records regardless of topology type would make more sense. An application that only specifies a seed list wouldn't really be able to connect to anything different, but an SRV record that refreshes with mongos hosts after an Atlas conversion could theoretically work here.

Comment by Bernie Hackett [ 09/Dec/21 ]

There are a few tricky things here. For the transition from replica set to sharded cluster, the client is already connected to a valid replica set that will presumably just become the first shard in the sharded cluster. How do we signal a driver that won't be restarted that it should disconnect from the replica set it's already talking to and instead connect to a list of mongos instances?

Comment by Oleg Pudeyev (Inactive) [ 19/Feb/20 ]

Currently, according to the SDAM spec, a topology having no servers is in a terminal state. Ruby provides diagnostics (warn-level logging) when this state is reached, to inform the user of possible misconfiguration of their deployment.

In the linked python ticket, the submitter requests the following:

> I want to convert the sharded cluster in our production environment to a replica set, but i don't want to change the connection uri in the application configuration.

Note that SPEC-1248 will make it possible for all drivers to automatically detect both replica sets and sharded clusters from the same URI, hence an application restart would be the only thing needed to use the new deployment (currently depending on the driver, application reconfiguration may also be required; per Shane's comment in the linked ticket, it seems that applications using the Python driver do indeed need to be reconfigured).

So, it seems that the change from sharded to replica set topology in a running application is not actually a customer request in this case.

Generated at Thu Feb 08 08:22:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.