[DOCS-15702] [C2C] Investigate changes in REP-160: Support for Static Unlike Topologies Created: 25/Oct/22  Updated: 13/Nov/23  Resolved: 09/Jan/23

Status: Closed
Project: Documentation
Component/s: C2C, Server
Affects Version/s: 1.1.0
Fix Version/s: 1.1.0, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Kenneth Dyer
Resolution: Fixed Votes: 0
Labels: .local, nyc.local23
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
Related
is related to DOCS-15881 Some missing info re: M -> N shard mi... Closed
Participants:
Days since reply: 1 year, 11 weeks, 1 day
Epic Link: DOCSP-26837

 Description   
Original Downstream Change Summary

We will need to update the documentation to cover the new migration scenarios that are enabled.

Description of Linked Ticket

Epic Summary

Summary

Support C2C sync between different cluster topologies on source and destination clusters

Motivation

Currently, mongosync only supports replica set to replica set or N-shard to N-shard replication. This project allows for more flexibility in topologies and assists customers who want to scale in/out via C2C sync. For example, an on-prem customer managing a sharded cluster with a small number of shards may want to move to Atlas and scale out to a bigger sharded cluster at the same time. This project will enable use cases like that.

Cast of Characters

  • Product Owner:
  • Project Lead:
  • Program Manager:

Documentation

Product Description
Scope Document
Technical Design Document



 Comments   
Comment by Frederic Vitzikam [ 22/Nov/22 ]

We added this section relatively late, so the first section of the document was not updated to detail those changes; it only references back to that later section (I did fix the format example in the first section and re-read the text of the section for things that were no longer correct). Implementation tickets are https://jira.mongodb.org/browse/REP-1725 and https://jira.mongodb.org/browse/REP-1726.

Comment by Frederic Vitzikam [ 09/Nov/22 ]

There were also some discussions in recent weeks about disabling the balancer. I updated the technical document to say that Mongosync does not touch the balancer, so users may disable it before the migration if they want, for example if they are worried about it impacting the performance of the sync (even though we don't expect any issue at the moment; we also created a follow-up outside the project, REP-1762).
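
(For reference, a minimal sketch of disabling the balancer in mongosh before the migration; sh.stopBalancer()/sh.startBalancer() are standard shell helpers run against a mongos, not part of the Mongosync API:)

    // Connected via mongosh to a mongos of the sharded cluster.
    // Mongosync does not touch the balancer; stopping it is optional and only
    // for users worried about balancing impacting sync performance:
    sh.stopBalancer()

    // Re-enable it once the migration is done:
    sh.startBalancer()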

Comment by Frederic Vitzikam [ 03/Nov/22 ]

I saved the Slack conversation on [] vs {} for "key" in "sharding": [ {... "shardCollection": { "key": [] } } ] here.
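
(For context, a minimal sketch of the [] vs {} distinction, with illustrative field names "a" and "b":)

    // sh.shardCollection takes the key as a dictionary, even though the order
    // of its fields is significant:
    sh.shardCollection("foo.bar", { "a": 1, "b": "hashed" })

    // The /start "sharding" parameter instead takes "key" as an array of
    // single-entry dictionaries, which makes the ordering explicit:
    // "shardCollection": { "key": [ { "a": 1 }, { "b": "hashed" } ] }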

Comment by Frederic Vitzikam [ 02/Nov/22 ]

Edit (Dec 06, 2022): All the comments I made on this ticket predate the document (Mongosync Downstream Changes) that we are now asked to fill in.

 

Note: there were some changes to "sharding" after this comment; see the following comments on the ticket.

I only realized now that this ticket already existed; I was about to create documentation requests for REP-160.

Since I had already spent time gathering what I thought was worth cutting tickets for, I have put what I had written below instead:

As noted above, we are adding support for unlike topologies (sharded cluster to replica set is out of scope for REP-160). We are also adding automatic pre-splitting of the destination cluster as part of this work (there is a dedicated section in the technical design document linked above).

  • In Replica Set to Sharded Cluster mode, it is mandatory to add "sharding": [...] to the /start call.
    This parameter describes how Mongosync should shard, on the destination, the collections it is replicating from the Replica Set source cluster.

    	"sharding": [ 
    		  {database: <dbName>, collection: <collName>, shardCollection: shardCollection1}, 
    		  {database: <dbName>, collection: <collName>, shardCollection: shardCollection2}, 
    		]
     
    	e.g.
    	"sharding": [ { "database": "foo",  "collection": "bar",  "shardCollection": { "key": [{"a": 1}, { "b": 1}  ] } } ]
    	in `curl -X POST http://localhost:27182/api/v1/start -d '{"source": "cluster0", "destination": "cluster1", "sharding": [...]}'`
    	


    where "key" plays the same role as the key pattern passed to sh.shardCollection:

    • The "sharding" array itself can be empty. Any existing collection not listed will be replicated unsharded.
    • If it is not empty, each of its element requires the keys "database" (with a string value), "collection" (with a string value) and "shardCollection", no other key is supported.
    • "shardCollection" has a single key "key" that is required, no other keys are allowed.
    • "key" is an array of single entry dictionaries (unlike sh.shardCollection which takes a dictionary but enforces order on the keys) to make clear that the order of the entries matter. It has the same meaning as in sh.shardCollection.
    • The single value in each inner dictionary can be either 1 (for ranged sharding) or "hashed", with at most one hashed value in total in the below "key".
    • Unlike sh.shardCollection, we do not create the supporting index automatically on the destination from the above (see the index sketch after this list):
      • Each shard key the user provides in "sharding" has to be supported by an index on the source cluster, as we will not create any new index on the destination, only replicate source ones.
      • For the same reason, if the collection uses a non-simple collation, one of the indexes that supports the shard key pattern must use the simple collation.
      • Index uniqueness must be specified when creating the source index if the user wants a unique index to support the shard key requested on the destination; i.e. unlike sh.shardCollection, the above does not take an "options" key alongside the "key" key.
      • Unique source indexes that are incompatible with the requested shard key on the destination will cause Mongosync to fail.
        For example, if there is a unique index on the unsharded source collection that does not contain the requested shard key as a prefix.
        The definition of a compatible index is here in the technical design.
      • Essentially, these constraints are the same as those of a call to shardCollection on the source (assuming it were a sharded cluster and the call were made with ImplicitlyCreateIndex disabled and without specifying unique: true).
        The reason is that this is precisely what Mongosync does internally on the destination: it reads the passed "shardCollection" and calls the mongos shardCollection API with the corresponding shard key, ImplicitlyCreateIndex: false and unique: false.
    • In terms of user response to failures:
      • If their call to /start is rejected, they should fix the "sharding" setting and call /start again.
      • If their call to /start is rejected and Mongosync chooses to terminate on the call, they should restart the Mongosync executable and do the above.
      • If their call to /start returns "success": true but Mongosync later errors out during replication, they need to fix "shardCollection", add the required indexes, remove incompatible indexes and restart cluster to cluster sync from scratch (i.e. drop all data on the destination cluster).
        This happens when they did not create one of the needed source indexes, or have an incompatible index on the source, as mentioned above.
        Exactly where Mongosync will detect which issue and crash is here in the tech doc.
  • Renaming a collection whose name is in the "sharding" list given to /start to a different name is undefined behavior until the /progress API returns estimatedCopiedBytes > 0, indicating that collection copy has actually started (see the /progress sketch after this list).
    • In particular, it is not enough to just check the destination for that collection having already been created by Mongosync.
    • Similarly, it is undefined behavior to rename a collection whose name is not in the "sharding" list to a name that is in the list, until the same condition on /progress above is met.
  • reversible in /start is currently not supported when running with unlike topologies (i.e. RS->SC, or M-SC->N-SC with M != N); it is allowed only in RS->RS or SC->SC with the same number of shards on source and destination.

    	curl -X POST http://localhost:27182/api/v1/start -d '{"reversible": true, "enableUserWriteBlocking": true, "source": "cluster0", "destination": "cluster1", "sharding": [ { "database": "foo", "collection": "bar", "shardCollection": { "key": [{"a": 1}, {"b": 1}] } } ] }'
    	


    will error out with "Reversible can only be enabled if your source and destination clusters have the same number of shards".
    The exact error type is going to be defined in REP-1691 (as the current code has a slight issue with the way it handles that error).
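
As a minimal sketch of the index requirement described in the "sharding" list above (the database "foo", collection "bar" and the shard key [{"a": 1}, {"b": 1}] reuse the earlier example and are illustrative), one could create the supporting index on the source replica set in mongosh before calling /start:

    // Connected via mongosh to the SOURCE replica set.
    // Mongosync will not create indexes on the destination, so the shard key
    // requested in "sharding" must already be supported by a source index:
    use foo
    db.bar.createIndex({ "a": 1, "b": 1 })

    // Alternatively, if a unique index should support the shard key on the
    // destination, create it as unique here on the source instead; the
    // "shardCollection" document passed to /start takes no "options"/"unique" key:
    // db.bar.createIndex({ "a": 1, "b": 1 }, { unique: true })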
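
Similarly, a minimal sketch of the /progress check mentioned in the renaming caveat above (the exact shape of the /progress response body, and hence which field to inspect, is an assumption here):

    # Renaming collections listed in "sharding" is only defined behavior once
    # collection copy has actually started, i.e. estimatedCopiedBytes > 0.
    # Inspect the /progress response (field layout is an assumption):
    curl -s http://localhost:27182/api/v1/progress | jq '.progress'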
