[SERVER-1559] Ability for a replica set node to only have a subset of the databases or collections? Created: 03/Aug/10  Updated: 02/May/23

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: William Shulman Assignee: Alan Zheng
Resolution: Unresolved Votes: 41
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-1799 Allow multiple databases to be specif... Closed
is duplicated by SERVER-7320 allow for a blacklist for replication Closed
Related
related to SERVER-9780 Secondaries (priority=0) able to spec... Backlog

 Description   

It would be very useful to be able to enable replication on a server for selected databases only. I guess this would be a generalization of the --only option for mongod. Furthermore, it would be great if the set of database names were configurable via local.sources (or an equivalent) rather than only via a command-line option. That way, changes could be made without restarting the database.
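To make the request concrete, here is a minimal Python sketch of the decision a filtered secondary would make for each oplog entry. It assumes oplog entries are dicts with an `ns` field of the form "db.collection" (database commands use "db.$cmd"). `NamespaceFilter` and `apply_filtered` are hypothetical names, not part of mongod, and `update()` only stands in for the runtime reconfiguration the request describes.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceFilter:
    """Hypothetical database allowlist for a filtered secondary (not a mongod API)."""

    databases: set = field(default_factory=set)

    def update(self, databases):
        # Replacing the set stands in for changing the configuration at
        # runtime (the request mentions local.sources) without a restart.
        self.databases = set(databases)

    def accepts(self, entry):
        # Oplog entries name their target as "db.collection" in "ns";
        # database commands use "db.$cmd". The database name is the part
        # before the first dot.
        db_name = entry.get("ns", "").split(".", 1)[0]
        return db_name in self.databases


def apply_filtered(entries, flt):
    """Return only the oplog entries a filtered secondary would apply."""
    return [e for e in entries if flt.accepts(e)]


if __name__ == "__main__":
    flt = NamespaceFilter({"tenant_a"})
    oplog = [
        {"op": "i", "ns": "tenant_a.orders", "o": {"_id": 1}},
        {"op": "i", "ns": "tenant_b.orders", "o": {"_id": 2}},
        {"op": "c", "ns": "tenant_a.$cmd", "o": {"create": "invoices"}},
    ]
    print([e["ns"] for e in apply_filtered(oplog, flt)])
    # ['tenant_a.orders', 'tenant_a.$cmd']
    flt.update({"tenant_a", "tenant_b"})
    print(len(apply_filtered(oplog, flt)))  # 3
```

This sketch simply drops everything outside the allowlist. A real implementation would also have to decide how to handle operations that span databases, such as multi-document transactions, which the oplog records as applyOps commands in "admin.$cmd".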



 Comments   
Comment by Vineel Yalamarthi [ 02/May/23 ]

We use MongoShake in production; as long as you understand its configuration, it works well. It's a very active project. If someone doesn't have the luxury of upgrading to MongoDB 6.0, MongoShake is a great project to get started with.

Comment by Alan Zheng [ 01/May/23 ]

Cluster-to-Cluster Sync is MongoDB's official tool for moving data between two MongoDB clusters. It can also filter the data and sync only a subset of it. Cluster-to-Cluster Sync requires MongoDB 6.0 or later.

Please download the mongosync utility from our Download Center to get started.
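For readers following the mongosync route, the filter is supplied when a sync is started. The sketch below only builds the JSON request body in Python. The `includeNamespaces`, `database` and `collections` field names and the `cluster0`/`cluster1` identifiers reflect our reading of the collection-level filtering page linked elsewhere in this thread and should be checked against the mongosync version in use. `build_start_request` is a hypothetical helper, and the resulting body would still need to be POSTed to mongosync's `/api/v1/start` endpoint.

```python
import json


def build_start_request(source, destination, namespaces):
    """Build a hypothetical mongosync start body that syncs only the given
    collections. `namespaces` maps database name -> list of collection names."""
    include = [
        {"database": db_name, "collections": list(collections)}
        for db_name, collections in namespaces.items()
    ]
    body = {
        "source": source,
        "destination": destination,
        "includeNamespaces": include,
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Sync one tenant's collections only, as in the multi-tenant use case
    # described in this thread.
    print(build_start_request("cluster0", "cluster1",
                              {"tenant_a": ["orders", "invoices"]}))
```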

MongoShake is created by an external third party and is not tested by MongoDB.

Comment by Vineel Yalamarthi [ 01/May/23 ]

This seems to have been resolved recently: MongoDB 6.0 comes with something called mongosync. alan.zheng@mongodb.com

https://www.mongodb.com/docs/cluster-to-cluster-sync/current/reference/collection-level-filtering/

danny@invitenetworks.com, if that doesn't work for you because it requires MongoDB 6.0 and is still in preview, you can check out MongoShake. It's battle-tested and has had at least 15 releases so far.

Comment by Danny Beutler [ 06/Sep/22 ]

Hello, I am checking in on this. I have a very specific use case: we are a multi-tenant SaaS provider. We provide an on-prem node for some users, and for security/privacy reasons we would like to limit the collections replicated to that node to just those that apply to a specific tenant. Alan Zheng, it looks like you might be the one managing this now. I am happy to chat if a customer perspective would help with this feature.

Comment by Yuriy Safris [ 31/Aug/20 ]

Hi

Can you clarify what this means: "it can be used between any two clusters using SSL. Set up can be a bit tricky"

How do I configure authentication for the destination, which is an on-premises MongoDB replica set?

net:
  ssl:
    mode: allowSSL
    allowInvalidCertificates: true
    allowInvalidHostnames: true
    allowConnectionsWithoutCertificates: true
    PEMKeyFile: /etc/ssl/mongodb.pem

Comment by Michael Smith [ 29/May/19 ]

We have some databases that are large, infrequently updated, maybe even infrequently used, and periodic snapshots within the region are sufficient. The regular replica set behaviour is more redundancy than we need for these databases, and slows down the resync process if we have to add or rebuild a replica.

We have other databases that are small, frequently updated or used, where we want to minimize the RPO and RTO, and the regular replica set behaviour works well.

And we have still other databases that are large and frequently updated, but might be experimental, and once in a while the oplog blows out while running a job with lots of inserts.

Having a mix of these databases in the same cluster would be useful for simplifying application configuration. A few ideas of how this might look:

  1. mark a database (on creation?) as not replicated, such that its operations are never replicated to secondaries;
  2. turn off replication for an existing database;
  3. remove a database from secondary nodes of the replica set without removing it from the primary;
  4. take a previously non-replicated database and replicate it (!!)

The most useful would be #1; the others are just conveniences to avoid a mongodump/mongorestore cycle.

Comment by Alyson Cabral (Inactive) [ 29/May/19 ]

We've invested time building some of this functionality in a MongoDB tool called mongomirror. This tool has solved some of our most pressing use cases around active migrations.

In fact, in the upcoming release of mongomirror, users will be able to sync a subset of dbs and collections between MongoDB clusters. While mongomirror is specifically built to support MongoDB Atlas, it can be used between any two clusters using SSL. Setup can be a bit tricky, but once auth is in place, you can simply spin up mongomirror to manage db- and collection-level mirroring between any two Mongo clusters.

Now, to bring this functionality to a wider audience and a wider range of use cases, there is significant work to be done around operationalization. seth.payne, the product manager for mongomirror, and I are actively drawing up a plan for how to get us there. This will take time, and thank you for your patience.

As we are in the requirements gathering phase for this area, can you expand some more on what you mean by partially replicating to a DR site? What is the workflow where this helps in a disaster? Why is it beneficial to only replicate a subset of the data? Do the clusters need to be entirely independent replica sets? If you're open to it, I'd love to schedule a call to talk through some of these questions.

Aly Cabral
Product Manager, Core Server

Comment by Abhilash Mannathanil [ 29/May/19 ]

Are there any updates on this feature request? It would be useful in scenarios where we want to replicate only a portion of the data to a disaster recovery site. Without it, we are forced to run separate instances: locally significant data is written to one set of instances, and globally significant data is written to another, which is replicated to the disaster recovery site.

Comment by Joe Enzminger [ 10/Apr/19 ]

Would it be possible to get some commentary from MongoDB on why this feature request has seen no action for nine years? It may not be a common use case, but in our usage of MongoDB since 2009 it comes up all the time when we look at reporting, event sourcing, and backup architectures. MongoDB Ops Manager has had this capability for backups for quite some time.


Comment by Kevin Pulo [ 22/Apr/16 ]

To make this ticket easier to search for, this feature is also sometimes referred to as "Filtered replication".

Comment by Justin Smestad [ 15/Sep/10 ]

This is the same feature requested here: http://jira.mongodb.org/browse/SERVER-1799

Generated at Thu Feb 08 02:57:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.