[SERVER-38106] Migrating away from RocksDB to WiredTiger Created: 13/Nov/18  Updated: 15/Nov/18  Resolved: 14/Nov/18

Status: Closed
Project: Core Server
Component/s: Replication, WiredTiger
Affects Version/s: 3.4.13
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Demian Barlaro Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Hi, we have been working for some time with RocksDB as our storage engine and we are now trying to migrate to WiredTiger. We have some pretty big databases, around 4-12 TB of data, and following the process described in the docs we added a new node with WiredTiger and let it replicate from scratch.
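
For reference, the steps we followed were roughly along these lines (hostname, port and dbpath are placeholders, not our real values):

    # on the new host, start mongod on the WiredTiger storage engine
    mongod --storageEngine wiredTiger --replSet rs0 --dbpath /data/wiredtiger --port 27017

    # then, from a mongo shell connected to the current primary
    rs.add("new-wt-node.example.net:27017")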

With this amount of data the replication times are VERY long, and quite a few times we ended up in a situation where the WiredTiger node decided to change the node it was replicating from, only to drop ALL of its data and start from scratch again. Only once did we succeed in completing the replication, and even then the node ended up far behind the oplog.

Again, with this amount of data it becomes prohibitive to keep an oplog big enough to hold weeks of transactions, and the process itself is very flimsy: single-threaded, slow and prone to failure.
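
For what it is worth, we have been estimating the available oplog window with the standard shell helpers (output omitted):

    // on the primary, from the mongo shell
    rs.printReplicationInfo()          // prints the oplog size and the "log length start to end" window
    db.getReplicationInfo().timeDiff   // the same window, in seconds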

So my questions are the following:

1.- Is there a better way to proceed with this migration?

2.- Is there a way to speed up replication (e.g. multithreaded replication)?

3.- Is there a way to tell the new WiredTiger node to stop dropping all data in case of mishaps? 

We are working with 3- and 5-node replica sets on Percona MongoDB version 3.4.13 and are trying to move to open source MongoDB 3.4.13 (with the idea of upgrading to 4.x once we are on WiredTiger and dropping RocksDB and Percona entirely).

 

Thanks!



 Comments   
Comment by Danny Hatcher (Inactive) [ 13/Nov/18 ]

Hello Demian,

Unfortunately, the answer to all of your questions is no within the constraints of the current replica set. We are currently working to improve the speed of initial sync, but such improvements would not arrive until 4.2 at the earliest.

In 3.4 we introduced the concept of an oplog buffer during initial sync. As of that version, MongoDB tails the oplog into a buffer while the initial sync is cloning the data. The downside is that the tailing stops once the cloning has finished, so you can still fall off the oplog while the buffered entries are being applied. That may be what happened to you; if you still have the log file from that node, I can take a look.
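
If you want to confirm whether that node fell off the oplog, a rough check is to compare the oldest entry still present in the sync source's oplog with the point each member has applied up to. For example, from the mongo shell (the names printed will of course be your own):

    // on the sync source: timestamp of the oldest entry still in its oplog
    db.getSiblingDB("local").oplog.rs.find({}, { ts: 1 }).sort({ $natural: 1 }).limit(1)

    // on any member: the point each node has applied up to
    rs.status().members.forEach(function (m) {
        print(m.name + "  " + m.stateStr + "  " + m.optimeDate);
    })

If the syncing member's optime is older than that first oplog entry, it can no longer catch up and has to resync.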

One possible alternative is to create a new replica set and seed it with mongorestore instead of adding nodes to the current replica set. This should ensure that no initial sync wipes out the data on the new primary. However, you would also need to make sure you capture all of the updates from the original cluster covering the full migration period.
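
A rough sketch of that approach is below. The hostnames, ports and paths are placeholders, and this does not by itself solve the problem of replaying the writes that happen on the old cluster after the dump is taken:

    # 1. dump the existing replica set, capturing oplog entries written during the dump
    mongodump --host old-primary.example.net --port 27017 --oplog --out /backups/rocksdb-dump

    # 2. start a brand-new single-node replica set on WiredTiger and initiate it
    mongod --storageEngine wiredTiger --replSet rsNew --dbpath /data/wiredtiger --port 27018
    mongo --port 27018 --eval 'rs.initiate()'

    # 3. restore into the new set, replaying the oplog slice captured by --oplog
    mongorestore --host localhost --port 27018 --oplogReplay /backups/rocksdb-dump

    # 4. add the remaining WiredTiger nodes to rsNew; they will initial sync from
    #    the restored primary instead of wiping it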

You may wish to contact our Consulting Services as they would be able to help you step through the migration process in detail.

Thank you very much,

Danny

Comment by Demian Barlaro [ 13/Nov/18 ]

Just to add some information: we also tried a dump-and-restore procedure using mongodump/mongorestore, but the result was again that the new WiredTiger node deleted all of the loaded data and started replicating from scratch from another node in the replica set.
