[SERVER-19043] Implement a connectDatabase / disconnectDatabase functionality. Created: 18/Jun/15  Updated: 21/Aug/20  Resolved: 21/Aug/20

Status: Closed
Project: Core Server
Component/s: Admin, WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Paul Reed Assignee: Brian Lane
Resolution: Done Votes: 14
Labels: asya
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows for me, but probably applies to other platforms too.


Issue Links:
Depends
Related
is related to SERVER-23573 Backup/Restore of individual database... Closed
is related to SERVER-29557 Allow healthy databases to skip repairs Closed
is related to SERVER-30452 How to restore the mongodb from mongo... Closed
Participants:
Case:

 Description   

With the old engine, you could shift folders/files in and out and they would become immediately available for access from clients.
This was a great way to manage large data imports / index rebuilding. You could take a single server offline, build new data/indexes, and then systematically copy this structure around one's replica set to get them back into sync - the oplog didn't come into it.
With the WiredTiger engine this does not work: if you create a new database/folder or increase the collection count within that database, it is ignored.
I believe this is because the underlying structure is not generated to support the new files.

Is it possible to generate this missing structure simply by issuing a command which would then parse the file structure to rebuild its internal mappings?

Hope that explains the requirement. Sorry if this request is already somewhere else - it's a little fiddly to find through Jira.



 Comments   
Comment by Brian Lane [ 21/Aug/20 ]

Hi paul.reed@gprxdata.com,

As we discussed, I will go ahead and close this issue as gone away. As you continue to experiment with mtransfer, feel free to reach out to me directly if you have questions or encounter any problems.

Comment by Brian Lane [ 18/Feb/20 ]

We have recently released an updated version of mtools, which includes a new tool called mtransfer that should help address some of the requirements raised in this issue.

We have currently flagged it as experimental as we gather feedback. Feel free to reach out to me directly via brian.lane@mongodb.com if you encounter any issues getting mtools set up in your environment or have any feedback regarding mtransfer.

Comment by Brian Lane [ 08/Oct/19 ]

Hi paul.reed@gprxdata.com,

Unfortunately not. I will reach out to you via email to set up a meeting to discuss.

I want to get you to a later version w/ WiredTiger, but don't want this issue to be the blocker.

-Brian

Comment by Paul Reed [ 02/Oct/19 ]

The status is still Open on this one. Is there any movement towards a solution that would work and thus allow me to move to the latest version?

Any update?

Comment by Matt Hughes [ 11/May/19 ]

On WT here with version 3.6.12 and would very much like to see this functionality, in 3.6 if possible, as we are not ready to move to 4.x. I have terabytes of data and cannot dump to the destination and import due to storage restrictions; I was very surprised to see this is not possible. This would be invaluable in recovery situations.

Comment by Paul Reed [ 23/Apr/19 ]

Currently we are sitting at MongoDB server version 3.6.2 on MMAPv1.

This is the only issue at the moment preventing us moving to WT. The worry of corruption requiring a full architecture rebuild, as noted in some posts (maybe in the past and not current), is also a concern; however, the mechanics of directory backups somewhat negate that risk as well as allow us greater data update management.

Comment by Brian Lane [ 23/Apr/19 ]

Hi paul.reed, I can't make a guarantee at this time regarding implementation. However, we have made some changes in the 4.2 release with backup cursors that could make the work required here potentially not as bad.

What version of the server are you currently using? I assume you are still with MMAPv1 as well? Is this the main issue preventing you from migrating to a newer version w/ WiredTiger?

-Brian

Comment by Paul Reed [ 15/Apr/19 ]

Or at the least allow for the metadata of a database to reside in the db folder (therefore allowing folder copying)

Comment by Paul Reed [ 15/Apr/19 ]

Just seen you've changed status. Please please please implement this.

Comment by Paul Reed [ 04/Jul/18 ]

Please can you revisit this item now that MMAPv1 is deprecated.

Comment by Chris Kuethe [ 15/Jun/17 ]

This would also be helpful for emergency recovery situations: copy in the database directory to the root of a new dbpath, start up a new server, and attach and repair it.
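
For illustration, a rough sketch of that recovery flow under MMAPv1 (paths are hypothetical, and it assumes the deployment used --directoryperdb); with WiredTiger the copied files are not picked up without matching metadata, which is the subject of this ticket:

    # Copy the salvaged database directory into a fresh dbpath (hypothetical paths).
    mkdir -p /data/recovery
    cp -r /backups/snapshots/mydb /data/recovery/mydb

    # Run a repair pass over the copied files, then start the server normally for inspection.
    mongod --dbpath /data/recovery --directoryperdb --repair
    mongod --dbpath /data/recovery --directoryperdb --port 27018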

Comment by Konstantin Manchev [ 26/Nov/16 ]

I do not understand how such a critical issue has only 9 votes up to now, including mine from today. Please consider this; it seems to be a serious limitation in the flexibility of MongoDB/WiredTiger.

Comment by Yuri Finkelstein [ 05/May/16 ]

I can't agree more with Paul. No matter how great the WT concurrency improvements are, the fact is that when one was able to build an image of a database offline on another machine, upload the entire directory to every member of the replica set, and just "activate" it by telling the application to start writing to this database, it was 100 times more efficient and predictable compared to mongorestore. For one, the secondaries don't need to replicate the new DB, which they would with mongorestore! The user can throttle the process of file copying to RS members and achieve a smooth and predictable outcome. I just don't understand why MongoDB is not seeing the merits of this process and is instead trying to convince customers that WT is better than MMAPv1. This issue is not at all about which engine is better.

Comment by Daniel Pasette (Inactive) [ 06/Jan/16 ]

Hi Paul,
So, the concurrency improvements with WT may make your old approach unnecessary, though writing a lot of data into your production environment could still cause service interruptions simply through the overall data throughput. I can't say from here.

A couple questions:

  1. In what version did you see db-level locking with large aggregations? Collection-level locking was introduced in v3.0.
  2. Are you using $out with the aggregation? In order to create the output collection it will take a db-level lock, but only for a very short period of time (see the sketch below).
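
For illustration, a minimal $out aggregation of the kind being discussed (database, collection, and field names are hypothetical):

    # $out writes the aggregation results to a new collection; creating that output
    # collection briefly takes a database-level lock.
    mongo mydb --eval '
      db.events.aggregate([
        { $group: { _id: "$userId", total: { $sum: "$amount" } } },
        { $out: "user_totals" }
      ])'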

Also, there is the possibility that you could continue doing your processing on the side and simply use mongorestore to import the data into your prod servers. There are a few knobs you can use to control the level of concurrency, and thus load, with this approach.
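
As an illustration, a mongorestore invocation using two of those concurrency knobs (host, database name, and dump path are hypothetical):

    # Restore a dump while limiting concurrency to reduce load on the production set.
    # --numParallelCollections: how many collections are restored at once.
    # --numInsertionWorkersPerCollection: insert workers used per collection.
    mongorestore --host "rs0/primary.example.net:27017" \
      --db mydb \
      --numParallelCollections=1 \
      --numInsertionWorkersPerCollection=1 \
      /backups/dump/mydb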

Comment by Paul Reed [ 05/Jan/16 ]

Thanks for replying.

I am inferring the issue from experience with users being locked out of access under MMAPv1.

I need to rebuild sandbox areas from data snapshots 2 or 3 times a month from subsets of imported data. I have found that aggregation on large datasets will cause locking at the db level, which causes our users to block on reads. Also, the large rebuild, which by necessity drops and recreates grouped-up data, including lengthy index builds, would cause areas of data to go missing from sight. I suppose building into a temporary database and then swapping users to it in a versioned sort of way might be the workaround, which I have not tried.
I take it that renaming databases is still a painful process regardless of engine - so maybe this is not even a goer.
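
Purely as a hypothetical sketch of that untested versioned-swap idea (all names invented): the application reads a pointer document to decide which database generation to query, and the pointer is flipped once the new build is ready.

    # Build/import into sandbox_v2 off to the side beforehand, then flip the pointer
    # that the application consults (database and field names are hypothetical).
    mongo --eval '
      var meta = db.getSiblingDB("app_meta");
      meta.current.update({ _id: "sandbox" },
                          { $set: { dbName: "sandbox_v2" } },
                          { upsert: true });'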

The sheer convenience of backup and restore processes based on directory level copy was a massive factor in our using Mongo as our data store.
I guess I will just need to continue with MM for the foreseeable future. I trust you are not planning on decommissioning that engine?

Paul

Comment by Daniel Pasette (Inactive) [ 05/Jan/16 ]

Hi Paul, I understand your use case but this is not a small change to WiredTiger and the MongoDB integration layer and we have no immediate plans to add this feature to MongoDB with WiredTiger. Have you tested the impact of the data import on your system with WiredTiger, or are you inferring the issue from experience with MMAPv1?

Thanks,
Dan

Comment by Paul Reed [ 05/Jan/16 ]

Any further thoughts on this? I would really love for the metadata to be stored in the database directory when directory-per-db databases are enabled.

Comment by Paul Reed [ 03/Dec/15 ]

I wonder if there are any further thoughts on adding this re-sync metadata process to allow for folder migration in and out of the WiredTiger engine. I am desperate to get to the benefits of being able to copy collection-level files around replica sets (at the moment I copy the entire database with the MM engine, which is OK - but could be optimised better at collection level with WT) and also to get the speed benefits from WT.
Updating the entire replica set with my group/index rebuilds really is not an option for me, as it effectively takes the system offline for over 12 hours.

Comment by Ramon Fernandez Marina [ 05/Aug/15 ]

There is currently no process to manually move files around or recreate the necessary metadata when using the WiredTiger engine. This ticket remains open to consider this functionality in future releases.

Cheers,
Ramón.

Comment by Paul Reed [ 04/Aug/15 ]

So is there a process currently, or could there be a process, to force the rebuild of this metadata? Or possibly could the metadata be stored in the directory - for directory-based databases - so that folder copying would be doable again?
Not being able to manage data with folders is a massive negative for me, i.e. to allow fallback to an old snapshot in the event of corruption, or to implement offline data import routines.

Comment by Michael Cahill (Inactive) [ 31/Jul/15 ]

This procedure won't work with WiredTiger regardless of sizeStorer: "copying over the database folder" will just lead to corruption because the WiredTiger metadata won't match the files.

Comment by Ramon Fernandez Marina [ 30/Jul/15 ]

paul.reed, my guess is that the moment your primary takes a single write operation during this process the sizeStorer.wt file may become out of sync (I'm guessing this may also happen after a checkpoint), which would negate the benefits of the procedure.

I'll let others with more WiredTiger knowledge comment, but in the meantime if you're to try this procedure I'd strongly recommend you do it in a test setup.

Comment by Paul Reed [ 30/Jul/15 ]

If I did this, would it work around the missing-metadata problem (a rough shell sketch follows the list):

1) Against RS: create new database ( create all new possible collections with a {} insert )
2) Wait for sync
3) Take Secondary offline
4) Open standalone instance - perform massive import, drop collections, recreate collections, generate indexes.
5) Cycle RS servers, copying over the database folder with the one from the standalone.
6) Bring RS back up.
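
For illustration, a rough shell sketch of steps 1 and 5, with hypothetical names and assuming --directoryperdb; as noted elsewhere in this ticket, the file copy in step 5 is exactly what WiredTiger's metadata does not tolerate:

    # Step 1: against the replica set, create the new database and every collection
    # it will need, each seeded with a {} insert (hypothetical names).
    mongo primary.example.net:27017/newdb --eval '
      ["orders", "orders_agg", "lookup"].forEach(function (c) {
        db[c].insert({});
      })'

    # Step 5: with a member shut down, swap in the database folder built on the standalone.
    rm -rf /data/db/newdb
    cp -r /staging/newdb /data/db/newdb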

Comment by Paul Reed [ 30/Jul/15 ]

What information is stored in the metadata?
If it does contain machine-specific information, then possibly the solution/requirement would be to also store this information within the folder itself, and then on startup scan folders for this metadata and incorporate it back into the dbpath?

Comment by Ramon Fernandez Marina [ 19/Jun/15 ]

Thanks for the additional information paul.reed. Apologies if I wasn't clear: we do not consider this a WiredTiger issue because it is not about using WiredTiger stand-alone, but as a storage engine inside MongoDB. If you are using WiredTiger without MongoDB then we should move this ticket back to the WT project, but as long as it is related to MongoDB it should remain in SERVER with the rest of the MongoDB issues. Hope that clarifies things.

WiredTiger keeps additional metadata in files inside the dbpath, so, as you've found, collection files are not detected when dropped in. I'm tagging this ticket as "Needs Triage" for further consideration.

Thanks,
Ramón.

Comment by Paul Reed [ 19/Jun/15 ]

Well:
Twice a month I run a data import which takes 7 hours to complete. It would have a detrimental effect on performance if it were writing to the primary.
So I take a secondary offline, run the import into a new collection, and copy the collection to all remaining servers in preparation for "move"-ing the folder into the data path. This move then takes the rolling servers offline for the minimum amount of time. There is little oplog involvement.
Seven hours is a long time for an intense data import, and some core static data is changed over that time. So, just in case, during the build we snapshot data out by zipping away copies of folders. It's quicker to get back up from copies than by rebuilding.
I believe various FAQ/tips in the mongo docs advise this approach.
I really fail to see how this is not a WiredTiger issue.
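
For illustration, a rough sketch of that MMAPv1-era cycle with hypothetical names and paths:

    # Restart the secondary standalone (without --replSet) and run the long import there.
    mongod --dbpath /data/db --port 27018
    mongoimport --port 27018 -d warehouse -c import_batch --file /staging/batch.json

    # Copy the resulting database folder out for the other members (assumes --directoryperdb),
    # then roll each member: stop it, swap the folder in, restart it.
    rsync -a /data/db/warehouse/ other-member:/staging/warehouse/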

Comment by Ramon Fernandez Marina [ 18/Jun/15 ]

paul.reed, I've moved this ticket to the SERVER project, as my understanding of your description is that this is really a MongoDB issue.

Can you please elaborate on the "shift folders/files" part of your description above and provide a recipe/scenario where this was handy in the past? That should help us understand your suggestion better.

Thanks,
Ramón.

PS: yes, sometimes it is hard to find things in JIRA

Comment by Paul Reed [ 18/Jun/15 ]

Also, folder copying very easily created a fallback safety net for sensitive operations. This would be even better with the WT way of naming files around the collection rather than the database.
