[SERVER-18652] Huge data loss after altering files in dbpath on a running instance Created: 26/May/15 Updated: 19/Sep/15 Resolved: 29/May/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 3.0.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bastien Diederichs | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | In our case :
|
| Participants: |
| Description |
|
Hello, We are running a single MongoDB dev server (almost every parameter is at its default, except directoryPerDB), with no replication, just for testing (we'll definitely set up replication, even on dev, after this problem...). This morning, we had a crash on our MongoDB dev server. The crash was caused by us moving the data files of a database that MongoDB was not using at that moment. We started mongod again, and here's the log:
In the first lines, you can see the last updates before the crash occurred.
The server restarts after an unclean shutdown, so the journal recovers. However, when we connect with our mongo client, the data in two of our databases is back to its state of 3 days ago (before all of our processing over the weekend). The two databases in question are the ones that were written to the most over the weekend. Obviously, we tried repairDatabase and validate. Both ran OK, but the data was still the same (i.e. old data). Of course, replication might have prevented that, but it seems rather alarming that, even with journaling enabled, we've lost this much data. Has anyone encountered something similar? PS: You can see in the attached screenshot that the storage size increased over the weekend before dropping completely when mongod crashed. PS 2: Note that this is the second time this has occurred in a week, and the first time was even worse: we got back data that was a few months old. |
| Comments |
| Comment by Ramon Fernandez Marina [ 29/May/15 ] | |||||||
|
b.dieder@prismamedia.com, thanks for the additional information. Altering the contents of dbpath in any way while mongod is running leads to undefined, potentially dangerous behavior. In the future please always stop mongod before making changes to dbpath; you may want to consider using a replica set to be able to perform operations like these in a rolling fashion without losing service. Regards, | |||||||
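The advice above (stop mongod before touching dbpath) can be sketched as a small pre-flight check. This is an illustrative sketch only, not a MongoDB tool: the helper names `mongod_lock_is_clear` and `relocate_db_dir` are hypothetical, and the check assumes the MMAPv1-era behaviour in which `mongod.lock` holds the process ID while mongod is running and is left empty after a clean shutdown. It also assumes `directoryPerDB`, so that each database lives in its own subdirectory of dbpath.

```python
import os
import shutil


def mongod_lock_is_clear(dbpath):
    """Return True if dbpath has no mongod.lock or the lock file is empty.

    Assumption: a non-empty mongod.lock means mongod is running or was
    not shut down cleanly. This is a heuristic, not a guarantee.
    """
    lock = os.path.join(dbpath, "mongod.lock")
    return not os.path.exists(lock) or os.path.getsize(lock) == 0


def relocate_db_dir(dbpath, db_name, new_parent):
    """Move <dbpath>/<db_name> under new_parent and symlink it back.

    Refuses to act unless the lock file looks clear, and refuses to
    overwrite an existing directory of the same name at the destination.
    """
    if not mongod_lock_is_clear(dbpath):
        raise RuntimeError("mongod.lock is non-empty: stop mongod cleanly first")
    src = os.path.join(dbpath, db_name)
    dst = os.path.join(new_parent, db_name)
    if os.path.exists(dst):
        raise FileExistsError(dst)
    shutil.move(src, dst)   # move the whole per-database directory
    os.symlink(dst, src)    # keep the original path valid for mongod
    return dst
```

A call such as `relocate_db_dir("/data/ssd", "ssd_buffer", "/data/hdd")` (paths are placeholders) would only proceed once mongod has been stopped. The check does not protect against mongod being started while the move is in progress, so it complements an explicit shutdown rather than replacing it.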
| Comment by Bastien Diederichs [ 26/May/15 ] | |||||||
|
Hello Kaloian, I think I need to explain what I was doing with the data that I was copying. We have a configuration with an SSD and an HDD.
What happened is that the SSD was nearly full at the end of our processing. As one of the databases on the SSD (the database called "ssd_buffer" in the logs) was no longer in use, I wanted to move it back to the HDD.
As the "ssd_buffer" database was no longer in use, I thought that MongoDB wouldn't notice the move. While I agree that I should not have done that without shutting down mongod, it should not have lost that much data. I hope that I was clear enough about what I did and why. Anyway, thanks a lot for the answer. | |||||||
| Comment by Kaloian Manassiev [ 26/May/15 ] | |||||||
|
Hi Bastien and Patrick, First, I would like to point out that writing to the directory of an actively running MongoDB instance is neither recommended nor supported. MongoDB treats the whole instance plus the journal as a single unit of consistency, and copying data files from another instance creates a mismatch between data and journal. That being said, is it possible that when you are copying files from the remote MongoDB instance you are accidentally overwriting the data files of the database which appears to be losing data? These log lines indicate that MongoDB thinks there is no need to replay the journal, because the data is assumed to have already been flushed to disk:
Can you please make sure that the files you are copying do not have the same names as the files of a database already running on the active instance? -Kal. | |||||||
| Comment by Patrick Guiran (+33(0)1.73.05.46.23) [ 26/May/15 ] | |||||||
|
Hello, Note that this is the second data-loss incident in two weeks. Patrick |