[SERVER-32132] db.fsyncLock() does not stop writes to FTDC / diagnostic.data files Created: 01/Dec/17  Updated: 27/Oct/23  Resolved: 02/Dec/17

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: NIKHIL Assignee: Mark Agarunov
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Run the following steps against a running MongoDB server:
1. Use db.fsyncLock() to flush the data to disk and take a lock.
2. Use Python's shutil.copytree() to copy the entire data directory to another location.
3. Unlock using db.fsyncUnlock().

The problem is not reproducible every time; it occurs intermittently (about once in 20 attempts).
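The reproduction steps can be sketched in Python as follows. This is a minimal sketch, not the reporter's actual script: `locked_copy` is a hypothetical helper, and the `lock`/`unlock` callables are assumed to wrap the server's fsync lock and unlock commands. Putting the unlock in a `finally` block keeps the server from being left locked if the copy raises.

```python
import shutil
from typing import Callable


def locked_copy(src: str, dst: str,
                lock: Callable[[], object],
                unlock: Callable[[], object]) -> None:
    """Copy src to dst while the server is fsync-locked (hypothetical helper).

    lock/unlock are supplied by the caller so this sketch does not depend
    on a particular MongoDB driver.
    """
    lock()  # step 1: flush data files and block writes
    try:
        shutil.copytree(src, dst)  # step 2: file-level copy of the data directory
    finally:
        unlock()  # step 3: always release the lock, even if the copy fails
```

Assuming the PyMongo driver and a connected `client`, the callables could be `lambda: client.admin.command("fsync", lock=True)` and `lambda: client.admin.command("fsyncUnlock")`.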

 Description   

As documented here, I expect MongoDB's db.fsyncLock() to ensure that the data files do not change on a MongoDB instance using the WiredTiger storage engine.

We are using MongoDB 3.4 and a low-level file-copy backup strategy (Python shutil.copytree).

The problem I am facing is that my backup intermittently fails with an error such as:
<mongo_data_dir>/diagnostic.data/metrics.interim is unavailable to be copied to the destination path.

Sequence of steps I am following:
1. First I use the atomic operation db.fsyncLock() to fsync the data and take a lock.
2. Then I back up the files with a low-level file copy; specifically, I use Python's shutil.copytree() to copy the entire data directory to another location.
3. Then I unlock using db.fsyncUnlock().

The failure happens in the second step, and only intermittently (about once in 20 attempts).
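When `shutil.copytree()` fails on individual files, it keeps copying the rest of the tree and then raises a single `shutil.Error` whose first argument is a list of `(src, dst, reason)` tuples. The helper below is a hypothetical sketch, not part of any MongoDB tooling: it checks whether every failure lies under `diagnostic.data`, which lets a backup script tell whether any file outside that directory failed to copy.

```python
import os
import shutil


def only_diagnostic_failures(err: shutil.Error, data_dir: str) -> bool:
    """Return True if every copy failure in err lies under data_dir/diagnostic.data."""
    diag = os.path.join(os.path.abspath(data_dir), "diagnostic.data")
    # shutil.copytree raises Error with a list of (src, dst, reason) tuples.
    failures = err.args[0]
    if not failures:
        return False
    return all(
        # commonpath compares whole path components, so a sibling such as
        # "diagnostic.data.old" is not mistaken for diagnostic.data itself.
        os.path.commonpath([os.path.abspath(src), diag]) == diag
        for src, _dst, _reason in failures
    )
```

A script could wrap the copy in `try`/`except shutil.Error as err:` and use this check to decide whether to log and continue or to treat the backup as failed.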



 Comments   
Comment by Ramon Fernandez Marina [ 02/Dec/17 ]

I've adjusted the summary to more accurately reflect the issue and also updated the resolution.

Comment by Ramon Fernandez Marina [ 02/Dec/17 ]

To add to Mark's answer, the diagnostic.data/metrics.interim file is where each node stores diagnostic data until it reaches a certain size; it is then renamed to something like metrics.2016-11-01T06-54-27Z-00000. Because of this rename, you may get errors saying that this file is not available.
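The race can be reproduced without a server. In this sketch a throwaway directory stands in for `diagnostic.data`, the rename is done by hand to mimic a rotation (it is not performed by mongod here), and the function name is hypothetical. The copier lists the directory, the file is renamed, and copying the listed name then fails with `FileNotFoundError`, the kind of per-file error `shutil.copytree()` collects and reports.

```python
import os
import shutil
import tempfile


def simulate_rotation_race() -> str:
    """Reproduce the list-then-rename race in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        diag = os.path.join(tmp, "diagnostic.data")
        os.makedirs(diag)
        interim = os.path.join(diag, "metrics.interim")
        with open(interim, "w") as f:
            f.write("metrics")
        names = os.listdir(diag)  # the copier lists the directory first...
        # ...then the file is rotated before the copier reaches it
        os.rename(interim, os.path.join(diag, "metrics.2016-11-01T06-54-27Z-00000"))
        try:
            for name in names:
                shutil.copy2(os.path.join(diag, name), tmp)
        except FileNotFoundError as exc:
            return f"copy failed: {os.path.basename(exc.filename)}"
        return "copy succeeded"
```

Calling `simulate_rotation_race()` returns `"copy failed: metrics.interim"`.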

As Mark said above, you can safely ignore the diagnostic.data directory altogether. If you also want to make a backup copy of the files there, then you can configure your backup code to ignore the metrics.interim file and the errors you describe should go away.

Regards,
Ramón.
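Both suggestions map onto the `ignore` argument of `shutil.copytree()`. This is a minimal sketch, assuming the copy is done with `shutil.copytree()` as in the report; `backup_data_dir` and its `skip_all_diagnostics` flag are hypothetical names. `shutil.ignore_patterns()` matches names in every directory the copy visits, so either pattern only affects `diagnostic.data` as long as nothing else in the data directory shares that name.

```python
import shutil


def backup_data_dir(src: str, dst: str, skip_all_diagnostics: bool = False) -> None:
    """Copy a mongod data directory while leaving out volatile FTDC files.

    skip_all_diagnostics=True drops the whole diagnostic.data directory;
    otherwise only metrics.interim (the file that gets renamed) is skipped.
    """
    pattern = "diagnostic.data" if skip_all_diagnostics else "metrics.interim"
    # ignore_patterns applies the pattern in every directory copytree visits
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(pattern))
```

Skipping the whole directory, which both comments describe as safe, avoids depending on which individual files change while the copy runs.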

Comment by Mark Agarunov [ 01/Dec/17 ]

Hello nikhil578,

Thank you for the report. The behavior you're seeing occurs because fsyncLock doesn't affect the diagnostic.data directory. However, you should be able to safely exclude this directory from the backup, as it contains only diagnostic metrics and no user or database data.

Thanks,
Mark

Generated at Thu Feb 08 04:29:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.