[SERVER-24785] backup_restore.js fails when observing FTDC interim files Created: 24/Jun/16  Updated: 20/Feb/18  Resolved: 29/Jul/16

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: 3.2.20, 3.3.11

Type: Bug Priority: Critical - P2
Reporter: Andrew Morrow (Inactive) Assignee: Mark Benvenuto
Resolution: Done Votes: 0
Labels: bkp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.2
Sprint: Platforms 17 (07/15/16), Platforms 18 (08/05/16)
Participants:
Linked BF Score: 0

 Description   

The backup_restore.js test blacklists certain lock files that should not be copied when doing a backup. That blacklist was not updated to account for the ephemeral files generated by the FTDC subsystem.

Either FTDC needs to change its approach to ephemeral files, or the blacklist upon which backup_restore makes its decision needs to be updated. If the latter approach is taken, our documentation for how to perform a live system backup must be updated.



 Comments   
Comment by Githook User [ 20/Feb/18 ]

Author: Mark Benvenuto (username: markbenvenuto, email: mark.benvenuto@mongodb.com)

Message: SERVER-24785 backup_restore.js fails when observing FTDC interim files

(cherry picked from commit c4fc9c165026a710809df7751f00c342b5eb27f6)
Branch: v3.2
https://github.com/mongodb/mongo/commit/6dcfbb12c21b395136468666bc69a60705c24c91

Comment by Githook User [ 29/Jul/16 ]

Author: Mark Benvenuto (username: markbenvenuto, email: mark.benvenuto@mongodb.com)

Message: SERVER-24785 backup_restore.js fails when observing FTDC interim files
Branch: master
https://github.com/mongodb/mongo/commit/c4fc9c165026a710809df7751f00c342b5eb27f6

Comment by Andrew Morrow (Inactive) [ 20/Jul/16 ]

mark.benvenuto and bruce.lucas - Do you have any preference among the above options? The latter two, at least to me, sound like papering over the problem, and would require that our users attempting to follow our backup procedures effectively do the same thing. On the other hand, I'm not clear on what would happen with FTDC if we prevented it from writing or rotating files during the entire time we were in fsyncLock, nor is it clear to me how difficult this would be to implement. Would this effectively mean that we need to add suspend/resume functionality to FTDC? How hard would that be?

Comment by Mark Benvenuto [ 18/Jul/16 ]

We have a few choices:

  1. Make FTDC obey fsyncLock in this case by either not writing anything or at least not doing file rotation. The backup & restore process is done by getting an fsyncLock, and then copying the files.
  2. Filter out the entire diagnostic.data directory in shell_utils_launcher.cpp::copyDir.
  3. Filter out just metrics.interim.temp in shell_utils_launcher.cpp::copyDir.
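
To make the difference between options 2 and 3 concrete, here is a minimal Python sketch of a filtered directory copy. It is not the code in shell_utils_launcher.cpp::copyDir, which is C++ and is not shown in this ticket. The function name copy_dbpath and its parameters are hypothetical, and the sketch assumes filtering by exact file or directory name is sufficient.

```python
import shutil
from pathlib import Path


def copy_dbpath(src, dst,
                skip_files=frozenset({"metrics.interim.temp"}),
                skip_dirs=frozenset()):
    """Copy a data directory tree, leaving out named files and directories.

    Hypothetical helper for illustration only.
    Option 3 corresponds to the defaults (skip only the interim temp file).
    Option 2 corresponds to skip_dirs={"diagnostic.data"}.
    """
    def ignore(directory, names):
        # copytree calls this once per directory; the returned names are skipped.
        skipped = set()
        for name in names:
            is_dir = (Path(directory) / name).is_dir()
            if (is_dir and name in skip_dirs) or (not is_dir and name in skip_files):
                skipped.add(name)
        return skipped

    # dst must not exist yet; copytree creates it.
    return shutil.copytree(src, dst, ignore=ignore)
```

Calling copy_dbpath(src, dst) copies everything except files named metrics.interim.temp, while copy_dbpath(src, dst, skip_dirs={"diagnostic.data"}) drops the whole FTDC directory. Either way, anyone following the documented backup procedure by hand would need to apply the same exclusion, which is the concern raised in the 20/Jul/16 comment.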