[SERVER-43049] Test failure file archiving can miss files. Created: 27/Aug/19 Updated: 06/Dec/22 Resolved: 09/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | 4.2.0, 4.3.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Backlog - Server Tooling and Methods (STM) (Inactive) |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Server Tooling & Methods
|
||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Requested: |
v4.2
|
||||||||||||
| Linked BF Score: | 15 | ||||||||||||
| Description |
|
I suspect that a failure to add a file to the archive aborts the remainder of the archiving process. The failure itself is presumably benign: the file was deleted after it showed up in the list of files.
|
| Comments |
| Comment by Daniel Gottlieb (Inactive) [ 09/Sep/19 ] |
|
FWIW, I tried a hacky better-effort solution. The (unverified) premise for an improvement is that TarFile.add first searches the directory for files to add and then, once the files are gathered, adds them to the archive one by one. The assumption is that adding each individual file requires actually copying its contents into the archive, so the window in which the last file can disappear before being copied (and fail the archive) is much larger than the window for the earlier files. My attempt was to shrink that window. I kept the archival code that adds one directory and has TarFile.add recursively discover the individual files, but tried supplying the exclude predicate. That function takes a filename string as input; it would "succeed" if the file could be opened, letting that file be archived, and if it failed to open the file it would return a value telling TarFile.add to skip it. However, the exclude predicate was removed somewhere in Python 3 and fully replaced by the filter predicate, which takes a TarInfo as input. Unfortunately, the tarring library first opens the file in order to create the TarInfo object, so the filter runs too late to skip a file that has already disappeared. |
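One way to shrink the failure window without relying on a predicate is to do the directory traversal outside of tarfile and add each path with `recursive=False`, so a vanished file costs only that one entry. The following is a minimal sketch under that assumption, not the change that was made for this ticket; `list_paths` and `add_paths_best_effort` are hypothetical helper names.

```python
import os
import tarfile


def list_paths(src_dir):
    """Collect every directory and file path below src_dir (snapshot in time)."""
    found = []
    for root, dirs, files in os.walk(src_dir):
        found.extend(os.path.join(root, name) for name in dirs + files)
    return found


def add_paths_best_effort(tar, paths, base_dir):
    """Add each path to an open TarFile individually; return the paths skipped.

    Hypothetical helper: because every path gets its own try-except, a file
    that was deleted after being listed is skipped instead of aborting the
    rest of the archive.
    """
    skipped = []
    for path in paths:
        try:
            # recursive=False: the traversal was already done by list_paths().
            tar.add(path, arcname=os.path.relpath(path, base_dir), recursive=False)
        except OSError:
            # FileNotFoundError (a subclass of OSError) is raised when the
            # path no longer exists by the time tarfile stats or opens it.
            skipped.append(path)
    return skipped
```

This only covers files that are gone before tarfile opens them. A file that is truncated while its contents are being copied could still leave a partial member in the archive, which this sketch does not attempt to handle.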
| Comment by Max Hirschhorn [ 27/Aug/19 ] |
The try-except block within the for-loop is misleading because there's actually only one entry in the list. We call TarFile.add() with the name of the /data/db/jobN/ directory and rely on the tarfile module to do the recursive traversal through the directory for us. If an exception occurs due to a file being removed either during this traversal or when attempting to add it to the tarball, then we'll stop adding more files to the tarball. This issue should be addressed as a result of the proposal for |
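The behavior described in this comment can be reproduced with a small sketch. It assumes Python 3.7 or newer, where TarFile.add adds directory entries in sorted order. The `vanish` parameter and the `job0` directory name are hypothetical, and the filter callback is abused here only to simulate another process deleting a file mid-archive.

```python
import os
import tarfile
import tempfile


def archive_directories(tar_path, directories, vanish=None):
    """Mimic a for-loop with a try-except around one TarFile.add() per directory.

    `vanish` is a hypothetical test hook: a path to delete as soon as the
    first regular file is being archived.
    """
    def simulate_removal(tarinfo):
        if vanish and tarinfo.isfile() and os.path.exists(vanish):
            os.remove(vanish)
        return tarinfo

    errors = []
    with tarfile.open(tar_path, "w") as tar:
        for directory in directories:
            try:
                # One call per directory: tarfile does the recursive traversal,
                # so an exception here abandons every file not yet added.
                tar.add(directory, arcname=os.path.basename(directory),
                        filter=simulate_removal)
            except OSError as err:
                errors.append(err)
    return errors


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        job_dir = os.path.join(tmp, "job0")
        os.mkdir(job_dir)
        for name in ("a.txt", "b.txt", "c.txt"):
            with open(os.path.join(job_dir, name), "w") as handle:
                handle.write(name)
        tar_path = os.path.join(tmp, "out.tar")
        errors = archive_directories(
            tar_path, [job_dir], vanish=os.path.join(job_dir, "b.txt"))
        with tarfile.open(tar_path) as tar:
            return errors, tar.getnames()
```

In this sketch the loop does "continue on error", but with a single directory in the list there is nothing left to continue with: `b.txt` triggers the exception and `c.txt` never reaches the tarball even though it still exists on disk.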
| Comment by Daniel Gottlieb (Inactive) [ 27/Aug/19 ] |
|
It turns out we always continue on error (which I believe is correct), so I really don't know what happened here. Reassigning to TIG. |