[SERVER-59339] repodata in S3 bucket out of sync with repo.mongodb.org
Created: 13/Aug/21  Updated: 27/Oct/23  Resolved: 27/Aug/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: INVADE International Ltd
Assignee: Jonathan Streets (Inactive)
Resolution: Gone away
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

Description

Hi.

We create a local copy of the CentOS repos for systems that have no internet access.

As there is no rsync access to the repos, we sync the data from the S3 bucket.

I've noticed that old repodata files are not being removed from the AWS S3 bucket. For example:
s3://repo.mongodb.org/yum/redhat/8/mongodb-org/5.0/x86_64/repodata/

contains numerous versions of the repodata, whereas:
https://repo.mongodb.org/yum/redhat/8/mongodb-org/5.0/x86_64/repodata/

only contains the latest version.

This causes "dnf upgrade" to miss the newer package versions.

Can the S3 bucket be updated to mirror the repo?

Thanks in advance.



Comments
Comment by INVADE International Ltd [ 24/Aug/21 ]

Thanks for confirming.

Please close this issue.

Comment by Zakhar Kleyman [ 20/Aug/21 ]

third.line@invade.net

repomd.xml is the file the package manager loads to locate the file lists and other indexes, and there should be only one repomd.xml per yum/dnf repo. It is crucial to sync it along with the other files, so I'm glad you found a way to do that with the "--exact-timestamps" parameter.

https://repo.mongodb.org/yum/redhat/8/mongodb-org/5.0/x86_64/repodata/ is actually hosted in s3://repo.mongodb.org/yum/redhat/8/mongodb-org/5.0/x86_64/repodata/. The list of files you see in the browser comes from a static index.html that shows only the latest files. All files in that S3 bucket can still be accessed via HTTPS, even if they are not listed on the index page.
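
Since repomd.xml is the single index for the repo, one way to tell the current metadata files apart from the older ones left in the bucket is to read the locations it references. The following is a minimal sketch, not something taken from this ticket: the function names, the shortened sample XML, and the file names in it are hypothetical, and a real repomd.xml carries checksums, sizes, and more <data> entries.

```python
import xml.etree.ElementTree as ET

# XML namespace used by yum/dnf repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}

# Hypothetical, heavily shortened repomd.xml for illustration only.
SAMPLE_REPOMD = """<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <revision>1629000000</revision>
  <data type="primary">
    <location href="repodata/aaaa-primary.xml.gz"/>
  </data>
  <data type="filelists">
    <location href="repodata/bbbb-filelists.xml.gz"/>
  </data>
</repomd>
"""


def referenced_metadata(repomd_xml):
    """Map each metadata type to the file path that repomd.xml points at."""
    root = ET.fromstring(repomd_xml)
    files = {}
    for data in root.findall("repo:data", REPO_NS):
        location = data.find("repo:location", REPO_NS)
        if location is not None:
            files[data.get("type")] = location.get("href")
    return files


def unreferenced_files(listing, repomd_xml):
    """Return listed files that the given repomd.xml does not reference."""
    current = set(referenced_metadata(repomd_xml).values())
    current.add("repodata/repomd.xml")  # the index itself is always current
    return sorted(name for name in listing if name not in current)


if __name__ == "__main__":
    # Hypothetical bucket listing containing one older metadata file.
    listing = [
        "repodata/repomd.xml",
        "repodata/aaaa-primary.xml.gz",
        "repodata/bbbb-filelists.xml.gz",
        "repodata/old-primary.xml.gz",
    ]
    print(referenced_metadata(SAMPLE_REPOMD))
    print(unreferenced_files(listing, SAMPLE_REPOMD))
```

Files reported by unreferenced_files are harmless to a client as long as the mirrored repomd.xml itself is up to date, because the package manager only follows the locations listed in the index.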

Comment by Eric Sedor [ 16/Aug/21 ]

Thank you, I'm passing your report to an appropriate team for investigation.

Comment by INVADE International Ltd [ 13/Aug/21 ]

I've investigated this further.

The main issue appears to be that the updated repomd.xml is not being synced because it is the same size as the existing local copy.

Based on:
https://awscli.amazonaws.com/v2/documentation/api/latest/reference/s3/sync.html

I have added the "--exact-timestamps" parameter to the "aws s3 sync" command, and files that are the same size but have a different timestamp are now being synced.
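
To illustrate why a same-sized repomd.xml was skipped, here is a simplified model of the S3-to-local rule as the linked documentation page describes it: by default, same-sized items are ignored unless the local version is newer than the S3 version, and with "--exact-timestamps" they are ignored only when the timestamps match exactly. This is a sketch of that described behaviour, not the AWS CLI's actual code; the function name, the sizes, and the dates are hypothetical.

```python
from datetime import datetime, timezone


def should_download(s3_size, s3_mtime, local_size, local_mtime,
                    exact_timestamps=False):
    """Return True if this simplified model would refresh the local copy."""
    if s3_size != local_size:
        # A size difference always triggers a transfer.
        return True
    if exact_timestamps:
        # Same size: skip only when the timestamps match exactly.
        return s3_mtime != local_mtime
    # Default: same size is skipped unless the local copy is the newer one.
    return local_mtime > s3_mtime


if __name__ == "__main__":
    # Hypothetical timestamps: the local mirror holds an older repomd.xml
    # that happens to have the same size as the new one in S3.
    local_time = datetime(2021, 8, 1, tzinfo=timezone.utc)
    s3_time = datetime(2021, 8, 13, tzinfo=timezone.utc)

    print(should_download(3000, s3_time, 3000, local_time))
    print(should_download(3000, s3_time, 3000, local_time,
                          exact_timestamps=True))
```

Under this model the first call reports that the file is skipped and the second that it is downloaded, which matches what was observed with and without the parameter.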

I'm not sure whether the multiple repodata files should be in the S3 bucket, so I'm leaving this issue open.

If it's working as designed, then this issue can be closed.

Generated at Thu Feb 08 05:46:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.