[SERVER-23165] When using gridFS, expiration time (ttl) set by client application works for documents in metadata collection but does not work in chunks collection
Created: 16/Mar/16  Updated: 29/Apr/16  Resolved: 29/Apr/16

Status: Closed
Project: Core Server
Component/s: GridFS
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Minor - P4
Reporter: Taelen Lewis Assignee: Stennie Steneker (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

When using GridFS, the expiration time (TTL) set by the client application works for documents in the metadata collection but not for documents in the chunks collection. As a result, chunks are never cleaned up from MongoDB. However, when we perform a delete operation, both the metadata and the chunks are deleted as expected. Can you please confirm whether TTL expiration is supposed to work with chunks? (In our testing, it does not.)



 Comments   
Comment by Stennie Steneker (Inactive) [ 29/Apr/16 ]

Hi Taelen,

GridFS is a driver specification for storing and retrieving large files in MongoDB. While GridFS stores files in two collections (files metadata and binary chunks), the MongoDB server is generally unaware of the relationship between documents in these collections. TTL indexes only apply to a single collection and require a date field in order to find expired documents to remove. The current GridFS API only specifies an uploadDate field for the files collection; chunks do not have any date information.

To automate expiry of GridFS documents within the current design, I would suggest writing an application/script which searches for expired GridFS files and removes these via the GridFS API. An expiry script could be scheduled to run periodically via cron or similar, and will have an equivalent outcome to a TTL index. An index on files.uploadDate should be added to support finding expired documents.
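As a rough illustration of the expiry script described above, here is a minimal Python sketch assuming the PyMongo driver and the default "fs" bucket. The connection URI, the database name "mydb", the 30-day retention period, and the helper name delete_expired_files are all assumptions for the example, not part of the GridFS specification.

```python
from datetime import datetime, timedelta, timezone


def delete_expired_files(files_coll, bucket, max_age, now=None):
    """Delete every GridFS file whose uploadDate is older than max_age.

    files_coll: the files collection, e.g. db["fs.files"]
    bucket:     a gridfs.GridFSBucket for the same bucket
    Returns the list of deleted file ids.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    # Only _id is needed; an index on uploadDate supports this range query.
    expired = list(files_coll.find({"uploadDate": {"$lt": cutoff}}, {"_id": 1}))
    deleted = []
    for doc in expired:
        # GridFSBucket.delete removes the files document and all of its chunks.
        bucket.delete(doc["_id"])
        deleted.append(doc["_id"])
    return deleted


def main():
    # Imported here so the helper above can be exercised without a server.
    import gridfs
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed URI
    db = client["mydb"]  # hypothetical database name
    db["fs.files"].create_index("uploadDate")
    bucket = gridfs.GridFSBucket(db)  # default bucket name is "fs"
    removed = delete_expired_files(db["fs.files"], bucket, timedelta(days=30))
    print(f"removed {len(removed)} expired GridFS file(s)")
```

A wrapper that calls main() could then be scheduled from cron. Because each file is removed through the GridFS API, its metadata and chunks are deleted together.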

Alternatively, you could copy the uploadDate field from the files collection to the associated chunks after uploading a new document. This would allow TTL indexes to be set on both the files and chunks collections, but adds extra overhead (a new TTL index on each of the two collections and an extra field on every chunk document) compared to removing files via the GridFS API. The two TTL indexes would also operate independently, so this approach may result in errors if a GridFS file is read close to its expiry and some of its chunks have already been deleted.
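The TTL-based alternative could look roughly like the following Python sketch, again assuming PyMongo. The helper names copy_upload_date_to_chunks and ensure_ttl_indexes are hypothetical, and the collections passed in are assumed to be the files and chunks collections of one bucket (for example db["fs.files"] and db["fs.chunks"]).

```python
def copy_upload_date_to_chunks(files_coll, chunks_coll, file_id):
    """Copy uploadDate from a files document onto each of its chunks.

    Returns the number of chunk documents that were modified.
    """
    file_doc = files_coll.find_one({"_id": file_id}, {"uploadDate": 1})
    if file_doc is None:
        raise LookupError(f"no GridFS file with _id {file_id!r}")
    # Chunks reference their parent file through the files_id field.
    result = chunks_coll.update_many(
        {"files_id": file_id},
        {"$set": {"uploadDate": file_doc["uploadDate"]}},
    )
    return result.modified_count


def ensure_ttl_indexes(files_coll, chunks_coll, ttl_seconds):
    """Create a TTL index on uploadDate in both collections."""
    for coll in (files_coll, chunks_coll):
        # expireAfterSeconds turns a single-field date index into a TTL index.
        coll.create_index("uploadDate", expireAfterSeconds=ttl_seconds)
```

In this sketch, copy_upload_date_to_chunks would be called once after each upload completes, and ensure_ttl_indexes once at deployment time. If a plain (non-TTL) index on uploadDate already exists, creating one with expireAfterSeconds is expected to fail with an index options conflict, so the existing index would need to be dropped or converted first.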

Regards,
Stephen
