[SERVER-163] GridFS clean method: Interrupted GridFS inserts leave unlinked entries in the .chunks collection. Some way to clean those up would be helpful.
Created: 17/Jul/09  Updated: 15/Jan/10  Resolved: 28/Oct/09

Status: Closed
Project: Core Server
Component/s: Usability
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Minor - P4
Reporter: Christopher
Assignee: Mathias Stearn
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

If the connection between the client and the server fails during a GridFS insert, the database is left with orphaned entries in the .chunks collection that have no corresponding entry in .files. A simple way to prune these entries would be helpful for housekeeping purposes. At the very least, an example of how to do it in the MongoDB Shell would be extremely useful.



 Comments   
Comment by Mathias Stearn [ 28/Oct/09 ]

There is no safe way to do this server-side. I'll add some docs once 1.1.3 is out, but here is a good method if you can guarantee that no files are currently being added:

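var ids = db.fs.chunks.distinct("files_id");
var file_ids = db.fs.files.distinct("_id").map(String); // compare as strings; ObjectIds are objects
// files_id values in chunks that have no matching _id in files
var bad_ids = ids.filter(function (id) { return file_ids.indexOf(String(id)) === -1; });
db.fs.chunks.remove({'files_id': {'$in': bad_ids}});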

If you can't find a time when no files are being updated, then store the orphaned ids each night and remove any that are still orphaned after 1 day.
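
The nightly approach described above could be sketched as two small helpers in plain JavaScript, independent of the database. The names `findOrphanIds` and `confirmedOrphans` are hypothetical and not part of MongoDB or any driver, and ids are compared by their string form on the assumption that two equal ids always stringify identically.

```javascript
// Hypothetical helpers; a sketch, not an official GridFS cleanup routine.

// Return the distinct chunk files_id values that have no matching files _id.
function findOrphanIds(chunkFileIds, fileIds) {
  var known = new Set(fileIds.map(String));
  var seen = new Set();
  return chunkFileIds.filter(function (id) {
    var key = String(id);
    if (known.has(key) || seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Keep only ids that were orphaned on the previous run and are still orphaned now.
// An id that appears in only one run may belong to an upload that was in progress.
function confirmedOrphans(previousOrphans, currentOrphans) {
  var previous = new Set(previousOrphans.map(String));
  return currentOrphans.filter(function (id) {
    return previous.has(String(id));
  });
}
```

In the shell, the inputs would presumably come from db.fs.chunks.distinct("files_id") and db.fs.files.distinct("_id"), and the result of `confirmedOrphans` would be passed to a remove on `files_id` with `$in`. Where the previous night's list is stored is left open here.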

Comment by Eliot Horowitz (Inactive) [ 14/Oct/09 ]

we should add a "cleangridfs" command
it would be efficient if we queried both collections, sorted by _id on files and by files_id on chunks, and just looked for files_id values that don't appear in files
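
A minimal sketch of that sorted scan, written as plain JavaScript over two arrays rather than as a server command. The function names `orphansBySortedScan` and `compareIds` are hypothetical, and the sketch assumes both inputs are already sorted ascending under the same ordering that the comparator implements.

```javascript
// Hypothetical helper: walk two ascending-sorted id lists in step and collect
// the distinct files_id values that have no matching _id.
function orphansBySortedScan(sortedChunkFileIds, sortedFileIds, compare) {
  var orphans = [];
  var j = 0; // cursor into sortedFileIds; it only ever moves forward
  for (var i = 0; i < sortedChunkFileIds.length; i++) {
    var id = sortedChunkFileIds[i];
    // Skip repeats: a file usually has many chunks with the same files_id.
    if (i > 0 && compare(id, sortedChunkFileIds[i - 1]) === 0) continue;
    // Advance the files cursor past ids smaller than the current chunk id.
    while (j < sortedFileIds.length && compare(sortedFileIds[j], id) < 0) j++;
    if (j === sortedFileIds.length || compare(sortedFileIds[j], id) !== 0) {
      orphans.push(id);
    }
  }
  return orphans;
}

// Assumed comparator: orders ids by their string form.
function compareIds(a, b) {
  a = String(a);
  b = String(b);
  return a < b ? -1 : a > b ? 1 : 0;
}
```

Because each cursor only moves forward, the scan touches every element once and never needs to hold a full set of ids in memory, which is the efficiency the comment points at.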
