[JAVA-1125] Improve GridFS.remove(query) method Created: 27/Feb/14  Updated: 05/Dec/17  Resolved: 19/Nov/15

Status: Closed
Project: Java Driver
Component/s: GridFS
Affects Version/s: 2.12.0
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: PETIT Yann Assignee: Unassigned
Resolution: Won't Fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

remove(query) on GridFS is currently performed by:

  • first issuing a select on the "files" collection of the bucket
  • then, for each file found, removing it and removing its associated chunks by files_id

First, I think this kind of linked removal should be handled entirely by the server and not by the client, as it is a server feature.
Moreover, the current implementation can result in thousands of separate requests, and it doesn't ensure consistency anyway.

Thus the method's efficiency could be improved by performing at most two requests: one on the "files" collection using the query, and the other on the "chunks" collection using an $in clause on the files_id values previously selected.
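
The difference between the two strategies can be illustrated with a minimal in-memory sketch. It is not the driver's code and not the code from the PR: GridFsRemoveSketch, FileDoc, ChunkDoc, store, removePerFile and removeBulk are all hypothetical names, the collections are plain Java lists, and the counter only tracks simulated remove requests (the initial id lookup is not counted in either path).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for a GridFS bucket (NOT the driver's API).
final class GridFsRemoveSketch {
    static final class FileDoc {
        final int id;
        final String filename;
        FileDoc(int id, String filename) { this.id = id; this.filename = filename; }
    }

    static final class ChunkDoc {
        final int filesId; // plays the role of the files_id field
        final int n;       // chunk sequence number
        ChunkDoc(int filesId, int n) { this.filesId = filesId; this.n = n; }
    }

    final List<FileDoc> files = new ArrayList<>();
    final List<ChunkDoc> chunks = new ArrayList<>();
    int removeRequests = 0; // simulated remove round trips only

    // Adds one file document and chunkCount chunk documents pointing at it.
    void store(int id, String filename, int chunkCount) {
        files.add(new FileDoc(id, filename));
        for (int n = 0; n < chunkCount; n++) {
            chunks.add(new ChunkDoc(id, n));
        }
    }

    // Simulates the select on "files": returns the ids of matching files.
    private List<Integer> selectIds(Predicate<FileDoc> query) {
        List<Integer> ids = new ArrayList<>();
        for (FileDoc f : files) {
            if (query.test(f)) {
                ids.add(f.id);
            }
        }
        return ids;
    }

    // Behavior described in the ticket: two removes per matching file.
    void removePerFile(Predicate<FileDoc> query) {
        for (final int fileId : selectIds(query)) {
            files.removeIf(f -> f.id == fileId);
            removeRequests++;
            chunks.removeIf(c -> c.filesId == fileId);
            removeRequests++;
        }
    }

    // Proposed behavior: one remove on "files" using the query, one remove
    // on "chunks" using the selected ids (the equivalent of an $in clause).
    void removeBulk(Predicate<FileDoc> query) {
        Set<Integer> ids = new HashSet<>(selectIds(query));
        files.removeIf(query);
        removeRequests++;
        chunks.removeIf(c -> ids.contains(c.filesId));
        removeRequests++;
    }
}
```

In this model, removing 50 matching files costs 100 simulated remove requests with removePerFile and 2 with removeBulk, while both leave the same documents behind. The sketch says nothing about error handling between the two removes, which a real implementation would have to consider.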

Tests made on a bucket of 50K files showed that the remove time for 31K files dropped from 385000 ms to 825 ms (roughly a 467x improvement).

PR #171 has been opened on GitHub: https://github.com/mongodb/mongo-java-driver/pull/171



 Comments   
Comment by Ross Lawley [ 19/Nov/15 ]

As this ticket interacts with the legacy and effectively deprecated GridFS implementation, I'm closing it as "Won't Fix".

There is a new GridFS spec which deliberately has no remove(query) method, because of the complications that can arise if an error occurs while deleting the data.

finalspy, if you feel this is a mistake and it should be included for all drivers, could you please open a Drivers ticket? We can continue the conversation there.

Comment by PETIT Yann [ 02/May/14 ]

A new PR has been opened, this time against branch 3.0.X.
It improves on the previous solution by keeping the default behavior while allowing a new parameter to be passed to force a "bulk" remove.

I also added 2 tests, each of which generates 100 GridFS files with several chunks and removes them all, once using the legacy behavior and once using the "bulk remove". On my computer (single mongod 2.6, no shards, no replicas) this results in a 2x to 4x improvement on 100 files. The larger the number of removed files, the more efficient this new way of removing files/chunks is.

[Fixed by https://github.com/mongodb/mongo-java-driver/pull/192 ]

Generated at Thu Feb 08 08:53:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.