Improve GridFS.remove(query) method


    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 2.12.0
    • Component/s: GridFS

      remove(query) on GridFS is currently performed by:

      • first issuing a select on the "files" collection of the bucket
      • then, for each matching file, removing it and removing its associated chunks by files_id

      First, I think this kind of linked removal should be handled entirely by the server rather than by the client, as this is a server feature.
      Moreover, the current implementation can result in thousands of separate requests, and doesn't ensure consistency anyway.

      The method's efficiency could therefore be improved by performing at most two requests: one on the "files" collection using the query, and the other on the "chunks" collection using an $in clause on the previously selected files_id values.
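
      To make the difference in request counts concrete, here is a minimal sketch using a hypothetical in-memory stand-in for a bucket. BucketSketch, FileDoc, ChunkDoc and the requests counter are all assumptions made for illustration; none of them is the driver's API or the code in the PR. The sketch counts every simulated round trip, including the initial lookup of matching ids.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory model of a GridFS bucket; NOT the driver's API.
final class BucketSketch {
    // Stands in for a document of the "files" collection.
    static final class FileDoc {
        final int id;
        final String filename;
        FileDoc(int id, String filename) { this.id = id; this.filename = filename; }
    }

    // Stands in for a document of the "chunks" collection.
    static final class ChunkDoc {
        final int filesId;
        final int n;
        ChunkDoc(int filesId, int n) { this.filesId = filesId; this.n = n; }
    }

    final List<FileDoc> files = new ArrayList<>();
    final List<ChunkDoc> chunks = new ArrayList<>();
    int requests = 0; // simulated round trips to the server

    void store(int id, String filename, int chunkCount) {
        files.add(new FileDoc(id, filename));
        for (int n = 0; n < chunkCount; n++) {
            chunks.add(new ChunkDoc(id, n));
        }
    }

    // Behaviour described in this issue: one select, then two removes per file.
    void removePerFile(Predicate<FileDoc> query) {
        requests++; // select on "files"
        List<FileDoc> matched = new ArrayList<>();
        for (FileDoc f : files) {
            if (query.test(f)) matched.add(f);
        }
        for (FileDoc m : matched) {
            requests++; // remove one file document
            files.removeIf(x -> x.id == m.id);
            requests++; // remove that file's chunks by files_id
            chunks.removeIf(c -> c.filesId == m.id);
        }
    }

    // Proposed behaviour: collect the ids once, then two bulk removes.
    void removeBatched(Predicate<FileDoc> query) {
        requests++; // select matching ids on "files"
        Set<Integer> ids = new HashSet<>();
        for (FileDoc f : files) {
            if (query.test(f)) ids.add(f.id);
        }
        if (ids.isEmpty()) return; // nothing to remove
        requests++; // single remove on "files"
        files.removeIf(x -> ids.contains(x.id));
        requests++; // single remove on "chunks", the $in on files_id
        chunks.removeIf(c -> ids.contains(c.filesId));
    }
}
```

      With N matching files, the per-file variant issues 1 + 2N simulated requests, while the batched variant issues 3 regardless of N. Both leave the bucket in the same final state.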

      Tests on a bucket of 50K files showed that the time to remove 31K files dropped from 385000 ms to 825 ms (roughly a 465x improvement).

      PR #171 has been opened on GitHub: https://github.com/mongodb/mongo-java-driver/pull/171

            Assignee:
            Unassigned
            Reporter:
            PETIT Yann
            Votes:
            1
            Watchers:
            3

              Created:
              Updated:
              Resolved: