DRIVERS-2848

Use insertMany to upload GridFS chunks for better performance

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Unknown
    • Component/s: GridFS, Performance
    • Needed
    • Key Status/Resolution FixVersion
      PHPLIB-1376 Blocked

      Summary

      GridFS chunk uploads should use insertMany/bulkWrite for better performance. This helps with large files or small chunk sizes, where many chunks are inserted, especially now that the default write concern is "majority" and each write has higher latency.

      We made this change in PYTHON-4146, which increased GridFS upload speed in one Atlas benchmark from 46 MB/s to 200 MB/s.

      Motivation

      Most drivers use insertOne to upload each chunk as data is being written to the GridFS stream. This is simple but results in poor performance when multiple chunks could have been batched into a single insertMany call.
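
      The difference can be sketched as follows. This is a minimal illustration, not any driver's actual implementation: RecordingCollection is a hypothetical in-memory stand-in that only mimics the insert_one/insert_many method names of a PyMongo collection, and the chunk documents use plain bytes rather than BSON binary values.

```python
from typing import Any, Dict, Iterator, List


class RecordingCollection:
    """Hypothetical stand-in for a chunks collection; counts write calls."""

    def __init__(self) -> None:
        self.calls = 0
        self.docs: List[Dict[str, Any]] = []

    def insert_one(self, doc: Dict[str, Any]) -> None:
        self.calls += 1
        self.docs.append(doc)

    def insert_many(self, docs: List[Dict[str, Any]], ordered: bool = True) -> None:
        self.calls += 1
        self.docs.extend(docs)


def iter_chunk_docs(files_id: Any, data: bytes, chunk_size: int) -> Iterator[Dict[str, Any]]:
    """Yield GridFS-style chunk documents ({files_id, n, data}) for `data`."""
    for n, start in enumerate(range(0, len(data), chunk_size)):
        yield {"files_id": files_id, "n": n, "data": data[start:start + chunk_size]}


def upload_one_by_one(chunks: Any, files_id: Any, data: bytes, chunk_size: int) -> int:
    """Issue one insert_one call per chunk; returns the number of calls."""
    calls = 0
    for doc in iter_chunk_docs(files_id, data, chunk_size):
        chunks.insert_one(doc)
        calls += 1
    return calls


def upload_batched(chunks: Any, files_id: Any, data: bytes, chunk_size: int) -> int:
    """Issue one insert_many call for all buffered chunks; returns the number of calls."""
    docs = list(iter_chunk_docs(files_id, data, chunk_size))
    if not docs:
        return 0
    # Ordered insert so chunks are written in sequence.
    chunks.insert_many(docs, ordered=True)
    return 1
```

      With a real server, each call costs at least one round trip that must satisfy the write concern, so fewer calls generally means less time spent waiting. A real driver may still split a large insertMany into several commands.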

      Who is the affected end user?

      GridFS users.

      How does this affect the end user?

      Slow upload performance.

      How likely is it that this problem or use case will occur?

      Likely, especially with the default majority write concern.

      Is this ticket required by a downstream team?

      No.

      Is this ticket only for tests?

      No.

      Acceptance Criteria

      Update the GridFS spec to require insertMany/bulkWrite to be used when uploading chunks. Include a test to ensure a driver batches writes. We also need to add a test to ensure the bug in CSHARP-4900 is not introduced.

      Note: PyMongo batches at 32MB or 100,000 chunks, since the objective is to fill up a single OP_MSG as much as possible. While the driver could theoretically batch up to 48MB (maxMessageSizeBytes), we decided to use a smaller limit to simplify the implementation. We could reevaluate this decision when implementing this ticket.
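
      One way to apply thresholds like these is sketched below. The function and constant names are hypothetical, the size check counts only the chunk payload (ignoring BSON overhead), and this is not necessarily how PyMongo implements its batching.

```python
from typing import Any, Dict, Iterable, Iterator, List

MAX_BATCH_BYTES = 32 * 1024 * 1024  # 32MB threshold mentioned in the note
MAX_BATCH_CHUNKS = 100_000          # chunk-count threshold mentioned in the note


def batch_chunks(
    docs: Iterable[Dict[str, Any]],
    max_bytes: int = MAX_BATCH_BYTES,
    max_chunks: int = MAX_BATCH_CHUNKS,
) -> Iterator[List[Dict[str, Any]]]:
    """Group chunk documents into batches bounded by payload size and count."""
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in docs:
        size = len(doc["data"])  # assumption: payload size only, no BSON overhead
        # Flush before adding a chunk that would exceed either limit.
        if batch and (batch_bytes + size > max_bytes or len(batch) >= max_chunks):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch  # final partial batch
```

      Each yielded batch would then be passed to a single insertMany call. Flushing whichever limit is hit first keeps every batch within both bounds, except that a single chunk larger than max_bytes is still emitted on its own.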

            Assignee: Unassigned
            Reporter: Shane Harvey (shane.harvey@mongodb.com)