C Driver / CDRIVER-5506

MongoDB C Driver mongoc_stream_read Hangs Due to Partial Data in GridFS Chunk

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 1.26.2
    • Affects Version/s: 1.19.1, 1.26.1
    • Component/s: GridFS
    • Labels:
    • C Drivers

      Summary:
      When a GridFS file chunk stored in MongoDB is missing part of its data, the C Driver API function mongoc_stream_read enters an endless loop while reading the affected chunk, and the application hangs indefinitely.

      Description:

      Issue Overview:
      While setting up our QA test environment, we found a GridFS file in which one chunk was missing part of its data. Attempting to read that chunk in full caused mongoc_stream_read to enter an infinite loop. The details of the scenario are as follows:

      Chunk Size Configuration: The chunk size was set to 200,000 bytes.

      File Size: A file with a size of 1,005,000 bytes (equivalent to five chunks of 200,000 bytes and an additional 5,000 bytes) was saved in GridFS.

      Damaged Chunk Data: For an unknown reason (we suspect memory or network resource constraints while the data was being written to MongoDB), the second chunk of the file was incomplete and contained only 100,000 bytes of data.
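
      To make the layout above concrete, here is a minimal sketch of the chunk arithmetic. The helper names (chunk_count, expected_chunk_len, chunk_is_truncated) are hypothetical and are not part of the C Driver API; they only illustrate how a stored chunk length could be compared against the length implied by the file size and chunk size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of chunks needed to hold file_len bytes. */
static uint64_t
chunk_count (uint64_t file_len, uint32_t chunk_size)
{
   if (chunk_size == 0) {
      return 0; /* invalid configuration, avoid dividing by zero */
   }
   return (file_len + chunk_size - 1) / chunk_size;
}

/* Hypothetical helper: expected payload length of chunk n (0-based).
 * Every chunk except the last must be exactly chunk_size bytes. */
static uint64_t
expected_chunk_len (uint64_t file_len, uint32_t chunk_size, uint64_t n)
{
   uint64_t count = chunk_count (file_len, chunk_size);
   if (n >= count) {
      return 0; /* no such chunk */
   }
   if (n + 1 < count) {
      return chunk_size;
   }
   return file_len - n * (uint64_t) chunk_size; /* final chunk holds the rest */
}

/* Hypothetical helper: nonzero when the stored payload is shorter than
 * the layout requires. */
static int
chunk_is_truncated (uint64_t file_len,
                    uint32_t chunk_size,
                    uint64_t n,
                    uint64_t stored_len)
{
   return stored_len < expected_chunk_len (file_len, chunk_size, n);
}
```

      With the values from this report, chunk_count (1005000, 200000) gives 6 chunks, and chunk_is_truncated (1005000, 200000, 1, 100000) flags the second chunk (index 1) as truncated.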

      Read Operation: The mongoc_stream_read function was used to read the entire second chunk from MongoDB. The read was performed in a loop, retrieving 4,096 bytes at a time until the expected 200,000 bytes had been read. After 98,304 bytes had been read (24 reads of 4,096 bytes), the next call never returned. At that point only 1,696 bytes remained in the truncated chunk, fewer than the 4,096 bytes requested.

      The function prototype is as follows:

      ssize_t mongoc_stream_read(mongoc_stream_t *stream, void *buf, size_t count, size_t min_bytes, int32_t timeout_msec)
      where count = 4096, min_bytes = 4096, and timeout_msec = 0 (indicating no timeout).
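
      The numbers in the read operation can be checked with a small model. This is only arithmetic on the values from this report, not driver code; stall_point_t and find_stall_point are hypothetical names. It assumes each read is satisfied only while at least min_bytes remain in the truncated chunk.

```c
#include <stddef.h>

/* Hypothetical record of where a fixed-size read loop stops making progress. */
typedef struct {
   size_t full_reads; /* reads that returned min_bytes */
   size_t bytes_read; /* total bytes delivered by those reads */
   size_t leftover;   /* bytes left in the chunk, fewer than min_bytes */
} stall_point_t;

/* Count how many reads of min_bytes fit into stored_len bytes. */
static stall_point_t
find_stall_point (size_t stored_len, size_t min_bytes)
{
   stall_point_t sp = {0, 0, stored_len};

   if (min_bytes == 0) {
      return sp; /* nothing to model without a minimum */
   }
   while (sp.leftover >= min_bytes) {
      sp.full_reads++;
      sp.bytes_read += min_bytes;
      sp.leftover -= min_bytes;
   }
   return sp;
}
```

      Calling find_stall_point (100000, 4096) yields 24 full reads, 98,304 bytes read, and 1,696 bytes left over, which matches the point at which the call stopped returning.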

      Infinite Loop Location: Debugging showed that the code was stuck in an infinite loop inside the mongoc_gridfs_file_readv function. The loop does not exit when the end of a chunk is reached before all of the expected data has been read.

      The problematic loop is within:

      ssize_t mongoc_gridfs_file_readv(mongoc_gridfs_file_t *file, mongoc_iovec_t *iov, size_t iovcnt, size_t min_bytes, uint32_t timeout_msec)
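
      As an illustration of the behavior we would expect, here is a simplified, self-contained model of a chunked read loop that fails instead of spinning. It is not the driver's implementation and not necessarily how the issue was fixed; model_chunk_t, model_file_t, model_read, and successful_reads are hypothetical names. The key assumption is the guard: a non-final chunk that ends before chunk_size bytes is treated as an error.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
   const unsigned char *data; /* stored payload */
   size_t len;                /* bytes actually stored */
} model_chunk_t;

typedef struct {
   const model_chunk_t *chunks;
   size_t n_chunks;
   size_t chunk_size; /* required size of every chunk except the last */
   size_t chunk_idx;  /* chunk currently being read */
   size_t chunk_off;  /* offset inside that chunk */
} model_file_t;

/* Read up to count bytes, stopping once at least min_bytes were copied.
 * Returns the bytes read, 0 at end of file, or -1 on a truncated chunk. */
static long
model_read (model_file_t *f, unsigned char *buf, size_t count, size_t min_bytes)
{
   size_t done = 0;

   while (done < count && f->chunk_idx < f->n_chunks) {
      const model_chunk_t *c = &f->chunks[f->chunk_idx];

      if (f->chunk_off >= c->len) {
         int is_last = (f->chunk_idx + 1 == f->n_chunks);
         /* Guard: a short non-final chunk means data is missing. */
         if (!is_last && c->len < f->chunk_size) {
            return -1;
         }
         f->chunk_idx++;
         f->chunk_off = 0;
         continue;
      }

      size_t avail = c->len - f->chunk_off;
      size_t n = (count - done < avail) ? count - done : avail;
      memcpy (buf + done, c->data + f->chunk_off, n);
      done += n;
      f->chunk_off += n;
      if (done >= min_bytes) {
         break;
      }
   }
   return (long) done;
}

/* Usage: a two-chunk file whose first chunk stores only stored_len bytes.
 * Returns how many reads succeed before model_read returns 0 or -1,
 * or -1 if the arguments are invalid or allocation fails. */
static long
successful_reads (size_t stored_len, size_t chunk_size, size_t count)
{
   unsigned char *payload = calloc (chunk_size ? chunk_size : 1, 1);
   unsigned char *buf = malloc (count ? count : 1);
   long ok_reads = -1;

   if (payload && buf && count > 0 && stored_len <= chunk_size) {
      model_chunk_t chunks[2] = {{payload, stored_len}, {payload, chunk_size}};
      model_file_t f = {chunks, 2, chunk_size, 0, 0};
      ok_reads = 0;
      while (model_read (&f, buf, count, count) > 0) {
         ok_reads++;
      }
   }
   free (payload);
   free (buf);
   return ok_reads;
}
```

      In this model, successful_reads (100000, 200000, 4096) completes 24 reads and then stops on the -1 from the guard, while an intact first chunk lets the loop run through to the end of the file.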

      Steps to Reproduce:

      1. Save a large txt file in GridFS, ensuring it spans multiple chunks.
      2. Manually remove a portion of the data from one of the full chunks.
      3. Attempt to read the whole chunk using mongoc_stream_read.

      Expected Result:
      The function should return an error (-1) promptly when encountering a chunk with missing data, preventing the application from hanging.

      Actual Result:
      The function fails to return, leading to an endless wait state when it encounters the damaged chunk.

      Environment:
      The issue was observed using the MongoDB C Driver version 1.19.1 on Windows 11.

      Additional Information:

      Attempting to set min_bytes = 0 as a workaround led to another issue where mongoc_gridfs_file_readv could not properly handle reading data spanning multiple chunks.

            Assignee:
            kevin.albertson@mongodb.com Kevin Albertson
            Reporter:
            john.chen@netbraintech.com john chen
            Votes:
            0
            Watchers:
            4