[DRIVERS-559] GridFS with multi-document transaction support does not work Created: 09/Aug/18  Updated: 21/Feb/23

Status: Backlog
Project: Drivers
Component/s: GridFS
Fix Version/s: None

Type: Spec Change
Priority: Major - P3
Reporter: Wan Bachtiar
Assignee: Unassigned
Resolution: Unresolved
Votes: 1
Labels: bernie+, gridfsv2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on DRIVERS-2062 Modify GridFS spec to support session... Backlog
Related
related to JAVA-4890 Multidocument transaction support in ... Backlog
is related to JAVA-4887 GridFS doesn't support multi-document... Closed
is related to PYTHON-1332 Implement Drivers Sessions API Closed
is related to PYTHON-2244 Update documentation to promote GridF... Backlog
is related to PYTHON-2243 Raise informative error message when ... Closed

 Description   

In MongoDB 4.0, multi-document transactions do not work with GridFS because:

  • A transaction is represented in a single oplog entry, which must be within the 16MB limit.
  • The default transaction lifetime is only 60 seconds, and big files could take a while to upload. Increasing this default would add to WiredTiger (WT) cache pressure.
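
To make the two limits above concrete, here is a rough feasibility check. It is only a sketch: the helper names and the throughput argument are hypothetical, the chunk size is the `chunkSize` value that appears in the `fs.files` document quoted in this ticket, and the check is optimistic because it ignores BSON and metadata overhead.

```python
import math

# Limits quoted in the description (MongoDB 4.0).
OPLOG_ENTRY_LIMIT = 16 * 1024 * 1024   # a transaction must fit in one 16MB oplog entry
DEFAULT_TXN_LIFETIME_SECS = 60          # default transaction lifetime
DEFAULT_CHUNK_SIZE = 261120             # chunkSize seen in the fs.files document in this ticket


def chunk_count(file_size, chunk_size=DEFAULT_CHUNK_SIZE):
    """Number of fs.chunks documents needed for a file of file_size bytes."""
    return math.ceil(file_size / chunk_size)


def upload_fits_in_transaction(file_size, bytes_per_sec):
    """Optimistic estimate: True only if the raw file data is under the
    oplog entry limit AND the upload finishes within the default lifetime.
    bytes_per_sec is an assumed, caller-supplied throughput figure."""
    fits_oplog = file_size < OPLOG_ENTRY_LIMIT
    fits_lifetime = file_size / bytes_per_sec < DEFAULT_TXN_LIFETIME_SECS
    return fits_oplog and fits_lifetime
```

A 20MB file fails the first condition regardless of throughput, and a 10MB file on a slow link fails the second, which is why only small, fast uploads could ever fit.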

GridFS with transactions would be a very special case (I don't know whether there are any use cases yet).
In PyMongo's case, if a `ClientSession` is provided, it will attempt to create an index on each of the `fs.files` and `fs.chunks` collections (grid_file.py L196). Creating an index in a transaction is currently prohibited by the server.
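
For reference, the index creation described above amounts to roughly the following two `createIndexes` commands. This is a sketch, not PyMongo's code: the helper names are hypothetical, and the key patterns are assumed to be the ones from the GridFS specification.

```python
# Key patterns assumed from the GridFS specification.
FILES_INDEX = [("filename", 1), ("uploadDate", 1)]
CHUNKS_INDEX = [("files_id", 1), ("n", 1)]


def index_name(keys):
    """Build the conventional index name, e.g. 'files_id_1_n_1'."""
    return "_".join(f"{field}_{direction}" for field, direction in keys)


def create_indexes_command(collection, keys, unique=False):
    """Return a createIndexes command document for a single index."""
    spec = {"key": dict(keys), "name": index_name(keys)}
    if unique:
        spec["unique"] = True
    return {"createIndexes": collection, "indexes": [spec]}


# The two commands a GridFS upload may send before writing any data.
commands = [
    create_indexes_command("fs.files", FILES_INDEX),
    create_indexes_command("fs.chunks", CHUNKS_INDEX, unique=True),
]
```

When a session with an open transaction is attached to these commands, the server rejects them, which is the failure this ticket describes.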



 Comments   
Comment by Shane Harvey [ 13/May/20 ]

Since MongoDB 4.4 allows createIndexes in multi-document transactions, transactions are now compatible with GridFS (with PyMongo 3.10.1):

>>> from gridfs import GridIn
>>> with client.start_session() as s, s.start_transaction():
...     with GridIn(root_collection=client.db.fs, session=s) as gin:
...         gin.write(b'my data')
...
>>> client.db['fs.files'].find_one()
{'_id': ObjectId('5ebc4ee4c9bd2e15eaa564d0'), 'md5': '1291e1c0aa879147f51f4a279e7c2e55', 'chunkSize': 261120, 'length': 7, 'uploadDate': datetime.datetime(2020, 5, 13, 19, 47, 48, 796000)}

On MongoDB 4.2, the server returns this error:

>>> from gridfs import GridIn
>>> with client.start_session() as s, s.start_transaction():
...     with GridIn(root_collection=client.db["fs.files"], session=s) as gin:
...         gin.write(b'my data')
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/gridfs/grid_file.py", line 413, in __exit__
    self.close()
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/gridfs/grid_file.py", line 321, in close
    self.__flush()
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/gridfs/grid_file.py", line 297, in __flush
    self.__flush_buffer()
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/gridfs/grid_file.py", line 289, in __flush_buffer
    self.__flush_data(self._buffer.getvalue())
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/gridfs/grid_file.py", line 267, in __flush_data
    self.__ensure_indexes()
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/gridfs/grid_file.py", line 210, in __ensure_indexes
    self.__create_index(self._coll.files, _F_INDEX, False)
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/gridfs/grid_file.py", line 205, in __create_index
    collection.create_index(
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/pymongo/collection.py", line 1995, in create_index
    self.__create_index(keys, kwargs, session, **cmd_options)
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/pymongo/collection.py", line 1890, in __create_index
    self._command(
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/pymongo/collection.py", line 235, in _command
    return sock_info.command(
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/pymongo/pool.py", line 603, in command
    return command(self.sock, dbname, spec, slave_ok,
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/pymongo/network.py", line 165, in command
    helpers._check_command_response(
  File "/Users/shane/Library/Python/3.8/lib/python/site-packages/pymongo/helpers.py", line 159, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Cannot run 'createIndexes' in a multi-document transaction.

However, I still think it makes sense to remove support for this feature until we have a cross-driver spec for transactions and sessions in GridFS. I've opened PYTHON-2243 to make PyMongo raise a client-side error.
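
A client-side check of this kind could look something like the sketch below. It is illustrative only: the function name, the exception class, and the `FakeSession` stand-in are hypothetical, and the only thing assumed about a real session is the `in_transaction` property that PyMongo's `ClientSession` exposes.

```python
class TransactionNotSupportedError(Exception):
    """Hypothetical error type; a real driver would choose its own class."""


def disallow_transactions(session):
    """Raise early, with a clear message, if a GridFS operation is given
    a session that currently has a transaction in progress."""
    if session is not None and getattr(session, "in_transaction", False):
        raise TransactionNotSupportedError(
            "GridFS does not support multi-document transactions"
        )


class FakeSession:
    """Stand-in for a driver session, used only to exercise the check."""

    def __init__(self, in_transaction):
        self.in_transaction = in_transaction
```

Calling `disallow_transactions(FakeSession(True))` raises immediately, so the user sees an informative message instead of the server's `createIndexes` failure deep in the upload path.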

Comment by Bernie Hackett [ 13/May/20 ]

Until we have a real story for GridFS and transactions, we should prohibit drivers from making up a story, and avoid bugs like this.

Comment by Bernie Hackett [ 13/May/20 ]

shane.harvey, is there a PyMongo bug here?

Comment by Esha Bhargava [ 13/Apr/20 ]

We should eventually add a test for this but we expect it to work in 4.4.

Comment by Scott L'Hommedieu (Inactive) [ 18/Jul/19 ]

I've opened docs ticket DOCSP-6255 to get this clarified in docs until we prioritize work on new features for GridFS.

Comment by A. Jesse Jiryu Davis [ 10/Aug/18 ]

I'd vote for documenting in drivers: "GridFSBucket does not support transactions."

Comment by A. Jesse Jiryu Davis [ 10/Aug/18 ]

OK, I understand you better. PyMongo (at least) has implemented sessions support for GridFSBucket, although it wasn't required to. Since PyMongo's GridFSBucket has sessions support, it got transactions support for GridFSBucket automatically, but transactions don't work with GridFSBucket. (You can't create an index in a transaction, nor do large or slow operations.) The errors this causes are unintuitive.

There are some distractions in the original Stack Overflow question. The user starts by creating a transaction in a session, but they can't pass the session to GridFS, because GridFS is deprecated and doesn't have a session parameter. So they spend a minute uploading a file outside the transaction, and then they try to commit the transaction, but it has timed out while they were uploading. The user doesn't know about the new GridFSBucket API, which does support sessions and therefore sort of supports transactions. Next, the user tries passing a session directly to a low-level API called GridIn, so they do manage to do the file upload in a transaction, but that doesn't work because GridIn tries to create indexes in the transaction.

Generated at Thu Feb 08 08:21:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.