[SERVER-47809] Transaction fails with StaleConfig inserting into newly created unsharded collection on v4.5.0-920-ge5e378e
Created: 27/Apr/20  Updated: 29/Oct/23  Resolved: 26/May/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Bug
Priority: Major - P3
Reporter: Shane Harvey
Assignee: Marcos José Grillo Ramirez
Resolution: Fixed
Votes: 0
Labels: PM-1645-Milestone-1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by PYTHON-2214 Test failure - test.test_transactions... Closed
Problem/Incident
Related
is related to PYTHON-2189 Transaction spec test for bulk write ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-05-04, Sharding 2020-05-18, Sharding 2020-06-01
Participants:
Linked BF Score: 14

 Description   

A transaction that explicitly creates a new unsharded collection fails when it then inserts into that collection:

>>> with client.start_session() as s, s.start_transaction():
...     client.db.command('create', 'test', session=s)   # Succeeds
...     client.db.test.insert_one({}, session=s)   # Fails
...
{'ok': 1.0, 'operationTime': Timestamp(1588016473, 4), '$clusterTime': {'clusterTime': Timestamp(1588016473, 4), 'signature': {'hash': b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', 'keyId': 0}}, 'recoveryToken': {'recoveryShardId': 'demo-set-0'}}
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 694, in insert_one
    self._insert(document,
  File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 609, in _insert
    return self._insert_one(
  File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 598, in _insert_one
    self.__database.client._retryable_write(
  File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1500, in _retryable_write
    return self._retry_with_session(retryable, func, s, None)
  File "/Users/shane/git/mongo-python-driver/pymongo/mongo_client.py", line 1393, in _retry_with_session
    return func(session, sock_info, retryable)
  File "/Users/shane/git/mongo-python-driver/pymongo/collection.py", line 586, in _insert_command
    result = sock_info.command(
  File "/Users/shane/git/mongo-python-driver/pymongo/pool.py", line 594, in command
    return command(self.sock, dbname, spec, slave_ok,
  File "/Users/shane/git/mongo-python-driver/pymongo/network.py", line 150, in command
    helpers._check_command_response(
  File "/Users/shane/git/mongo-python-driver/pymongo/helpers.py", line 161, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Transaction 7dc4b97f-00da-46b4-8658-6f647acf6b78:1 was aborted on statement 1 due to: an error from cluster data placement change :: caused by :: Encountered error from localhost:27019 during a transaction :: caused by :: sharding status of collection db.test is not currently known and needs to be recovered

This reproduces consistently on 4.5-latest, even after the fix in SERVER-47472:

mongodb-macos-x86_64-enterprise-4.5.0-920-ge5e378e/bin/mongos --version
mongos version v4.5.0-920-ge5e378e
Build Info:{"version":"4.5.0-920-ge5e378e"
 ,"gitVersion":"e5e378e2d10eccf4eb3aeda9b621e41854c24a5c"
 ,"modules":["enterprise"]
 ,"allocator":"system"
 ,"environment":{"distarch":"x86_64"
   ,"target_arch":"x86_64"}}



 Comments   
Comment by Githook User [ 26/May/20 ]

Author:

{'name': 'Marcos José Grillo Ramírez', 'email': 'marcos.grillo@mongodb.com', 'username': 'm4nti5'}

Message: SERVER-47809 Prevent createCollection as first statement in a transaction to fail with TransientTransactionError
Branch: master
https://github.com/mongodb/mongo/commit/30d3f823009e217302e3004e2b9ed4aea11e2792

Comment by Marcos José Grillo Ramirez [ 18/May/20 ]

This happens because no collection metadata is set or refreshed right after the create collection statement. We'll work on a fix specifically for createCollection.

Comment by Shane Harvey [ 29/Apr/20 ]

Thanks again. I will update the Python test to use the with_transaction helper so that it can tolerate transient errors like this one. My reasoning for opening this as a server bug was that removing the explicit create_collection call and implicitly creating the collection always seems to succeed:

>>> with client.start_session() as s, s.start_transaction():
...     client.db.missing1.insert_one({}, session=s)   # Succeeds

It was surprising to me that explicit creation behaves differently than implicit creation.
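
For reference, a minimal sketch of what a with_transaction-based version of the test could look like. ClientSession.with_transaction is PyMongo's helper that re-runs the callback when an error carries the TransientTransactionError label. The function names create_and_insert and run_with_helper are illustrative only, and the database and collection names are taken from the reproduction in the description.

```python
def create_and_insert(session):
    """Transaction body: explicit create followed by an insert."""
    # ClientSession.client is the MongoClient the session was started from.
    db = session.client.db
    db.command('create', 'test', session=session)
    db.test.insert_one({}, session=session)


def run_with_helper(client):
    """Run the body through with_transaction so transient errors are retried."""
    with client.start_session() as s:
        # with_transaction starts the transaction, calls the callback with the
        # session, commits, and retries the whole callback on transient errors.
        return s.with_transaction(create_and_insert)
```

Because an aborted transaction rolls back the create as well, re-running the whole callback is safe here; the callback should avoid side effects outside the transaction, since it may run more than once.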

Comment by Marcos José Grillo Ramirez [ 29/Apr/20 ]

Hi shane.harvey! As we mentioned in SERVER-46679 and HELP-15010, all transaction tests should have retry logic, because any command within a transaction can throw a transient error. We'll take a look at this specific case, but the driver tests should still have some retry logic.
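
The retry logic described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the driver test suite's actual code: run_transaction_with_retry, txn_body, and max_attempts are hypothetical names, and the label check is duck-typed on the has_error_label method that PyMongo exposes on its exceptions, so the snippet itself does not import pymongo.

```python
def run_transaction_with_retry(client, txn_body, max_attempts=5):
    """Re-run a whole transaction when it fails with a transient error.

    txn_body is a caller-supplied function taking the session (hypothetical
    name); max_attempts is an arbitrary bound chosen for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Leaving the block normally commits; an exception aborts.
            with client.start_session() as s, s.start_transaction():
                return txn_body(s)
        except Exception as exc:
            # PyMongo errors expose has_error_label(); other exceptions do not.
            has_label = getattr(exc, "has_error_label", None)
            transient = callable(has_label) and has_label(
                "TransientTransactionError")
            if not transient or attempt == max_attempts:
                raise
            # Otherwise loop and retry the entire transaction from the start.
```

A test would pass the two statements from the description as txn_body. The sketch does not treat the UnknownTransactionCommitResult label separately, which a fuller implementation would handle by retrying only the commit.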
