[SERVER-60495] Retry FailedToSatisfyReadPreference in DDL coordinators
Created: 06/Oct/21
Updated: 29/Oct/23
Resolved: 11/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.3, 5.1.0-rc0
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc1

Type: Bug
Priority: Major - P3
Reporter: Pierlauro Sciarelli
Assignee: Pierlauro Sciarelli
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-59884 auto_retry_transaction.js does not re... Backlog
is related to SERVER-58407 Resharding components do not retry on... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.1, v5.0
Linked BF Score: 144

 Description   

Some tests running in suites that kill primary nodes hit DDL failures caused by a FailedToSatisfyReadPreference exception. This error must be added to the set of retriable errors so that DDL coordinators are not released halfway through their execution.
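
The idea can be illustrated with a minimal Python sketch (the server itself is written in C++, and this is not its code). Every name below is hypothetical: the exception classes are stand-ins for the server error codes, RETRIABLE_ERRORS is an assumed allow-list, and the bounded attempt count is only there to keep the sketch finite.

```python
class NotWritablePrimary(Exception):
    """Hypothetical stand-in for an error already treated as retriable."""


class FailedToSatisfyReadPreference(Exception):
    """Hypothetical stand-in for the server error of the same name."""


# Assumed allow-list of errors a coordinator retries on. The change described
# above amounts to adding FailedToSatisfyReadPreference to a set like this.
RETRIABLE_ERRORS = (NotWritablePrimary, FailedToSatisfyReadPreference)


def run_coordinator_step(step, max_attempts=5):
    """Run one coordinator step, retrying when a retriable error is raised.

    Any other exception propagates immediately, which in this model is what
    would release the coordinator before it has finished.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except RETRIABLE_ERRORS:
            if attempt == max_attempts:
                raise  # attempts exhausted: give up and surface the error


def make_flaky_step(failures):
    """Build a step that fails `failures` times (no primary found), then succeeds."""
    remaining = {"count": failures}

    def step():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise FailedToSatisfyReadPreference("no primary found")
        return "done"

    return step


print(run_coordinator_step(make_flaky_step(2)))  # prints: done
```

In this model, a step that fails twice while a primary is being killed and re-elected still completes on the third attempt instead of aborting the operation.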



 Comments   
Comment by Githook User [ 12/Oct/21 ]

Author: Pierlauro Sciarelli <pierlauro.sciarelli@mongodb.com> (username: pierlauro)

Message: SERVER-60495 Retry FailedToSatisfyReadPreference in DDL coordinators
Branch: v5.0
https://github.com/mongodb/mongo/commit/1a0f9aec5fbec1d7726a850a89f0f3ba0aee01f3

Comment by Githook User [ 11/Oct/21 ]

Author: Pierlauro Sciarelli <pierlauro.sciarelli@mongodb.com> (username: pierlauro)

Message: SERVER-60495 Retry FailedToSatisfyReadPreference in DDL coordinators
Branch: v5.1
https://github.com/mongodb/mongo/commit/460172298bbe0e1301e80e0ca1138d9cc14e4b44

Comment by Githook User [ 08/Oct/21 ]

Author: Pierlauro Sciarelli <pierlauro.sciarelli@mongodb.com> (username: pierlauro)

Message: SERVER-60495 Retry FailedToSatisfyReadPreference in DDL coordinators
Branch: master
https://github.com/mongodb/mongo/commit/e988a942048d98cbdacbffec86654b614827ee02

Comment by Max Hirschhorn [ 06/Oct/21 ]

pierlauro.sciarelli, after discussing SERVER-59884 with Randolph, I think it is currently a bug that FailedToSatisfyReadPreference isn't considered a retryable error by drivers for the commitTransaction command. (The UnknownTransactionCommitResult error label is set by drivers - for example in pymongo.)

I'm wondering if it is also a server bug that FailedToSatisfyReadPreference isn't considered a retryable error by drivers for retryable writes. I suspect that it is a bug because there's a fair chance the retry would succeed. Until DRIVERS-555 is completed, drivers will unfortunately still retry only once, so if the primary of the replica set shard is unavailable for 30 seconds (twice kDefaultFindHostTimeout), the retryable write will still fail and lead to an error/exception in the application.

It would be worth discussing/confirming with folks from the Replication and Drivers teams, but I feel like FailedToSatisfyReadPreference should have the same categories applied to it that ExceededTimeLimit currently has. (SERVER-35031 has some context on the difference between MaxTimeMSExpired and ExceededTimeLimit.)
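
The single-retry arithmetic in the comment above can be sketched as follows. This is a simplified model under stated assumptions, not driver or server code: the 15-second value is only inferred from "30 seconds (twice kDefaultFindHostTimeout)", attempts are assumed to run back to back, and the function name and parameters are hypothetical.

```python
# Inferred from the comment: 30 seconds is described as twice kDefaultFindHostTimeout.
FIND_HOST_TIMEOUT_SECS = 15


def retryable_write_succeeds(primary_outage_secs, max_retries=1):
    """Return True if a write issued at the start of an outage eventually succeeds.

    Assumed model: each attempt waits up to FIND_HOST_TIMEOUT_SECS for a
    primary and fails with FailedToSatisfyReadPreference if none appears;
    the next attempt starts as soon as the previous one fails.
    """
    attempts = 1 + max_retries
    total_wait_secs = attempts * FIND_HOST_TIMEOUT_SECS
    # The write succeeds only if a primary is back before the last attempt times out.
    return primary_outage_secs < total_wait_secs


print(retryable_write_succeeds(20))                 # True: the single retry finds the new primary
print(retryable_write_succeeds(30))                 # False: both attempts time out
print(retryable_write_succeeds(20, max_retries=0))  # False: no retry at all
```

Under this model, treating the error as retryable helps for outages shorter than 30 seconds, while longer outages still surface an error to the application until more than one retry is allowed.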
