[CDRIVER-3543] Consider refactoring how retryable writes labels are appended Created: 18/Feb/20  Updated: 28/Oct/23  Resolved: 26/Feb/20

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: None
Fix Version/s: 1.17.0-beta, 1.17.0

Type: Improvement Priority: Major - P3
Reporter: Kevin Albertson Assignee: Andreas Braun
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The driver adds error labels for retryable writes as part of _mongoc_write_error_get_type per CDRIVER-3462.

In the review of CDRIVER-3462 I added some suggestions for refactoring this to be more consistent with existing code we have for adding error labels:

I wonder if it would make more sense to put this logic deeper in the command codepath.

For the network error check, we could follow suit with where a
TransientTransactionError label is added for network errors (see https://github.com/mongodb/specifications/blob/master/source/transactions/transactions.rst#error-reporting-changes).

That's in the function "network_error_reply" of mongoc-cluster.c. It could append a RetryableWriteError if cmd->is_retryable_write is true.

That would also apply in more cases, which might be an existing bug in our retry logic. Because network_reply_error is called in cases like failure to decompress a server response (https://github.com/mongodb/mongo-c-driver/blob/1.16.0/src/libmongoc/src/mongoc/mongoc-cluster.c/#L3013) which sets an error domain MONGOC_ERROR_PROTOCOL (not MONGOC_ERROR_STREAM) but still calls network_error_reply.

For the server error check, _mongoc_write_error_is_retryable could be moved into mongoc-cluster.c and that check could be done around here: https://github.com/mongodb/mongo-c-driver/blob/1.16.0/src/libmongoc/src/mongoc/mongoc-cluster.c/#L3049

Though this is just a suggestion. A nice thing about adding the error label it here is that it keeps the retryable writes logic a bit more self contained. But I'm up to discuss this further.



 Comments   
Comment by Githook User [ 26/Feb/20 ]

Author:

{'name': 'Andreas Braun', 'username': 'alcaeus', 'email': 'git@alcaeus.org'}

Message: CDRIVER-3543 treat certain protocol errors as retryable
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/8f62a35093c647da69a7c8dc8a6e5817b2c91f4b

Comment by Kevin Albertson [ 25/Feb/20 ]

Per discussion, I got it wrong in the description. network_error_reply does not have the information available to know whether the write is retryable or not. And that's probably how it should be, since mongoc_cluster_run_opmsg is concerned with sending a single message and does not need to know about retryability. Instead, let's just check that the other possible network error (domain MONGOC_ERROR_PROTOCOL and code MONGOC_ERROR_PROTOCOL_INVALID_REPLY) is considered a retryable write.

Generated at Wed Feb 07 21:18:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.