[JAVA-1085] auto retry during failover Created: 16/Jan/14 Updated: 14/Jun/19 Resolved: 27/Feb/18 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Cluster Management, Error Handling |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.0 |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Dan Bularzik | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 12 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
|
| Issue Links: |
|
| Description |
|
When the driver is configured to work with a replica set, it still favors a single node in the replica set. If that node becomes unavailable, the next request will invalidate the connection and fail. A subsequent request will attempt to reconnect to the replica set, thereby establishing a connection to a different node, and will therefore succeed. Thus, in a failover scenario, one or more requests are lost.

How we deal with this at the AKC is to use a "retry" aspect: it advises every interaction with MongoDB, and if the interaction throws an UncategorizedMongoException or DataAccessResourceFailureException, it resubmits the request one time. That way, in a failover, we give ourselves the chance to have zero lost requests.

While we have a code solution for this, I'd like it if this were supported by the official distribution. I've included our aspect for reference. |
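The retry-once policy described above can be sketched in plain Java. This is not the attached aspect: RetryOnce is a hypothetical helper, and the caller-supplied predicate stands in for the Spring Data exception types (UncategorizedMongoException, DataAccessResourceFailureException) that the aspect matched.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper illustrating the "resubmit once" policy; not the AKC aspect itself.
final class RetryOnce {
    // Decides whether a failure looks like a failover (assumption: supplied by the caller).
    private final Predicate<RuntimeException> isRetryable;

    RetryOnce(Predicate<RuntimeException> isRetryable) {
        this.isRetryable = isRetryable;
    }

    <T> T call(Supplier<T> interaction) {
        try {
            return interaction.get();
        } catch (RuntimeException first) {
            if (!isRetryable.test(first)) {
                throw first; // unrelated errors propagate unchanged
            }
            // Resubmit exactly once; a second failure propagates to the caller.
            return interaction.get();
        }
    }

    public static void main(String[] args) {
        // Simulated failover: the first request fails, the resubmitted one succeeds.
        int[] attempts = {0};
        RetryOnce retry = new RetryOnce(e -> e instanceof IllegalStateException);
        String result = retry.call(() -> {
            attempts[0]++;
            if (attempts[0] == 1) {
                throw new IllegalStateException("node unavailable");
            }
            return "ok";
        });
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

Running main prints "ok after 2 attempts". Keeping the failure test as a predicate lets the same helper match whichever exception types a given stack raises during failover.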
| Comments |
| Comment by Jeffrey Yemin [ 14/Jun/19 ] |
|
We haven't released a reactive streams beta yet, but it should probably work if you pull in the 3.11 beta of the Java driver in your own Maven/Gradle configs. |
| Comment by Nick Balkissoon [ 14/Jun/19 ] |
|
Loving the fast responses, haha. But just to confirm, it's not currently supported? (Most of our teams use the sync driver, but some mavericks are going reactive.) |
| Comment by Jeffrey Yemin [ 14/Jun/19 ] |
|
It will! |
| Comment by Nick Balkissoon [ 14/Jun/19 ] |
|
Awesome. Also, will retryable reads/writes be supported for the reactivestreams driver? |
| Comment by Jeffrey Yemin [ 13/Jun/19 ] |
|
We don't generally see a lot of beta usage, so there isn't much feedback, positive or negative. That said, it passes all regression tests, so I'm fairly confident in the quality. |
| Comment by Nick Balkissoon [ 13/Jun/19 ] |
|
Thanks Jeff, we will use beta3 in the meantime. Have you gotten positive feedback on this version, and do you have any major concerns about using the beta? |
| Comment by Jeffrey Yemin [ 13/Jun/19 ] |
|
It will likely be released some time in August. |
| Comment by Nick Balkissoon [ 13/Jun/19 ] |
|
Thanks Jeff! Does your team have an estimated release date for 3.11.0? My org is trying to decide whether to use beta3 for now or to wait for the GA release. |
| Comment by Jeffrey Yemin [ 16/May/19 ] |
|
We will in the next release! See |
| Comment by Nick Balkissoon [ 16/May/19 ] |
|
Fantastic feature for retryable writes. Out of curiosity, is there any reason we wouldn't support the same retryability for reads? |
| Comment by Ross Lawley [ 27/Feb/18 ] |
|
MongoDB 3.6 and the Java Driver 3.6.x now support retryable writes. Please see |
| Comment by Kevin D. Keck [ 22/Dec/16 ] |
|
Yes, we don't really use $inc, and we generally use idempotent write operations (mostly $set, $unset, $addToSet, and $pullAll), so simply retrying is no problem for us. But if it is an issue for your usage, it's worth noting that you could not only write your own custom executor that retries only reads, but even inspect the write operations that fail to decide whether they are safe to retry (retrying only if they use operators known to be idempotent). |
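The idea of inspecting a failed write before resubmitting it can be sketched as an allow-list check over the update document's top-level operators. This minimal sketch assumes the update is available as a plain Map (not a driver type) and uses a hypothetical IdempotentUpdateCheck class; anything not on the list, including $inc, is conservatively treated as unsafe to retry.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical allow-list check: is this update document safe to resubmit after a failure?
final class IdempotentUpdateCheck {
    // Operators that leave the same end state if applied twice (list taken from the comment above).
    private static final Set<String> IDEMPOTENT_OPERATORS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("$set", "$unset", "$addToSet", "$pullAll")));

    static boolean safeToRetry(Map<String, ?> updateDocument) {
        if (updateDocument.isEmpty()) {
            return false; // nothing to judge; stay conservative
        }
        for (String operator : updateDocument.keySet()) {
            if (!IDEMPOTENT_OPERATORS.contains(operator)) {
                return false; // e.g. $inc or $push could be applied twice
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> setOnly = new LinkedHashMap<>();
        setOnly.put("$set", Collections.singletonMap("status", "active"));

        Map<String, Object> withInc = new LinkedHashMap<>(setOnly);
        withInc.put("$inc", Collections.singletonMap("visits", 1));

        System.out.println(safeToRetry(setOnly)); // true
        System.out.println(safeToRetry(withInc)); // false
    }
}
```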
| Comment by Jeffrey Yemin [ 22/Dec/16 ] |
|
Thanks for posting your workaround, kdkeck. Just be aware that not all writes are idempotent (e.g. $inc), and in those cases the RobustExecutor may end up applying them more than once. |
| Comment by Kevin D. Keck [ 22/Dec/16 ] |
|
More details, as requested by Evgeny G: Here's the code we're currently using to replace the default OperationExecutor with a custom executor, in a DB instance (since we're still using the old API):
To do the same for a MongoDatabase, you should just need to replace all the references to DB:
Our RobustExecutor apparently still fails to recover in certain cases, but seems to handle the bulk of our temporary glitches nicely:
|
| Comment by Kevin D. Keck [ 03/Nov/16 ] |
|
I've solved this problem internally by implementing a custom OperationExecutor that performs a retry, and using injection to replace the default one in the MongoDatabase/DB instance whenever we call MongoClient.getDatabase()/Mongo.getDB() or DB.getSisterDB() (because it calls Mongo.getDB() internally). This has let me implement an efficient retry policy across all operations on all collections, and it's working like a charm. The simplest API change that would allow this without breaking encapsulation, or post-processing every MongoDatabase/DB instance before use, would be to allow an OperationExecutor factory to be set on MongoClient/Mongo, to be used instead of the current hard-wired behaviour of invoking Mongo.createOperationExecutor(). |
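The executor-factory proposal can be illustrated with a small decorator. In this sketch OpExecutor, RetryingExecutor and SketchClient are hypothetical stand-ins, not the driver's internal OperationExecutor or MongoClient, and java.io.IOException stands in for a network error. The client builds its default executor and passes it through a caller-supplied factory, which here wraps it in a retrying decorator.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the driver's internal executor abstraction.
interface OpExecutor {
    <T> T execute(Callable<T> operation) throws Exception;
}

// Decorator: retries operations that fail with IOException (our stand-in for a network error).
final class RetryingExecutor implements OpExecutor {
    private final OpExecutor delegate;
    private final int maxAttempts;
    private final long pauseMillis;

    RetryingExecutor(OpExecutor delegate, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
        this.pauseMillis = pauseMillis;
    }

    @Override
    public <T> T execute(Callable<T> operation) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return delegate.execute(operation);
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // brief pause to give a failover time to complete
                }
            }
        }
        throw last; // every attempt failed: surface the last network error
    }
}

// Hypothetical client showing the proposed hook: the default executor is passed
// through a caller-supplied factory instead of being hard-wired.
final class SketchClient {
    private final OpExecutor executor;

    SketchClient(UnaryOperator<OpExecutor> executorFactory) {
        OpExecutor defaultExecutor = new OpExecutor() {
            @Override
            public <T> T execute(Callable<T> operation) throws Exception {
                return operation.call();
            }
        };
        this.executor = executorFactory.apply(defaultExecutor);
    }

    OpExecutor executor() {
        return executor;
    }

    public static void main(String[] args) throws Exception {
        SketchClient client = new SketchClient(base -> new RetryingExecutor(base, 3, 10));
        int[] calls = {0};
        String result = client.executor().execute(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("connection reset");
            }
            return "done";
        });
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Because the decorator retries every operation it sees, it inherits the caveat raised in this thread: non-idempotent writes such as $inc may be applied more than once.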
| Comment by Jeffrey Yemin [ 31/Mar/14 ] |
|
I think any retry policy has to consider the following three scenarios:

First: let's say the application does a simple query, and the driver throws a com.mongodb.MongoException.Network exception. In this case the application does not know whether the query ever made it to the server. But it doesn't matter. The application can simply retry the query, since queries have no visible side effects. There are practical issues to consider when retrying (like locking up your application server because all threads are busy retrying), but fundamentally it's safe to retry a query.

Second: let's say the application tries to insert a single document using an application-generated _id, and the driver throws a com.mongodb.MongoException.Network exception. In this case the application does not know whether the write actually succeeded. In fact there are three possibilities.
If the application retries the insert, and the server returns a duplicate key error, the driver will throw a com.mongodb.MongoException.DuplicateKey exception. Now the application has a bit of a problem, because it can't tell the difference between case 1 and case 2 without taking some further action, such as querying the server to see what the document looks like.

Third: let's say the application does an update using the $inc operator, and the driver throws a com.mongodb.MongoException.Network exception. Again, the application does not know whether the write actually succeeded. There are two interesting possibilities:
Again, the application can't tell which of the two possibilities occurred. Furthermore, querying the server doesn't help, because the application doesn't know the original value of the field that is being incremented. So the application has two choices:
Note that these three cases map to the definitions in the HTTP RFC for safe (GET, HEAD), idempotent (PUT, DELETE), and other (POST) request types. In summary, I would say that any retry policy implemented by the driver has to differentiate between these three scenarios or risk inadvertently violating the expected effects of each type of operation. |
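The three scenarios can be encoded as an explicit retry decision. The sketch below assumes a hypothetical RetryPolicySketch class with its own OperationKind enum, not a driver API. It retries safe and idempotent operations after a network error, refuses to blindly retry other writes such as $inc, and models the duplicate-key follow-up from the second scenario as a comparison between the document the application meant to insert and the one now stored under that _id. An identical document inserted by another writer would be indistinguishable by this check.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical encoding of the three scenarios; not a driver API.
final class RetryPolicySketch {

    // Mirrors the HTTP distinction: safe (GET/HEAD), idempotent (PUT/DELETE), other (POST).
    enum OperationKind { SAFE, IDEMPOTENT, OTHER }

    enum Decision { RETRY, DO_NOT_RETRY }

    // What to do after a network error leaves the outcome of the operation unknown.
    static Decision afterNetworkError(OperationKind kind) {
        switch (kind) {
            case SAFE:        // e.g. a query: no visible side effects
            case IDEMPOTENT:  // e.g. an insert with an application-generated _id
                return Decision.RETRY;
            default:          // e.g. an $inc update: a retry might apply it twice
                return Decision.DO_NOT_RETRY;
        }
    }

    // Second scenario: the retried insert hit a duplicate key error. Compare what we meant to
    // insert with what the server now holds for that _id. A match suggests the first attempt
    // landed; it cannot rule out another writer having inserted an identical document.
    static boolean firstInsertProbablyLanded(Map<String, ?> intended, Map<String, ?> stored) {
        return stored != null && Objects.equals(intended, stored);
    }

    public static void main(String[] args) {
        for (OperationKind kind : OperationKind.values()) {
            System.out.println(kind + " -> " + afterNetworkError(kind));
        }
        Map<String, Object> intended = new HashMap<>();
        intended.put("_id", 42);
        intended.put("name", "widget");
        Map<String, Object> stored = new HashMap<>(intended);
        System.out.println("first insert landed? " + firstInsertProbablyLanded(intended, stored));
    }
}
```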
| Comment by Dan Bularzik [ 28/Mar/14 ] |
|
First off, as I look back at what I originally wrote and submitted, I see that I submitted code that used Spring Data MongoDB, not the MongoDB Java client itself. So apologies for that bit of awkwardness. I appreciate y'all listening to the spirit of the enhancement request, not the specific implementation of it.

Second, I applaud a couple of the design decisions you're considering here. Switching to an interface instead of a public abstract class is undoubtedly a good thing, and I like your inclination to avoid AOP. Personally, I only resort to aspects when I feel there's no other good choice.

That being said, and knowing nothing about your code: if I were to implement this feature, I'd probably attempt it at a lower level. I'm assuming that there's some lower-level code that all communication with the server is funneled through. If you have error-handling code that surrounds that, I'd insert a call to a retry strategy interface in there; then, based on what is passed to MongoClientOptions (i.e. whether the developer wants the retry or not), I'd populate the interface reference with an appropriate retry strategy implementation. The reason I'd approach it this way is the same reason I resorted to AOP in my current code: there's a large number of methods you'd have to cover if your retry implementation is at the API interface level, and in those situations I always worry that I'd miss one (are there any retry scenarios outside of this one interface?), or forget to update one when I need to change the implementation, etc.

Also, from a design point of view, I'd personally prefer making "retry" something I configure via options, rather than an additional wrapper I may need to remember to add while coding. That way, I can change retry behavior via configuration (i.e. MongoClientConfig settings, which I may externalize from my code) rather than having to change code.
I know that your proposed implementation could be controlled this way too if your MongoCollection implementation is generated by a factory method, so consider this a vote for control via config, regardless of whether you take my implementation suggestion. My $0.02. --Dan. |
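Dan's suggestion, a retry strategy consulted by the single low-level path to the server and selected through client options, might look like the following. RetryStrategy, ClientOptions and ServerChannel are hypothetical names, not MongoClientOptions or any driver class, and IOException again stands in for a network failure. The fromConfig method shows how the choice could be read from externalized configuration instead of being hard-coded.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical pluggable policy consulted by the one code path that talks to the server.
interface RetryStrategy {
    // attempt is 1-based: true means "try the failed call again".
    boolean shouldRetry(Exception failure, int attempt);

    RetryStrategy NONE = (failure, attempt) -> false;
    RetryStrategy ONCE = (failure, attempt) -> attempt == 1 && failure instanceof IOException;

    // Lets the choice come from externalized configuration, e.g. a properties file.
    static RetryStrategy fromConfig(String value) {
        switch (value) {
            case "none": return NONE;
            case "once": return ONCE;
            default: throw new IllegalArgumentException("unknown retry strategy: " + value);
        }
    }
}

// Hypothetical options object; retrying is off unless configured.
final class ClientOptions {
    final RetryStrategy retryStrategy;

    private ClientOptions(Builder builder) {
        this.retryStrategy = builder.retryStrategy;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private RetryStrategy retryStrategy = RetryStrategy.NONE;

        Builder retryStrategy(RetryStrategy strategy) {
            this.retryStrategy = strategy;
            return this;
        }

        ClientOptions build() {
            return new ClientOptions(this);
        }
    }
}

// Stand-in for the low-level funnel: every server round trip goes through send().
final class ServerChannel {
    private final RetryStrategy strategy;

    ServerChannel(ClientOptions options) {
        this.strategy = options.retryStrategy;
    }

    <T> T send(Callable<T> roundTrip) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return roundTrip.call();
            } catch (Exception e) {
                if (!strategy.shouldRetry(e, attempt)) {
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ClientOptions options = ClientOptions.builder()
                .retryStrategy(RetryStrategy.fromConfig("once"))
                .build();
        ServerChannel channel = new ServerChannel(options);
        int[] calls = {0};
        String reply = channel.send(() -> {
            calls[0]++;
            if (calls[0] == 1) {
                throw new IOException("socket closed");
            }
            return "reply";
        });
        System.out.println(reply + " after " + calls[0] + " calls");
    }
}
```

Placing the hook at the funnel means every public method is covered without wrapping each one, which is the coverage concern raised above; the trade-off is that the strategy sees only an exception and an attempt count, not what kind of operation failed.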
| Comment by Jeffrey Yemin [ 27/Mar/14 ] |
|
Hi Dan, Thanks for opening this issue, and apologies for not responding sooner. We've been kicking this one around, and one way we could handle this in the future is by simply wrapping the driver. In 3.0, there is a new interface called MongoCollection<T> that is a replacement for DBCollection (which is an abstract class). As an interface, it would be relatively straightforward to wrap an instance of the driver's implementation of the interface, and embed the retry logic in the wrapper. It would look something like:
Let me know what you think of this idea. |
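The wrapping approach from this comment can be sketched against a hypothetical two-method SimpleCollection interface standing in for MongoCollection<T>, with UncheckedIOException standing in for the driver's network exception. In this sketch the wrapper retries only reads and passes writes straight through, since, as discussed elsewhere in this issue, not every write is safe to repeat.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical two-method slice of a collection interface such as MongoCollection<T>.
interface SimpleCollection<T> {
    long count();

    void insertOne(T document);
}

// Wraps any implementation of the interface and embeds retry logic in the wrapper.
final class RetryingCollection<T> implements SimpleCollection<T> {
    private final SimpleCollection<T> delegate;
    private final int maxAttempts;

    RetryingCollection(SimpleCollection<T> delegate, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.delegate = delegate;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public long count() {
        return withRetry(delegate::count); // a read has no side effects, so repeating it is safe
    }

    @Override
    public void insertOne(T document) {
        delegate.insertOne(document); // passed through: this sketch does not retry writes
    }

    private <R> R withRetry(Supplier<R> call) {
        UncheckedIOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (UncheckedIOException e) { // stand-in for the driver's network exception
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        int[] failuresLeft = {1};
        // In-memory collection whose first count() fails as if the primary had just stepped down.
        SimpleCollection<String> flaky = new SimpleCollection<String>() {
            @Override
            public long count() {
                if (failuresLeft[0]-- > 0) {
                    throw new UncheckedIOException(new IOException("socket closed"));
                }
                return store.size();
            }

            @Override
            public void insertOne(String document) {
                store.add(document);
            }
        };
        SimpleCollection<String> collection = new RetryingCollection<>(flaky, 2);
        collection.insertOne("a");
        System.out.println(collection.count()); // 1
    }
}
```

Every method of the real interface would need the same treatment, which is the maintenance cost Dan points to when arguing for a lower-level hook.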