[JAVA-1085] auto retry during failover Created: 16/Jan/14  Updated: 14/Jun/19  Resolved: 27/Feb/18

Status: Closed
Project: Java Driver
Component/s: Cluster Management, Error Handling
Affects Version/s: None
Fix Version/s: 3.6.0

Type: New Feature Priority: Major - P3
Reporter: Dan Bularzik Assignee: Unassigned
Resolution: Duplicate Votes: 12
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Java Source File MongoTemplateRetryAspect.java    
Issue Links:
Duplicate
is duplicated by JAVA-1073 Implement proper auto reconnect for e... Closed
Related
is related to JAVA-2570 All writes retryable support Closed
is related to DOCS-4180 Application Level Retry Patterns Closed
is related to JAVA-2411 MongoDB: java.lang.IllegalStateExcept... Closed

 Description   

When the driver is configured to work with a replica set, it still favors a single node in the replica set. If that node becomes unavailable, the next request will invalidate the connection and return failure. A subsequent request will attempt to reconnect to the replica set, thereby establishing a connection to a different node, and therefore succeed. Thus, in a failover scenario, one request (or possibly more) is lost.

How we deal with this at the AKC is to use a "retry" aspect; it advises every interaction with MongoDB, and if the interaction throws an UncategorizedMongoException or DataAccessResourceFailureException, it resubmits the request one time. That way, in a failover, we give ourselves the chance to have zero lost requests.

While we have a code solution for this, I'd like it if this were supported by the official distribution. I've included our aspect for reference.
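The attached aspect itself is not reproduced here, but the single-retry idea it describes can be sketched in plain Java, without Spring AOP. This is a minimal illustrative wrapper; in the real aspect the retry is limited to UncategorizedMongoException and DataAccessResourceFailureException, whereas this sketch retries any RuntimeException to stay dependency-free:

```java
import java.util.function.Supplier;

/** Minimal sketch of the single-retry idea: run the operation, and if it
 *  fails with a RuntimeException, resubmit it exactly one time. */
public class SingleRetry {
    public static <T> T withOneRetry(Supplier<T> operation) {
        try {
            return operation.get();
        } catch (RuntimeException first) {
            // The real aspect only retries UncategorizedMongoException /
            // DataAccessResourceFailureException; RuntimeException here
            // keeps the sketch self-contained.
            return operation.get();
        }
    }
}
```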



 Comments   
Comment by Jeffrey Yemin [ 14/Jun/19 ]

We haven't released a reactive streams beta yet, but it should probably work if you pull in the beta 3.11 java driver in your own Maven/Gradle configs.

Comment by Nick Balkissoon [ 14/Jun/19 ]

Loving the fast responses haha - but just to confirm, it's not currently supported? (Most of our teams use the sync driver, but some mavericks are going reactive)

Comment by Jeffrey Yemin [ 14/Jun/19 ]

It will!

Comment by Nick Balkissoon [ 14/Jun/19 ]

Awesome - also, will retryable reads/writes be supported for the reactivestreams driver?

Comment by Jeffrey Yemin [ 13/Jun/19 ]

We don't generally see a lot of beta usage, so also not a lot of feedback positive or negative. That said, it passes all regression tests so I'm fairly confident in the quality.

Comment by Nick Balkissoon [ 13/Jun/19 ]

Thanks Jeff - we will use beta3 in the meantime - have you gotten positive feedback regarding this version / have any major concerns with using the beta version?

Comment by Jeffrey Yemin [ 13/Jun/19 ]

It will likely be released some time in August.

Comment by Nick Balkissoon [ 13/Jun/19 ]

Thanks Jeff! Does your team have an estimated release date for 3.11.0? My org is trying to evaluate whether to use beta3 for now or to wait for the GA release.

Comment by Jeffrey Yemin [ 16/May/19 ]

We will in the next release! See JAVA-3241 for details.

Comment by Nick Balkissoon [ 16/May/19 ]

Fantastic feature for retry-able writes.  Out of curiosity, any reason why we wouldn't support the same retry-ability for reads? 

Comment by Ross Lawley [ 27/Feb/18 ]

MongoDB 3.6 and the Java Driver 3.6.x now support retryable writes. Please see JAVA-2570 for more details.

Comment by Kevin D. Keck [ 22/Dec/16 ]

Yes, we happen to not really use $inc, and generally use idempotent write operations (mostly $set, $unset, $addToSet, and $pullAll), so simply re-trying is no problem for us.

But it's worth noting that if it is an issue for your usage, not only could you write your own custom executor to only retry reads, you could even inspect the write operations that fail to decide whether it was safe to retry or not (only retrying if it used only operators known to be idempotent).
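The operator inspection described above could be sketched as follows. This is an illustrative helper, not code from the attachment: it treats an update as safe to retry only if every top-level key is one of a hypothetical whitelist of idempotent operators, and it represents the update document as a plain Map to avoid driver dependencies:

```java
import java.util.Map;
import java.util.Set;

/** Sketch: an update is safe to retry only if all of its top-level
 *  operators are idempotent. The whitelist here is illustrative. */
public class IdempotencyCheck {
    private static final Set<String> IDEMPOTENT_OPERATORS =
            Set.of("$set", "$unset", "$addToSet", "$pullAll");

    public static boolean isSafeToRetry(Map<String, ?> updateDocument) {
        // Every top-level key must be a known idempotent operator;
        // anything else (e.g. $inc, $push) disqualifies the retry.
        return updateDocument.keySet().stream()
                .allMatch(IDEMPOTENT_OPERATORS::contains);
    }
}
```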

Comment by Jeffrey Yemin [ 22/Dec/16 ]

Thanks for posting your workaround, kdkeck. Just be aware that not all writes are idempotent, e.g. $inc, and in those cases the RobustExecutor may end up applying them more than once.

Comment by Kevin D. Keck [ 22/Dec/16 ]

More details, as requested by Evgeny G:

Here's the code we're currently using to replace the default OperationExecutor with a custom executor, in a DB instance (since we're still using the old API):

  @SneakyThrows(ReflectiveOperationException.class)
  public static DB replaceOperationExecutor(DB db) {
    Field f = db.getClass().getDeclaredField("executor");
    f.setAccessible(true);
    f.set(db, new RobustExecutor((OperationExecutor) f.get(db)));
    return db;
  }

To do the same for a MongoDatabase, you should just need to replace all the references to DB:

  @SneakyThrows(ReflectiveOperationException.class)
  public static MongoDatabase replaceOperationExecutor(MongoDatabase db) {
    Field f = db.getClass().getDeclaredField("executor");
    f.setAccessible(true);
    f.set(db, new RobustExecutor((OperationExecutor) f.get(db)));
    return db;
  }

Our RobustExecutor apparently still fails to recover in certain cases, but seems to nicely handle the bulk of our temporary glitches:

  @AllArgsConstructor
  private static class RobustExecutor implements OperationExecutor {
    private final OperationExecutor wrappedExecutor;
 
    @Override
    public <T> T execute(ReadOperation<T> operation, ReadPreference readPreference) {
      try {
        return wrappedExecutor.execute(operation, readPreference);
      } catch (DuplicateKeyException e) {
        throw e;
      } catch (MongoException | IllegalStateException e) {
        log.warn("Retrying operation after catching: ", e);
        return wrappedExecutor.execute(operation, readPreference);
      }
    }
 
    @Override
    public <T> T execute(WriteOperation<T> operation) {
      try {
        return wrappedExecutor.execute(operation);
      } catch (DuplicateKeyException e) {
        throw e;
      } catch (MongoException | IllegalStateException e) {
        log.warn("Retrying operation after catching: ", e);
        return wrappedExecutor.execute(operation);
      }
    }
  }

Comment by Kevin D. Keck [ 03/Nov/16 ]

I've solved this problem internally by implementing a custom OperationExecutor that performs a retry, and using injection to replace the default one in the MongoDatabase/DB instance whenever we call MongoClient.getDatabase()/Mongo.getDB() or DB.getSisterDB() (because it calls Mongo.getDB() internally). This has enabled me to easily implement an efficient retry policy across all operations across all collections, and it's working like a charm.

The simplest API change that would enable this, without breaking encapsulation or having to post-process all the MongoDatabase/DB instances before use, would be to allow an OperationExecutor factory to be set on MongoClient/Mongo, to be used instead of the current hard-wired invocation of Mongo.createOperationExecutor().

Comment by Jeffrey Yemin [ 31/Mar/14 ]

I think any retry policy has to consider the following three scenarios:

First: let's say the application does a simple query, and the driver throws a com.mongodb.MongoException.Network exception. In this case the application does not know whether the query ever made it to the server. But it doesn't matter. The application can simply retry the query, since queries have no visible side effects. There are practical issues to consider when retrying (like locking up your application server because all threads are busy retrying), but fundamentally it's safe to retry a query.

Second: let's say the application tries to insert a single document using an application-generated _id, and the driver throws a com.mongodb.MongoException.Network exception. In this case the application does not know whether the write actually succeeded. In fact there are three possibilities.

  1. The insert request got to the server, the document was inserted, but the successful response from the server never reached the client, and so a Network exception is thrown
  2. The insert request got to the server, the _id of the document already existed in the collection, but the server error never reached the client, and so a Network exception is thrown
  3. The insert request never even made it to the server, and so a Network exception is thrown

If the application retries the insert, and the server returns a duplicate key error, the driver will throw a com.mongodb.MongoException.DuplicateKey exception. Now the application has a bit of a problem, because it can't tell the difference between case 1 and case 2 without taking some further action, like, for example, querying the server to see what the document looks like.

Third: Let's say the application does an update using the $inc operator, and the driver throws a com.mongodb.MongoException.Network exception. Again, the application does not know whether the write actually succeeded. There are two interesting possibilities:

  1. The update request got to the server, the document was updated, but the successful response from the server never reached the client, and so a Network exception is thrown
  2. The update request never even made it to the server, and so a Network exception is thrown

Again, the application can't tell which of the two possibilities occurred. Furthermore, querying the server doesn't help, because the application doesn't know the original value of the field that is being incremented. So the application has two choices:

  1. Don't retry and risk losing the increment.
  2. Retry and risk applying the increment more than once

Note that these three cases map to the definitions in the HTTP RFC for safe (GET, HEAD), idempotent (PUT, DELETE), and other (POST) request types.

In summary, I would say that any retry policy implemented by the driver has to differentiate between these three scenarios or risk inadvertently violating the expected effects of each type of operation.
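The three-way classification above (safe query / idempotent write / non-idempotent write, mirroring the HTTP RFC's GET / PUT / POST distinction) can be expressed as a tiny decision sketch. All names here are illustrative; a real driver policy would derive the kind from the actual operation:

```java
/** Sketch of the three retry scenarios discussed above. */
public class RetryPolicySketch {
    public enum OperationKind { QUERY, IDEMPOTENT_WRITE, NON_IDEMPOTENT_WRITE }

    public static boolean safeToRetry(OperationKind kind) {
        switch (kind) {
            case QUERY:
                return true;  // no visible side effects (HTTP "safe", like GET)
            case IDEMPOTENT_WRITE:
                return true;  // retryable, but a duplicate-key error on retry
                              // is ambiguous and may need a follow-up query
            default:
                return false; // e.g. $inc: retrying risks applying it twice
        }
    }
}
```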

Comment by Dan Bularzik [ 28/Mar/14 ]

First off, as I look back at what I originally wrote and submitted, I see that I submitted code that utilized Spring Data MongoDB, not the MongoDB Java client itself. So apologies for that bit of awkwardness. I appreciate y'all listening to the spirit of the enhancement request, not the specific implementation of it.

Second, I applaud a couple of the design decisions you're considering here. Switching to an interface instead of a public abstract class is undoubtedly a good thing, and I like your inclination to avoid AOP. Personally, I only resort to aspects when I feel there's no other good choice.

That being said, and knowing nothing about your code...if I were to implement this feature, I'd probably attempt it at a lower level. I'm assuming that there's some lower-level code that all communication with the server is funneled through. If you have error-handling code that surrounds that, I'd insert a call to a retry strategy interface in there; then based on what is passed to MongoClientOptions (i.e. whether the developer wants the retry or not), I'd populate the interface reference with an appropriate retry strategy implementation.

The reason why I'd approach it this way is the same reason I resorted to AOP in my current code: there's a large number of methods you'd have to cover if your retry implementation is at the API interface level, and in those situations I always worry that I'd miss one (are there any retry scenarios outside of this one interface?), or forget to update one when I need to change the implementation, etc.

Also, from a design point of view, I'd personally prefer having "retry" something I configure via options, rather than an additional wrapper I may need to remember to add while coding. That way, I can change retry behavior via configuration (i.e. MongoClientConfig...settings for which I may externalize from my code) rather than having to change code. I know that your proposed implementation could be controlled this way too if you have your MongoCollection implementation generated by a factory method, so consider this a vote for control via config, regardless of whether you take my implementation suggestion.

My $0.02.

--Dan.
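The retry-strategy-via-configuration idea in the comment above could be sketched like this. To be clear, nothing here exists in the driver: RetryStrategy, the named constants, and the execute helper are all invented names, standing in for a hypothetical hook the low-level send path would consult:

```java
import java.util.function.Supplier;

/** Hypothetical sketch: retry behaviour selected by configuration
 *  rather than by wrapping each collection. All names are invented. */
public class ConfiguredRetry {
    public interface RetryStrategy {
        boolean shouldRetry(RuntimeException failure, int attemptsSoFar);
    }

    public static final RetryStrategy NO_RETRY = (e, n) -> false;
    public static final RetryStrategy ONE_RETRY = (e, n) -> n < 1;

    /** Stand-in for the low-level path all operations funnel through. */
    public static <T> T execute(Supplier<T> op, RetryStrategy strategy) {
        int attempts = 0;
        while (true) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                // Consult the configured strategy before each resubmission.
                if (!strategy.shouldRetry(e, attempts++)) throw e;
            }
        }
    }
}
```

The strategy would be populated from the client options (the way Dan suggests for MongoClientOptions), so retry behaviour changes with configuration rather than code.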

Comment by Jeffrey Yemin [ 27/Mar/14 ]

Hi Dan,

Thanks for opening this issue, and apologies for not responding sooner.

We've been kicking this one around, and one way we could handle this in the future is by simply wrapping the driver. In 3.0, there is a new interface called MongoCollection<T> that is a replacement for DBCollection (which is an abstract class). As an interface, it would be relatively straightforward to wrap an instance of the driver's implementation of the interface, and embed the retry logic in the wrapper. It would look something like:

public class RetryingMongoCollection<T> implements MongoCollection<T> {
    private final MongoCollection<T> proxied;
 
    public RetryingMongoCollection(final MongoCollection<T> proxied) {
        this.proxied = proxied;
    }
 
    @Override
    public WriteResult insert(final T t) {
        // retry logic here
        return proxied.insert(t);
    }
    //  ...
}

Let me know what you think of this idea.

Generated at Thu Feb 08 08:53:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.