[JAVA-5119] failed retries exception should show all unavailable servers Created: 23/Aug/23  Updated: 31/Aug/23  Resolved: 31/Aug/23

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Unknown
Reporter: Nikita Sokolov Assignee: Valentin Kavalenka
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to JAVA-4974 try another srv-record on MongoSocket... Closed
Documentation Changes Summary:

1. What would you like to communicate to the user about this feature?
2. Would you like the user to see examples of the syntax and/or executable code and its output?
3. Which versions of the driver/connector does this apply to?


 Description   

colored textSomewhat related to JAVA-4974

https://github.com/mongodb/mongo-java-driver/pull/1181

With thrown exceptions only showing one of errors, it is hard to tell whether the operations are being retried or not. Patching our version of the Mongo Java driver to log all the errors has shown which errors are being retried and which are not.
Here is an example of an error being retried, as you can see two different servers are being tried:

org.springframework.data.mongodb.UncategorizedMongoDbException: Connection pool for ServerId{clusterId=ClusterId{value='64de2b2fa0992c193b36462f', description='joom-ranker'}, address=10-0-61-13.backend-mongos-production-user-critical.backend-mongos.svc.cluster.local:27017} is paused because another operation failed; nested exception is com.mongodb.MongoConnectionPoolClearedException: Connection pool for ServerId{clusterId=ClusterId{value='64de2b2fa0992c193b36462f', description='joom-ranker'}, address=10-0-61-13.backend-mongos-production-user-critical.backend-mongos.svc.cluster.local:27017} is paused because another operation failed
    at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:133)
    at org.springframework.data.mongodb.core.ReactiveMongoTemplate.potentiallyConvertRuntimeException(ReactiveMongoTemplate.java:2814)
    at org.springframework.data.mongodb.core.ReactiveMongoTemplate.lambda$translateException$90(ReactiveMongoTemplate.java:2797)
    at reactor.core.publisher.Mono.lambda$onErrorMap$30(Mono.java:3325)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:88)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:185)
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:251)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:165)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:165)
    at reactor.core.publisher.FluxMap$MapSubscriber.onError(FluxMap.java:126)
    at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:87)
    at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:87)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:185)
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:251)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onError(MonoPeekTerminal.java:251)
    at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:190)
    at com.mongodb.reactivestreams.client.internal.MongoOperationPublisher.lambda$sinkToCallback$31(MongoOperationPublisher.java:570)
    at com.mongodb.reactivestreams.client.internal.OperationExecutorImpl.lambda$execute$2(OperationExecutorImpl.java:92)
    at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
    at com.mongodb.internal.async.function.AsyncCallbackSupplier.lambda$whenComplete$1(AsyncCallbackSupplier.java:95)
    at com.mongodb.internal.async.function.RetryingAsyncCallbackSupplier$RetryingCallback.onResult(RetryingAsyncCallbackSupplier.java:109)
    at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
    at com.mongodb.internal.async.function.AsyncCallbackSupplier.lambda$whenComplete$1(AsyncCallbackSupplier.java:95)
    at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
    at com.mongodb.internal.operation.OperationHelper.lambda$withAsyncSuppliedResource$7(OperationHelper.java:640)
    at com.mongodb.internal.connection.DefaultServer$1.onResult(DefaultServer.java:130)
    at com.mongodb.internal.connection.DefaultServer$1.onResult(DefaultServer.java:122)
    at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
    at com.mongodb.internal.connection.DefaultConnectionPool.lambda$getAsync$0(DefaultConnectionPool.java:199)
    at com.mongodb.internal.connection.DefaultConnectionPool.lambda$getAsync$1(DefaultConnectionPool.java:216)
    at com.mongodb.internal.connection.DefaultConnectionPool$Task.doComplete(DefaultConnectionPool.java:1382)
    at com.mongodb.internal.connection.DefaultConnectionPool$Task.execute(DefaultConnectionPool.java:1368)
    at com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager.workerRun(DefaultConnectionPool.java:1309)
    at com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager.runAndLogUncaught(DefaultConnectionPool.java:1339)
    at com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager.lambda$initUnlessClosed$0(DefaultConnectionPool.java:1274)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.mongodb.MongoConnectionPoolClearedException: Connection pool for ServerId{clusterId=ClusterId{value='64de2b2fa0992c193b36462f', description='joom-ranker'}, address=10-0-61-13.backend-mongos-production-user-critical.backend-mongos.svc.cluster.local:27017} is paused because another operation failed
    at com.mongodb.internal.connection.DefaultConnectionPool$StateAndGeneration.lambda$pauseAndIncrementGeneration$0(DefaultConnectionPool.java:1491)
    at com.mongodb.internal.connection.ConcurrentPool$StateAndPermits.throwIfClosedOrPaused(ConcurrentPool.java:498)
    at com.mongodb.internal.connection.ConcurrentPool$StateAndPermits.acquirePermit(ConcurrentPool.java:401)
    at com.mongodb.internal.connection.ConcurrentPool.get(ConcurrentPool.java:152)
    at com.mongodb.internal.connection.DefaultConnectionPool.getPooledConnection(DefaultConnectionPool.java:304)
    at com.mongodb.internal.connection.DefaultConnectionPool.lambda$getAsync$1(DefaultConnectionPool.java:214)
    ... 10 common frames omitted
    Suppressed: com.mongodb.MongoConnectionPoolClearedException: Connection pool for ServerId{clusterId=ClusterId{value='64de2b2fa0992c193b36462f', description='joom-ranker'}, address=10-0-53-94.backend-mongos-production-user-critical.backend-mongos.svc.cluster.local:27017} is paused because another operation failed
        ... 16 common frames omitted
    Caused by: com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
        at com.mongodb.internal.connection.AsynchronousChannelStream$BasicCompletionHandler.failed(AsynchronousChannelStream.java:263)
        at com.mongodb.internal.connection.AsynchronousChannelStream$BasicCompletionHandler.failed(AsynchronousChannelStream.java:233)
        at java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:129)
        at java.base/sun.nio.ch.Invoker$2.run(Invoker.java:219)
        at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
        ... 3 common frames omitted
    Caused by: java.nio.channels.InterruptedByTimeoutException: null
        at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl$1.run(UnixAsynchronousSocketChannelImpl.java:465)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        ... 3 common frames omitted
Caused by: com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
    at com.mongodb.internal.connection.AsynchronousChannelStream$BasicCompletionHandler.failed(AsynchronousChannelStream.java:263)
    at com.mongodb.internal.connection.AsynchronousChannelStream$BasicCompletionHandler.failed(AsynchronousChannelStream.java:233)
    at java.base/sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:129)
    at java.base/sun.nio.ch.Invoker$2.run(Invoker.java:219)
    at java.base/sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
    ... 3 common frames omitted
Caused by: java.nio.channels.InterruptedByTimeoutException: null
    at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl$1.run(UnixAsynchronousSocketChannelImpl.java:465)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at … 

Here is another error which is not retried, though I believe that it should be:

org.springframework.data.mongodb.UncategorizedMongoDbException: The server at 10-0-48-200.backend-mongos-production-order.backend-mongos.svc.cluster.local:27017 is no longer available; nested exception is com.mongodb.MongoServerUnavailableException: The server at 10-0-48-200.backend-mongos-production-order.backend-mongos.svc.cluster.local:27017 is no longer available
    at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:133)
    at org.springframework.data.mongodb.core.ReactiveMongoTemplate.potentiallyConvertRuntimeException(ReactiveMongoTemplate.java:2814)
    at org.springframework.data.mongodb.core.ReactiveMongoTemplate.lambda$translateException$90(ReactiveMongoTemplate.java:2797)
    at reactor.core.publisher.Flux.lambda$onErrorMap$28(Flux.java:6568)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:88)
    at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onError(MonoFlatMapMany.java:247)
    at reactor.core.publisher.FluxConcatMap$ConcatMapImmediate.onError(FluxConcatMap.java:256)
    at reactor.core.publisher.FluxCreate$BaseSink.error(FluxCreate.java:452)
    at reactor.core.publisher.FluxCreate$BufferAsyncSink.drain(FluxCreate.java:781)
    at reactor.core.publisher.FluxCreate$BufferAsyncSink.error(FluxCreate.java:726)
    at reactor.core.publisher.FluxCreate$SerializedSink.drainLoop(FluxCreate.java:229)
    at reactor.core.publisher.FluxCreate$SerializedSink.drain(FluxCreate.java:205)
    at reactor.core.publisher.FluxCreate$SerializedSink.error(FluxCreate.java:181)
    at reactor.core.publisher.LambdaMonoSubscriber.doError(LambdaMonoSubscriber.java:152)
    at reactor.core.publisher.LambdaMonoSubscriber.onError(LambdaMonoSubscriber.java:147)
    at reactor.core.publisher.FluxMap$MapSubscriber.onError(FluxMap.java:126)
    at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:87)
    at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:87)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:185)
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:251)
    at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onError(MonoPeekTerminal.java:251)
    at reactor.core.publisher.MonoCreate$DefaultMonoSink.error(MonoCreate.java:190)
    at com.mongodb.reactivestreams.client.internal.MongoOperationPublisher.lambda$sinkToCallback$31(MongoOperationPublisher.java:570)
    at com.mongodb.reactivestreams.client.internal.OperationExecutorImpl.lambda$execute$2(OperationExecutorImpl.java:92)
    at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
    at com.mongodb.internal.async.function.AsyncCallbackSupplier.lambda$whenComplete$1(AsyncCallbackSupplier.java:95)
    at com.mongodb.internal.async.function.RetryingAsyncCallbackSupplier$RetryingCallback.onResult(RetryingAsyncCallbackSupplier.java:109)
    at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
    at com.mongodb.internal.async.function.AsyncCallbackSupplier.lambda$whenComplete$1(AsyncCallbackSupplier.java:95)
    at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
    at com.mongodb.internal.operation.OperationHelper.lambda$withAsyncSuppliedResource$7(OperationHelper.java:640)
    at com.mongodb.internal.connection.DefaultServer$1.onResult(DefaultServer.java:130)
    at com.mongodb.internal.connection.DefaultServer$1.onResult(DefaultServer.java:122)
    at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
    at com.mongodb.internal.connection.DefaultConnectionPool.lambda$getAsync$0(DefaultConnectionPool.java:199)
    at com.mongodb.internal.connection.DefaultConnectionPool.lambda$getAsync$1(DefaultConnectionPool.java:216)
    at com.mongodb.internal.connection.DefaultConnectionPool$Task.doComplete(DefaultConnectionPool.java:1382)
    at com.mongodb.internal.connection.DefaultConnectionPool$Task.execute(DefaultConnectionPool.java:1368)
    at com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager.workerRun(DefaultConnectionPool.java:1309)
    at com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager.runAndLogUncaught(DefaultConnectionPool.java:1339)
    at com.mongodb.internal.connection.DefaultConnectionPool$AsyncWorkManager.lambda$initUnlessClosed$0(DefaultConnectionPool.java:1274)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.mongodb.MongoServerUnavailableException: The server at 10-0-48-200.backend-mongos-production-order.backend-mongos.svc.cluster.local:27017 is no longer available
    at com.mongodb.internal.connection.ConcurrentPool.poolClosedException(ConcurrentPool.java:297)
    at com.mongodb.internal.connection.ConcurrentPool$StateAndPermits.throwIfClosedOrPaused(ConcurrentPool.java:495)
    at com.mongodb.internal.connection.ConcurrentPool$StateAndPermits.acquirePermit(ConcurrentPool.java:401)
    at com.mongodb.internal.connection.ConcurrentPool.get(ConcurrentPool.java:152)
    at com.mongodb.internal.connection.DefaultConnectionPool.getPooledConnection(DefaultConnectionPool.java:304)
    at com.mongodb.internal.connection.DefaultConnectionPool.lambda$getAsync$1(DefaultConnectionPool.java:214)
    ... 10 common frames omitted 



 Comments   
Comment by Valentin Kavalenka [ 31/Aug/23 ]

Thanks, I see. The reason we are reluctant to start reporting failures from all attempts in the suppressed list is because we are implementing the client side operations timeout functionality, which will result in the number of attempts being dependent on this single timeout, rather than being limited by two attempts, as it currently is. Given a large or "infinite" timeout, and an unfortunate installation/network health, the number of attempts may be large, and we will have to limit the size of the reported suppressed list somehow. We would also need to consider doing such a change in all drivers. At this point the change does not seem worth it.

Should not the MongoServerUnavailableException be retried?

This would have been an improvement. I created DRIVERS-2712 Retry commands that fail because of the selected server being concurrently removed from the topology, so that we could make a decision on the improvement, and you could track what happens.

Comment by Nikita Sokolov [ 28/Aug/23 ]

No, we don't try to determine programmatically if the retries have happened, but when we are logging the exception (with suppressed retries) we can trace it to the exact request that got it and I don't think it is possible with the logs. Also, enabling debug logs would slow our application.

No, I don't think that our company is a MongoDB customer.

Should not the MongoServerUnavailableException be retried?

Comment by Valentin Kavalenka [ 28/Aug/23 ]

Hi faucct@joom.com

With thrown exceptions only showing one of errors, it is hard to tell whether the operations are being retried or not.

The driver logs "Unable to retry ..." when it does not retry an operation (I mentioned this here). When you say that having exceptions from all failed attempts in the suppressed list makes it easier to understand whether an operation was retried, it suggests to me that you are trying to determine this programmatically based on the exception thrown, as opposed to by analyzing logs. Is that the case?

  • If yes, I could you tell us more about your situation? This will hopefully allow us to make an informed decision.
  • If no, and you only analyze logs, then could you elaborate on why the "Unable to retry ..." / "Retrying ..." log messages emitted by the driver are not enough?

Note also that if your company is a MongoDB customer, you could open a support case, which may increase the visibility of the request.

Comment by PM Bot [ 23/Aug/23 ]

Hi faucct@joom.com, thank you for reporting this issue! The team will look into it and get back to you soon.

Generated at Thu Feb 08 09:03:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.