[JAVA-4387] MongoDb returns message length larger than the max message length Created: 03/Nov/21  Updated: 27/Oct/23  Resolved: 30/Nov/21

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: 运军 蒋 Assignee: Valentin Kavalenka
Resolution: Gone away Votes: 0
Labels: external-user
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive POS-vin11c9a-eth0-2021_1020_0622_27.pcap.zip     PNG File image-2021-11-03-14-46-40-340.png     Text File service_1.log     Text File service_2.log    
Issue Links:
Cloners
clones JAVA-4210 MongoDb returns message length larger... Closed
Case:
Documentation Changes: Not Needed

 Description   

Hi guys,

We are facing an issue that happens several times a day. Our Java driver version is 4.0.5 and our MongoDB version is 4.0.9. We have two routers and several nodes, and we are using the Reactive Streams driver. This issue only happens in our production environment. It usually first occurs while MongodbHealthIndicator is checking the connections to MongoDB. Please see the stack trace below:

org.springframework.dao.InvalidDataAccessResourceUsageException: The reply message length 1013281644 is greater than the maximum message length 48000000; nested exception is com.mongodb.MongoInternalException: The reply message length 1013281644 is greater than the maximum message length 48000000
	at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:92)
	Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
	|_ checkpoint ⇢ com.active.platform.modules.pos.service.config.ContextFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authorization.AuthorizationWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authorization.ExceptionTranslationWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authentication.logout.LogoutWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.savedrequest.ServerRequestCacheWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.context.SecurityContextServerWebExchangeWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authentication.AuthenticationWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.context.ReactorContextWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.header.HttpHeaderWriterWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.config.web.server.ServerHttpSecurity$ServerWebExchangeReactorContextWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ HTTP GET "/agencies/f953b9de-2260-4833-bf32-d7be2a9ae945/catalogs" [ExceptionHandlingWebHandler]
Stack trace:
		at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:92)
		at org.springframework.data.mongodb.core.ReactiveMongoTemplate.potentiallyConvertRuntimeException(ReactiveMongoTemplate.java:2814)
		at org.springframework.data.mongodb.core.ReactiveMongoTemplate.lambda$translateException$90(ReactiveMongoTemplate.java:2797)
		at reactor.core.publisher.Mono.lambda$onErrorMap$30(Mono.java:3286)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:88)
		at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:185)
		at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:251)
		at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:165)
		at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:87)
		at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:87)
		at com.mongodb.reactivestreams.client.internal.AbstractSubscription.onError(AbstractSubscription.java:142)
		at com.mongodb.reactivestreams.client.internal.SingleResultCallbackSubscription.lambda$requestInitialData$0(SingleResultCallbackSubscription.java:41)
		at com.mongodb.internal.async.client.AsyncFindIterableImpl$1.onResult(AsyncFindIterableImpl.java:188)
		at com.mongodb.internal.async.client.AsyncFindIterableImpl$1.onResult(AsyncFindIterableImpl.java:184)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.async.client.OperationExecutorImpl$1$1$1.onResult(OperationExecutorImpl.java:92)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.operation.FindOperation$3.onResult(FindOperation.java:724)
		at com.mongodb.internal.operation.OperationHelper$ReferenceCountedReleasingWrappedCallback.onResult(OperationHelper.java:411)
		at com.mongodb.internal.operation.CommandOperationHelper$10.checkRetryableException(CommandOperationHelper.java:493)
		at com.mongodb.internal.operation.CommandOperationHelper$10.onResult(CommandOperationHelper.java:478)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor$2.onResult(DefaultServer.java:248)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.connection.CommandProtocolImpl$1.onResult(CommandProtocolImpl.java:82)
		at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection$2.onResult(DefaultConnectionPool.java:517)
		at com.mongodb.internal.connection.UsageTrackingInternalConnection$2.onResult(UsageTrackingInternalConnection.java:111)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.connection.InternalStreamConnection$2$1.onResult(InternalStreamConnection.java:380)
		at com.mongodb.internal.connection.InternalStreamConnection$2$1.onResult(InternalStreamConnection.java:375)
		at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:635)
		at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:618)
		at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:513)
		at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:510)
		at com.mongodb.connection.netty.NettyStream.readAsync(NettyStream.java:232)
		at com.mongodb.internal.connection.InternalStreamConnection.readAsync(InternalStreamConnection.java:510)
		at com.mongodb.internal.connection.InternalStreamConnection.access$1000(InternalStreamConnection.java:75)
		at com.mongodb.internal.connection.InternalStreamConnection$2.onResult(InternalStreamConnection.java:375)
		at com.mongodb.internal.connection.InternalStreamConnection$2.onResult(InternalStreamConnection.java:364)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.connection.InternalStreamConnection$3.completed(InternalStreamConnection.java:467)
		at com.mongodb.internal.connection.InternalStreamConnection$3.completed(InternalStreamConnection.java:464)
		at com.mongodb.connection.netty.NettyStream$2.operationComplete(NettyStream.java:187)
		at com.mongodb.connection.netty.NettyStream$2.operationComplete(NettyStream.java:181)
		at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
		at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
		at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
		at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
		at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
		at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
		at io.netty.util.internal.PromiseNotificationUtil.trySuccess(PromiseNotificationUtil.java:48)
		at io.netty.channel.ChannelOutboundBuffer.safeSuccess(ChannelOutboundBuffer.java:717)
		at io.netty.channel.ChannelOutboundBuffer.remove(ChannelOutboundBuffer.java:272)
		at io.netty.channel.ChannelOutboundBuffer.removeBytes(ChannelOutboundBuffer.java:352)
		at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:414)
		at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
		at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
		at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:898)
		at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
		at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
		at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
		at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
		at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
		at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
		at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
		at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
		at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
		at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		at java.lang.Thread.run(Thread.java:748)
Caused by: com.mongodb.MongoInternalException: The reply message length 1013281644 is greater than the maximum message length 48000000
	at com.mongodb.internal.connection.MessageHeader.<init>(MessageHeader.java:41)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:632)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:618)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:513)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:510)
	at com.mongodb.connection.netty.NettyStream.readAsync(NettyStream.java:232)
	at com.mongodb.internal.connection.InternalStreamConnection.readAsync(InternalStreamConnection.java:510)
	at com.mongodb.internal.connection.InternalStreamConnection.access$1000(InternalStreamConnection.java:75)
	at com.mongodb.internal.connection.InternalStreamConnection$2.onResult(InternalStreamConnection.java:375)
	at com.mongodb.internal.connection.InternalStreamConnection$2.onResult(InternalStreamConnection.java:364)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
	at com.mongodb.internal.connection.InternalStreamConnection$3.completed(InternalStreamConnection.java:467)
	at com.mongodb.internal.connection.InternalStreamConnection$3.completed(InternalStreamConnection.java:464)
	at com.mongodb.connection.netty.NettyStream$2.operationComplete(NettyStream.java:187)
	at com.mongodb.connection.netty.NettyStream$2.operationComplete(NettyStream.java:181)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
	at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
	at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
	at io.netty.util.internal.PromiseNotificationUtil.trySuccess(PromiseNotificationUtil.java:48)
	at io.netty.channel.ChannelOutboundBuffer.safeSuccess(ChannelOutboundBuffer.java:717)
	at io.netty.channel.ChannelOutboundBuffer.remove(ChannelOutboundBuffer.java:272)
	at io.netty.channel.ChannelOutboundBuffer.removeBytes(ChannelOutboundBuffer.java:352)
	at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:414)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:898)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
	at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

 
After this error has occurred about 20+ times (the reply message length is different every time), the error becomes:

 
The responseTo (2342443) in the response does not match the requestId (2342522) in the request

*The responseTo is always smaller than the requestId.*

The full stack trace is:

org.springframework.dao.InvalidDataAccessResourceUsageException: The responseTo (2342443) in the response does not match the requestId (2342522) in the request; nested exception is com.mongodb.MongoInternalException: The responseTo (2342443) in the response does not match the requestId (2342522) in the request
	at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:92)
	Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
	|_ checkpoint ⇢ com.active.platform.modules.pos.service.config.ContextFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authorization.AuthorizationWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authorization.ExceptionTranslationWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authentication.logout.LogoutWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.savedrequest.ServerRequestCacheWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.context.SecurityContextServerWebExchangeWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.authentication.AuthenticationWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.context.ReactorContextWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.header.HttpHeaderWriterWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.config.web.server.ServerHttpSecurity$ServerWebExchangeReactorContextWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
	|_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
	|_ checkpoint ⇢ HTTP POST "/agencies/3c91bc7d-2581-46b8-ac39-9b46cdbb7bc4/catalogs" [ExceptionHandlingWebHandler]
Stack trace:
		at org.springframework.data.mongodb.core.MongoExceptionTranslator.translateExceptionIfPossible(MongoExceptionTranslator.java:92)
		at org.springframework.data.mongodb.core.ReactiveMongoTemplate.potentiallyConvertRuntimeException(ReactiveMongoTemplate.java:2814)
		at org.springframework.data.mongodb.core.ReactiveMongoTemplate.lambda$translateException$90(ReactiveMongoTemplate.java:2797)
		at reactor.core.publisher.Mono.lambda$onErrorMap$30(Mono.java:3286)
		at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:88)
		at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondError(MonoFlatMap.java:185)
		at reactor.core.publisher.MonoFlatMap$FlatMapInner.onError(MonoFlatMap.java:251)
		at reactor.core.publisher.MonoFlatMap$FlatMapMain.onError(MonoFlatMap.java:165)
		at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:87)
		at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:87)
		at com.mongodb.reactivestreams.client.internal.AbstractSubscription.onError(AbstractSubscription.java:142)
		at com.mongodb.reactivestreams.client.internal.SingleResultCallbackSubscription.lambda$requestInitialData$0(SingleResultCallbackSubscription.java:41)
		at com.mongodb.internal.async.client.AsyncFindIterableImpl$1.onResult(AsyncFindIterableImpl.java:188)
		at com.mongodb.internal.async.client.AsyncFindIterableImpl$1.onResult(AsyncFindIterableImpl.java:184)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.async.client.OperationExecutorImpl$1$1$1.onResult(OperationExecutorImpl.java:92)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.operation.FindOperation$3.onResult(FindOperation.java:724)
		at com.mongodb.internal.operation.OperationHelper$ReferenceCountedReleasingWrappedCallback.onResult(OperationHelper.java:411)
		at com.mongodb.internal.operation.CommandOperationHelper$10.checkRetryableException(CommandOperationHelper.java:493)
		at com.mongodb.internal.operation.CommandOperationHelper$10.onResult(CommandOperationHelper.java:478)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor$2.onResult(DefaultServer.java:248)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.connection.CommandProtocolImpl$1.onResult(CommandProtocolImpl.java:82)
		at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection$2.onResult(DefaultConnectionPool.java:517)
		at com.mongodb.internal.connection.UsageTrackingInternalConnection$2.onResult(UsageTrackingInternalConnection.java:111)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.connection.InternalStreamConnection$2$1.onResult(InternalStreamConnection.java:400)
		at com.mongodb.internal.connection.InternalStreamConnection$2$1.onResult(InternalStreamConnection.java:375)
		at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:676)
		at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:643)
		at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:513)
		at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:510)
		at com.mongodb.connection.netty.NettyStream.readAsync(NettyStream.java:232)
		at com.mongodb.internal.connection.InternalStreamConnection.readAsync(InternalStreamConnection.java:510)
		at com.mongodb.internal.connection.InternalStreamConnection.access$1000(InternalStreamConnection.java:75)
		at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:633)
		at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:618)
		at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:513)
		at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:510)
		at com.mongodb.connection.netty.NettyStream.readAsync(NettyStream.java:232)
		at com.mongodb.internal.connection.InternalStreamConnection.readAsync(InternalStreamConnection.java:510)
		at com.mongodb.internal.connection.InternalStreamConnection.access$1000(InternalStreamConnection.java:75)
		at com.mongodb.internal.connection.InternalStreamConnection$2.onResult(InternalStreamConnection.java:375)
		at com.mongodb.internal.connection.InternalStreamConnection$2.onResult(InternalStreamConnection.java:364)
		at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
		at com.mongodb.internal.connection.InternalStreamConnection$3.completed(InternalStreamConnection.java:467)
		at com.mongodb.internal.connection.InternalStreamConnection$3.completed(InternalStreamConnection.java:464)
		at com.mongodb.connection.netty.NettyStream$2.operationComplete(NettyStream.java:187)
		at com.mongodb.connection.netty.NettyStream$2.operationComplete(NettyStream.java:181)
		at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
		at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
		at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
		at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
		at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
		at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
		at io.netty.util.internal.PromiseNotificationUtil.trySuccess(PromiseNotificationUtil.java:48)
		at io.netty.channel.ChannelOutboundBuffer.safeSuccess(ChannelOutboundBuffer.java:717)
		at io.netty.channel.ChannelOutboundBuffer.remove(ChannelOutboundBuffer.java:272)
		at io.netty.channel.ChannelOutboundBuffer.removeBytes(ChannelOutboundBuffer.java:352)
		at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:414)
		at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
		at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
		at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:898)
		at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
		at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
		at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
		at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
		at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
		at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
		at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
		at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
		at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
		at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		at java.lang.Thread.run(Thread.java:748)
Caused by: com.mongodb.MongoInternalException: The responseTo (2342443) in the response does not match the requestId (2342522) in the request
	at com.mongodb.internal.connection.ReplyMessage.<init>(ReplyMessage.java:65)
	at com.mongodb.internal.connection.ReplyMessage.<init>(ReplyMessage.java:43)
	at com.mongodb.internal.connection.ResponseBuffers.getResponseDocument(ResponseBuffers.java:47)
	at com.mongodb.internal.connection.LoggingCommandEventSender.sendSucceededEvent(LoggingCommandEventSender.java:141)
	at com.mongodb.internal.connection.InternalStreamConnection$2$1.onResult(InternalStreamConnection.java:395)
	at com.mongodb.internal.connection.InternalStreamConnection$2$1.onResult(InternalStreamConnection.java:375)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:676)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback$MessageCallback.onResult(InternalStreamConnection.java:643)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:513)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:510)
	at com.mongodb.connection.netty.NettyStream.readAsync(NettyStream.java:232)
	at com.mongodb.internal.connection.InternalStreamConnection.readAsync(InternalStreamConnection.java:510)
	at com.mongodb.internal.connection.InternalStreamConnection.access$1000(InternalStreamConnection.java:75)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:633)
	at com.mongodb.internal.connection.InternalStreamConnection$MessageHeaderCallback.onResult(InternalStreamConnection.java:618)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:513)
	at com.mongodb.internal.connection.InternalStreamConnection$5.completed(InternalStreamConnection.java:510)
	at com.mongodb.connection.netty.NettyStream.readAsync(NettyStream.java:232)
	at com.mongodb.internal.connection.InternalStreamConnection.readAsync(InternalStreamConnection.java:510)
	at com.mongodb.internal.connection.InternalStreamConnection.access$1000(InternalStreamConnection.java:75)
	at com.mongodb.internal.connection.InternalStreamConnection$2.onResult(InternalStreamConnection.java:375)
	at com.mongodb.internal.connection.InternalStreamConnection$2.onResult(InternalStreamConnection.java:364)
	at com.mongodb.internal.async.ErrorHandlingResultCallback.onResult(ErrorHandlingResultCallback.java:48)
	at com.mongodb.internal.connection.InternalStreamConnection$3.completed(InternalStreamConnection.java:467)
	at com.mongodb.internal.connection.InternalStreamConnection$3.completed(InternalStreamConnection.java:464)
	at com.mongodb.connection.netty.NettyStream$2.operationComplete(NettyStream.java:187)
	at com.mongodb.connection.netty.NettyStream$2.operationComplete(NettyStream.java:181)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
	at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
	at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
	at io.netty.util.internal.PromiseNotificationUtil.trySuccess(PromiseNotificationUtil.java:48)
	at io.netty.channel.ChannelOutboundBuffer.safeSuccess(ChannelOutboundBuffer.java:717)
	at io.netty.channel.ChannelOutboundBuffer.remove(ChannelOutboundBuffer.java:272)
	at io.netty.channel.ChannelOutboundBuffer.removeBytes(ChannelOutboundBuffer.java:352)
	at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:414)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:898)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372)
	at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
	at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(Thread.java:748)

I investigated the related source code. My guess is that, when the first error happened, MongoDB replied with a message the driver did not expect. When the driver decoded the message according to the wire protocol, it found the message length too big (after reading the first 16 bytes of the header from the stream), threw an exception, and ignored the rest of the bytes. Subsequent requests keep doing the same until the first well-formed reply message is encountered, but by then the responseTo in that reply is smaller than the current requestId.
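
For reference, the checks in question can be sketched as follows. This is a minimal illustration of the documented header layout (four little-endian int32s), not the driver's actual code; the ids and sizes are made up, and 48000000 mirrors the limit from the error message above.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class HeaderCheckSketch {
    static final int MAX_MESSAGE_LENGTH = 48_000_000;
    static final int OP_MSG = 2013; // opcode used by modern server replies

    public static void main(String[] args) {
        // Fabricate the 16 header bytes a well-formed reply would start with:
        // four little-endian int32s read straight off the socket.
        ByteBuffer header = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        header.putInt(512)    // messageLength: total size, header included
              .putInt(7)      // requestId the server assigned to this reply
              .putInt(42)     // responseTo: the client requestId being answered
              .putInt(OP_MSG);
        header.flip();

        int messageLength = header.getInt();
        header.getInt(); // server-side requestId, not validated here
        int responseTo = header.getInt();

        // Check 1: an implausible length means the 16 bytes were probably
        // not a header at all, i.e. the stream position is off.
        if (messageLength > MAX_MESSAGE_LENGTH) {
            throw new IllegalStateException("The reply message length " + messageLength
                    + " is greater than the maximum message length " + MAX_MESSAGE_LENGTH);
        }
        // Check 2: the reply must answer the request that was just sent on
        // this connection (42 is a hypothetical pending requestId).
        if (responseTo != 42) {
            throw new IllegalStateException("The responseTo (" + responseTo
                    + ") in the response does not match the requestId (42) in the request");
        }
        System.out.println("Header plausible; read the remaining "
                + (messageLength - 16) + " body bytes next.");
    }
}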

I can't reproduce this on my laptop. I'm wondering why the driver needs to check the message size or that the responseTo matches the requestId. Is there a particular concern behind these checks? Why not simply check whether the reply message matches the wire protocol?
I suspect this issue is more likely a network or protocol issue. Could you advise me on how to find the root cause?



 Comments   
Comment by PM Bot [ 30/Nov/21 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by Valentin Kavalenka [ 16/Nov/21 ]

Changing the status to "waiting for reporter". It will remain in this status for some time before being closed automatically if there is no activity. gloojiang86@gmail.com, if you discover evidence that points to a problem in the driver code and the ticket is closed by that time, please clone it again.

Comment by 运军 蒋 [ 16/Nov/21 ]

Yeah. I totally understand that. We'll keep investigating this. Thanks for your efforts.

Comment by Valentin Kavalenka [ 15/Nov/21 ]

Hi gloojiang86@gmail.com,

I understand that this problem is a difficult one, but I also do not see a way to make progress without investigating it directly in the environment where it is reproducible. This is beyond what the driver dev team can do, especially given that, despite all your and our efforts, there is still no compelling evidence that the problem is in the driver code. If you do eventually discover such evidence, please report your findings.

Comment by 运军 蒋 [ 05/Nov/21 ]

Thanks for your quick reply on this.

  1. This issue is not reproducible when NeuVector is disabled. The only difference is that enabling NeuVector in k8s disables TSO.
  2. I haven't tried TLS.
  3. We have already done the experiments that you mentioned. We captured all the inbound and outbound segments at three points. The 1st point is between our service and the NeuVector Enforcer ([^POS-vin11c9a-eth0-2021_1020_0622_27.pcap]). The 2nd point is between the NeuVector Enforcer and the NIC of the worker node which hosts our service. The 3rd point is in the VM hosting our MongoDB router. You can match the logs in service_1.log with [^POS-vin11c9a-eth0-2021_1020_0622_27.pcap].
  4. Another finding is that if we add more CPU resources, the issue has a lower chance of occurring. This matches what I found in JAVA-4122.
  5. One more piece of information: our API-level framework is Spring WebFlux, which also uses Netty as its web server. My suspicion was that this is related to Java NIO, but when I replaced NettyStream with AsynchronousSocketChannelStream (switched as in the sketch below), the error message was still the same.
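
For reference, this is roughly how the transport is switched — a minimal sketch assuming the 4.x MongoClientSettings API (streamFactoryFactory was later deprecated); the host names are hypothetical. Setting the stream factory to Netty routes reads through NettyStream, while omitting the call falls back, to our understanding, to the driver's default AsynchronousSocketChannelStream-based transport:

import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.connection.netty.NettyStreamFactoryFactory;
import com.mongodb.reactivestreams.client.MongoClient;
import com.mongodb.reactivestreams.client.MongoClients;

public class TransportSelection {
    public static void main(String[] args) {
        // With streamFactoryFactory set, the driver uses NettyStream for I/O;
        // remove the call to fall back to AsynchronousSocketChannelStream.
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString("mongodb://router1:27017,router2:27017"))
                .streamFactoryFactory(NettyStreamFactoryFactory.builder().build())
                .build();
        try (MongoClient client = MongoClients.create(settings)) {
            // exercise the client here
        }
    }
}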

I was hoping I could find some corrupted TCP segments in the pcaps, but failed. You can look into the captured segments along with service_1.log.

Thanks for your time.

Comment by Valentin Kavalenka [ 04/Nov/21 ]

Hi gloojiang86@gmail.com,

Thank you for providing more details on the matter. Before discussing them, I'd like to clarify: have you been able to reproduce the problem with NeuVector disabled? Another question I have is whether you have tried enabling TLS in the client. I know that this is not a simple task, but to my understanding doing so has the potential to tell us directly whether the network data gets corrupted or not.

1

If my understanding is correct, you found the 16 bytes composing a specific message header <messageLength=975335023, requestId=926299441, responseTo=959721525, opCode=960050487> in TCP segments sent to multiple client sockets. This suggests that the bytes found in those segments do not represent a MongoDB message header, because a requestId is unique within a client JVM process. However, even in the event that they all do represent one, then the same message header was sent to different sockets, and thus it is not possible to conclude that the driver reading this header from the socket 10.42.19.195:53222 (as per service_1.log) necessarily read data intended for a different socket: the socket 10.42.19.195:53222 could also have been one of those the message header was sent to. (This all assumes that the search results in image-2021-11-03-14-46-40-340.png are not exhaustive, and that the conversion to bytes was done correctly, i.e., according to the binary protocol, taking the correct endianness into account.)

A better experiment that I can think of would be to grab all inbound TCP segments for a problematic client socket and try to match them with the message headers attached to operation events in the history recorded by the debugger (see events with the DECODE_MESSAGE_HEADER code). I think this experiment can give a more compelling answer to whether the driver read the same message headers that a network sniffer captured. Its advantage is that, by necessity, some headers will be matched with inbound TCP data, confirming that the conversion to bytes is done correctly. However, even the proposed experiment may produce incorrect results depending on how Wireshark searches for data: it is theoretically possible for a MongoDB message header to be split between TCP segments, and if Wireshark searches for a byte string only within each captured segment, then it may not find the header. This is something to keep in mind when a header is not found by Wireshark.
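
To make the byte conversion concrete, here is a minimal sketch (plain JDK; the header values are the ones from the DECODE_MESSAGE_HEADER event in service_1.log) that prints the 16-byte little-endian hex string one would search for in the pcap:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class HeaderToHex {
    public static void main(String[] args) {
        // Re-serialize a logged MessageHeader into the exact bytes that
        // would have appeared on the wire: four little-endian int32s.
        ByteBuffer buf = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(975335023)   // messageLength
           .putInt(926299441)   // requestId
           .putInt(959721525)   // responseTo
           .putInt(960050487);  // opCode

        StringBuilder hex = new StringBuilder();
        for (byte b : buf.array()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        // Paste this into Wireshark's "Find Packet" search in Hex value mode.
        System.out.println(hex);
    }
}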

The experiment you conducted and the experiment I proposed both answer the following question: "Does the driver read from a socket the data that comes from the network for that socket, and only that data?" If the answer is "yes", then the data is corrupted, and the point at which TCP segments are sniffed tells us that the data was corrupted before that point. If the answer is "no", then either the driver (or a library it uses) has a bug, or the data was corrupted somewhere between the sniffing point and the driver. The last part is what makes it crucial to put the sniffing point as close to the driver as possible. E.g., if a client is working within a Docker container, then sniffing is better done within the container as well, after the point at which NeuVector does its job. At which point was the data sniffed in your experiments?

2

My understanding is that you are still using org.mongodb:mongodb-driver-reactivestreams:4.0.5. Observing multiple consecutive errors on the same socket is a problem that was identified and fixed in 4.4.0 as a result of your help in JAVA-4210, and was mentioned in a comment in JAVA-4210. That's why in service_2.log you can see the exception message

...runId=0899165a-4ceb-4ee0-94aa-4624a1820daa debuggerId=4 messageSequenceNumber=2, t=2021-10-25T05:42:54.026Z...
...connectionId=connectionId{localValue:4} ... localAddress=/10.42.0.244:42024...
...
1038:OperationEvent ... Detected an invalid reply message header MessageHeader{messageLength=1702063201, requestId=578497661, responseTo=1685025392, opCode=1232364405}
...
1041:OperationEvent ... Detected an invalid reply message header MessageHeader{messageLength=825893476, requestId=859322678, responseTo=842021172, opCode=1635131948}
...
}]}
]}
}

followed by the exception message for the same connection

...runId=0899165a-4ceb-4ee0-94aa-4624a1820daa debuggerId=4 messageSequenceNumber=3, t=2021-10-25T05:42:54.111Z...
...connectionId=connectionId{localValue:4} ... localAddress=/10.42.0.244:42024...
...
1038:OperationEvent ... Detected an invalid reply message header MessageHeader{messageLength=1702063201, requestId=578497661, responseTo=1685025392, opCode=1232364405}
...
1041:OperationEvent ... Detected an invalid reply message header MessageHeader{messageLength=825893476, requestId=859322678, responseTo=842021172, opCode=1635131948}
...
1044:OperationEvent ... Detected an invalid reply message header MessageHeader{messageLength=1952541042, requestId=1299083113, responseTo=1768976481, opCode=577988462}
...
}]}
]}
}

Once the driver fails to read successfully from a stream, it is supposed to close it, but sometimes it doesn't; instead it continues reading from an incorrect position in the stream and reads junk again and again. You can see that the "Detected an invalid reply message header" events keep accumulating: the message with messageSequenceNumber=2 has 2 such events, while the message with messageSequenceNumber=3 has the same 2 events plus 1 new one. You can also see that the message header is different each time, because the driver keeps reading data that is not a message header and trying to interpret it as one. It may eventually encounter a 4-byte sequence whose messageLength value is neither too high nor too low, in which case the driver will likely complain about responseTo instead, which, I believe, is exactly what you observed.
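
As a supporting observation, the "invalid" header values quoted above decode, byte for byte, into plain ASCII. A minimal sketch (the JSON-looking fragment is reconstructed from the decoded bytes, not taken from any captured payload):

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class JunkHeaderDemo {
    public static void main(String[] args) {
        // Interpreting 16 bytes of ASCII text as four little-endian int32s
        // reproduces the first invalid header from the events above.
        byte[] junk = "alse},{\"productI".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(junk).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println("messageLength=" + buf.getInt()  // 1702063201
                + " requestId=" + buf.getInt()              // 578497661
                + " responseTo=" + buf.getInt()             // 1685025392
                + " opCode=" + buf.getInt());               // 1232364405
    }
}

That these 16 bytes read as something like ...false},{"productI... suggests the driver was consuming a textual payload (e.g. a JSON document) rather than a real message header.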

Given that the observations you presented in item 2 are based on behavior the driver exhibits as a direct consequence of the aforementioned bug, I do not think it is a good idea to include those observations in the analysis of the problem. I propose paying attention only to the first instance of reading invalid data from a socket, and to actions related to that same socket before it happened.

Are there communication channels other than Jira comments through which we can communicate more efficiently?

If you have commercial support from MongoDB, then you could open a support case; I expect that communications there are more interactive. However, for a Java Driver Jira ticket, communications are intended to happen via Jira.

Comment by 运军 蒋 [ 03/Nov/21 ]

@Valentin Kavalenka, we have some new findings that indicate this issue is not caused by corruption. We did several rounds of experiments and captured the TCP packets between our service and MongoDB. The packets indicated that there is no corruption, and we confirmed that TSO is disabled. Furthermore, we found something really interesting in the logs produced by the debugger.

  1. I converted the "corrupted" 16 bytes (OperationEvent{mode=u, code=DECODE_MESSAGE_HEADER, type=F, attachments=[a{StreamReadPosition{successReadBytes=244171, msgHeaderMessageLengthBytes=244155}}, a{MessageHeader{messageLength=975335023, requestId=926299441, responseTo=959721525, opCode=960050487}}) in service_1.log to hex and filtered for that hex in Wireshark to see if I could find the corrupted frames in the pcaps. Fortunately, I found the matching frames, shown in image-2021-11-03-14-46-40-340.png. These matching frames are all cases of our service sending a response to the API client. It is weird that the driver could read data not belonging to its connection.
  2. Another finding, as I mentioned in JAVA-4210: after the driver reports many times that the message header is greater than the max length, it will raise "the responseTo does not match the requestId", and the responseTo is always less than the requestId. In service_2.log there is such a log entry, "The responseTo (22162) in the response does not match the requestId (22823)". When you search for 22162 in the log, you first find the entry "a{CommandMessageData{id=22162, name=killCursors, containsPayload=false, responseExpected=true, namespace=posService.$cmd, speculativeAuthenticate=false}}]},
    u20281041:OperationEvent{mode=u, code=DECODE_MESSAGE_HEADER, type=F, attachments=[a{StreamReadPosition{successReadBytes=117079, msgHeaderMessageLengthBytes=117047}}, a{MessageHeader{messageLength=825893476, requestId=859322678, responseTo=842021172, opCode=1635131948}}", which indicates that there was a "corrupted" header when the driver read the 16 bytes after sending the command message whose requestId is 22162. This finding shows that the driver reads some bytes that do not belong to its connection, and then, after several requests, it finally reads the correct stream; but at that stage the responseTo is still the requestId that was supposed to have been read when the first error happened.

I can't reopen JAVA-4210, so I cloned it into this new ticket. Please take a look. Are there communication channels other than Jira comments through which we can communicate more efficiently?

Thanks a lot.
