[JAVA-2945] Mongo Driver with bad connection hanging forever and not timing out Created: 24/Aug/18 Updated: 27/Oct/23 Resolved: 28/Aug/18 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Connection Management |
| Affects Version/s: | 3.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | RameshPitani | Assignee: | Unassigned |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Description |
|
Hi, We are having a weird issue with Mongo java driver 3.5.0. We have following error initially returned by Mongo and application is handling error smoothly. But after this error, whatever next request picks mongo connection, Next request is hanging forever and not timing out after a certain interval.
Caused by: com.mongodb.MongoQueryException: Query failed with error code 35 and error message '' on server localhost:27017Caused by: com.mongodb.MongoQueryException: Query failed with error code 35 and error message '' on server localhost:27017 at com.mongodb.connection.ProtocolHelper.getQueryFailureException(ProtocolHelper.java:131) at com.mongodb.connection.GetMoreProtocol.execute(GetMoreProtocol.java:88) at com.mongodb.connection.GetMoreProtocol.execute(GetMoreProtocol.java:49) at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159) at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:289) at com.mongodb.connection.DefaultServerConnection.getMore(DefaultServerConnection.java:254) at com.mongodb.operation.QueryBatchCursor.getMore(QueryBatchCursor.java:218) at com.mongodb.operation.QueryBatchCursor.hasNext(QueryBatchCursor.java:105) at com.mongodb.MongoBatchCursorAdapter.hasNext(MongoBatchCursorAdapter.java:46) at com.mongodb.DBCursor.hasNext(DBCursor.java:145) at com.mongodb.DBCursor.fillArray(DBCursor.java:910) at com.mongodb.DBCursor.toArray(DBCursor.java:655) at com.mongodb.DBCursor.toArray(DBCursor.java:643)
|
| Comments |
| Comment by RameshPitani [ 27/Aug/18 ] |
|
Hi Jeff, No. I have informed about this on 08/24. IBM exposed mongo interface on top of TPF mainframe system. This allows java application to talk to TPF Mainframe system using NoSQL. Initially, when I raised this defect, I thought 2nd Issue(Connection hung) is related to Java Mongo driver issue. Today IBM confirmed that there is a bug on mainframe system (Closing Mongo Connection during cursor close) for 2nd error. For 1st error((Query failed with error code 35 and error message)), IBM Mongo server can not handle if Find and GetMore come on different Connection. This is the limitation of IBM mongo server. https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.15/gtpd5/cmdbifcontainer.html Your prompt responses helped us to narrow down the problem. There is no problem with mongo java driver. Thank you for all your help. We can close this JIRA. |
| Comment by Jeffrey Yemin [ 27/Aug/18 ] |
|
pitani can you clarify? Are you not running against a MongoDB binary that was built by MongoDB, Inc? If not, where is the binary from? |
| Comment by RameshPitani [ 27/Aug/18 ] |
|
The version of MongoDB on TPF is compatible with 2.6, however there are, certain 2.6 commands are not supported. |
| Comment by Jeffrey Yemin [ 27/Aug/18 ] |
|
pitani for question 1, I'm asking what version of the MongoDB server that you are connecting to, not the driver version. Driver and server are versioned independently of each other. |
| Comment by RameshPitani [ 27/Aug/18 ] |
|
Here I attached Thread Dump file when request hung. |
| Comment by RameshPitani [ 27/Aug/18 ] |
|
Jeff, We have observed an interesting clue in the log about Connection Hang Issue in the log (2nd Issue after Query failed with error code 35 and error message).
2018-08-27 12:52:23,333|DEBUG|||CleanCursors-2-thread-1|org.mongodb.driver.protocol.killcursor||Killing cursors [1] on connection [connectionId\{localValue:23, serverValue:0}] to server mongodb.prod.sabre.com:27017
|
| Comment by RameshPitani [ 27/Aug/18 ] |
|
Thank you Jeff. 1. This is our version. Please let us know if you see any problem with version I am using. <spring-data-mongodb.version>1.8.4.RELEASE</spring-data-mongodb.version> 2. Yes. We are getting Error "Query failed with error code 35 and error message" even when we tested against the standalone server. mongo.host=ACPSTSB.vpars.sabre.com Spring Context file: <mongo:db-factory dbname="tpfdf" mongo-ref="mongo"/> <mongo:mapping-converter id="mongoConverter" base-package="com.sabre.checkin.connector.tpfdf.converter.mongo"> <bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate"> <constructor-arg name="mongoDbFactory" ref="mongoDbFactory"/> <constructor-arg ref="mongoConverter"/></bean>
|
| Comment by Jeffrey Yemin [ 27/Aug/18 ] |
|
Ramesh, the only way to force it would be to set the maximum connection pool size to one, but that would mean MongoClient wouldn't have any concurrency. A few further questions:
|
| Comment by RameshPitani [ 27/Aug/18 ] |
|
Hi Jeff, Reg. "Query failed with error code 35 and error message". I have one question.. Z/TPF Mongo Team claims that above error happens if the find and get more are on different connections. Is there a way to force Original Find and GetMore use same mongo connection? Will @Transactional work or any other solution? |
| Comment by Jeffrey Yemin [ 27/Aug/18 ] |
|
If a killCursors message is being sent from a CleanCursors thread, it means that the cursor was not closed by the application (e.g., via a call to DBCursor.close). |
| Comment by RameshPitani [ 27/Aug/18 ] |
|
We have configured only one server(F5 Load Balancer) in spring context XML file. There is no replica Set. This F5 load balancer server talks to 2 ZTPF mongo servers behind. Even if we hit single ZTPF mongo server directly with out going through F5 Load Balancer, We have this error. <mongo:mongo-client host="#{ckiMongoProperties['mongo.host']}" port="#{ckiMongoProperties['mongo.port']}" credentials="#{ckiMongoProperties['mongo.credentials']}"> |
| Comment by Jeffrey Yemin [ 27/Aug/18 ] |
|
Yes, there is no requirement that a GetMore request uses the same ConnectionId as the previous ones for a given query. The only requirement is that it routes it to the same server. Are you seeing any evidence that the GetMore is routed to a different server address? |
| Comment by RameshPitani [ 26/Aug/18 ] |
|
Hi Jeff, I have observed something in my log file(ErrorLog1.txt) related to error "Query failed with error code 35 and error message ''. This ErrorLog1.txt contains log messages related to the same transaction. If you look at lines 2 and 4. It has different ConnectionId Value. 1st line is Mongo query with In clause of 150 addresses. Mongo driver cursor decided to use Get More documents multiple times. 1st time, requestErrorLog1.txt When I saw successful transactions, It used same connection Id value for 1st time as well as 2nd time(Get More), I am not seeing any problem. If the 2nd time (Get More) used different ConnectionId value, then we have the error "Query failed with error code 35 and error message ''. In TPF Mainframe logs error is "MONG0035E 09.51.26 Cursor ID 1 passed on the request is not valid or is not an active cursor." Do you know why 2nd time (GetMore) is using different ConnectionId value compared to 1st one?
|
| Comment by Jeffrey Yemin [ 24/Aug/18 ] |
|
Hi Ramesh, I understand your question, and my requests are attempting to prove your hypotheses. The command monitoring events, in particular the ConnectionId, will tell us that the two requests are actually using the same connection. The thread dump will prove that the request is actually waiting for a socket read to complete, and the server logs may shed light on whether the second request actually arrived at the server. Generally, if a driver receives a response to a request, as it appeared to here (it's reporting error code 35), then it assumes the socket is still in a good state and just returns it to the pool. It's only on an IOException that the connection is forcibly closed by the driver. |
| Comment by RameshPitani [ 24/Aug/18 ] |
|
Hi Jeff, We will try command listener with mongo java driver 3.5.0. ZTPF has mongo interface and the Client application(Java) is using Mongo Java driver(3.5.0) to communicate to ZTPF system using Mongo interface. So the 1st error(Query failed with error code 35 and error message '' on server) might be related to ZTPF system. My question mainly about 2nd request. 2nd request hung forever and mongo java driver is not timing out after 10 seconds. If Mongo Socket connection went bad after 1st error, why driver is not closing bad connection and why 2nd request using bad mongo connection? How can close this bad connection forcebly? Is there a way to close it?
|
| Comment by Jeffrey Yemin [ 24/Aug/18 ] |
|
Sorry to hear that you're having issues with MongoDB using the Java driver. Are you able to reproduce this error? If so, a few things that would help us to investigate further:
If you can share all that information with us, we can proceed with an investigation. |