[JAVA-3244] Ways to timeout long mongo write operation Created: 22/Mar/19  Updated: 11/Sep/19  Resolved: 02/Apr/19

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Zhexuan Chen Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Hi,

We are using mongo-java-driver 3.4.2, and our database structure is:

Sharded cluster with 3 shards.

Replica set members are in every DC across the US.

There are 3 primaries; all other members are secondaries.

We are now seeing a cross-DC network issue between one datacenter and the primary in another datacenter. As a result, some requests have very long latency (up to 16 min) but still succeed.

Our solution was to add a retry template on the client side, so a request with long latency triggers a retry, and the retry succeeds very quickly. The problem is that we cannot kill the original long-latency request, so MongoDB eventually executes the same request twice.
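The double execution described above can be reproduced without MongoDB at all. The following is only a minimal sketch: `sendWrite` is a hypothetical stand-in for a write request, the thread pool plays the role of the server, and the 500 ms and 100 ms values are arbitrary. It shows that giving up on a slow request on the client and retrying it does not stop the original from being applied later.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

class RetryDoubleWrite {

    // Hypothetical stand-in for sending a write: the "server" pool applies it
    // after serverDelayMs and bumps the counter of applied writes.
    static Future<Integer> sendWrite(ExecutorService server, AtomicInteger applied, long serverDelayMs) {
        return server.submit(() -> {
            Thread.sleep(serverDelayMs);
            return applied.incrementAndGet();
        });
    }

    // Runs one slow write with a client-side timeout and a retry,
    // then returns how many writes the "server" ended up applying.
    static int run() throws Exception {
        ExecutorService server = Executors.newCachedThreadPool();
        AtomicInteger applied = new AtomicInteger();
        try {
            Future<Integer> first = sendWrite(server, applied, 500);
            try {
                // The client only stops waiting here; the request itself keeps running.
                first.get(100, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Retry: this one completes quickly.
                sendWrite(server, applied, 0).get();
            }
        } finally {
            // Let the abandoned first request finish, as the real server would.
            server.shutdown();
            server.awaitTermination(5, TimeUnit.SECONDS);
        }
        return applied.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writes applied: " + run());
    }
}
```

Both the abandoned request and its retry are counted, which is why a client-side retry is only safe when the write is idempotent or the server can recognize the retry.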

We investigated the cause of the long latency and found heavy packet loss on the network between mongos and mongod, which may be the root cause.

We would like to time out the long-latency write operation, but we could not find a useful timeout in the MongoDB docs or the mongo-java-driver docs.

 

What we have tried:

In MongoClientOptions, we set socketTimeout, connectTimeout, maxConnectionIdleTime, and maxConnectionLifeTime. None of them helped. Requests can still take very long (the longest was 16 min).



 Comments   
Comment by Jeffrey Yemin [ 02/Apr/19 ]

For read operations, you can use maxTimeMS to control the execution time of the operation on the server.

For write operations, the driver can be configured to [retry writes|https://docs.mongodb.com/manual/core/retryable-writes/] in some situations.  Alternatively, killOp can be used, but you'll have to handle idempotency in the application.  
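As a minimal sketch of the retry-writes option, assuming a driver and server at version 3.6 or later (the 3.4.2 driver mentioned in the description predates retryable writes), the feature can be switched on through the connection string. The host names below are hypothetical placeholders:

```
mongodb://mongos1.example.com:27017,mongos2.example.com:27017/?retryWrites=true
```

With this option the driver attaches an identifier to supported write operations so that the server can recognize a retry, which addresses the double-execution concern for those operations.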

Comment by Zhexuan Chen [ 01/Apr/19 ]

Hi Jeff.

  1. It happens with both reads and writes. We strongly suspect the root cause is a network issue, such as packet loss.
  2. I would say the writes are not idempotent. The order in which write requests succeed in the db matters. If a request takes too long, later requests are applied before it and cause a data mismatch, which is why we added the retry in the client.
Comment by Jeffrey Yemin [ 01/Apr/19 ]

Hi zchen12345. What are the actual operations that are taking so long? Are they read operations, write operations, or both? If they are write operations, are the writes idempotent, in which case retrying them in the client is safe?
