[JAVA-3457] Gracefully handle mongos nodes exiting via mongodb+srv:// Created: 10/Oct/19  Updated: 27/Oct/23  Resolved: 27/Nov/19

Status: Closed
Project: Java Driver
Component/s: Cluster Management
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Ben Picolo Assignee: Jeffrey Yemin
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We recently set up a shared cluster of MongoS servers in Kubernetes via the fairly new mongodb+srv record support (https://www.mongodb.com/blog/post/mongodb-3-6-here-to-SRV-you-with-easier-replica-set-connections).

In Kubernetes, when nodes enter a terminating state, they are removed from the SRV record, and their DNS resolution will no longer succeed. In some cases (depending on configuration), they may still be available to handle connections for some amount of time, until the pod has fully terminated.

The MongoDB Java driver currently rescans SRV records every 60 seconds, and this interval is hardcoded.

 

When a mongos pod enters termination, that leaves an up-to-60-second gap in which, to my understanding, we can hit issues in the Java driver through the following path.

 

  1. The MongoDB Java driver selects a random host from the known available hosts - assume it has chosen a recently terminated host
  2. If the connection pool needs to spawn a new connection, the driver does a DNS lookup on the host. link
  3. The DNS lookup fails for the recently shut-down host. This throws an exception, which invalidates all active connections to that host (including currently functioning connections). link 
  4. Until the SrvRecordMonitor refreshes its host pool, every query has a 1/pool_size chance of failing because server selection is random. Operation retries don't fully handle the failure, but they reduce the chance of query failure to (1/pool_size)^retry_count (see the sketch below)
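
For concreteness, a minimal sketch of the kind of client setup involved (hostname, credentials, and database name are hypothetical); retryWrites is the operation-retry mechanism referred to in step 4:

{code:java}
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

public class SrvClientSketch {
    public static void main(String[] args) {
        // Hypothetical mongodb+srv connection string. The driver resolves the SRV
        // record at startup and then rescans it on the hardcoded 60-second interval.
        // retryWrites=true makes the driver retry a failed write once, after
        // re-running (random) server selection.
        MongoClient client = MongoClients.create(
                "mongodb+srv://appUser:secret@mongos.example.internal/appdb?retryWrites=true");

        System.out.println(client.getDatabase("appdb").listCollectionNames().first());
    }
}
{code}

For example, with pool_size = 3, two consecutive unlucky server selections have probability (1/3)^2 ≈ 11%, versus 1/3 ≈ 33% for a single selection.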

 

There seem to be a couple of potential mechanisms for improving this. When using mongodb+srv, I can imagine blacklisting hosts that have experienced DNS failures until the next refresh, but there are several reasonable options.

 

We'd be happy to contribute a patch here if there's an agreed upon handling strategy for us to pursue.

 



 Comments   
Comment by Jeffrey Yemin [ 27/Nov/19 ]

Closing this out, as I believe we've answered all the open questions and demonstrated how to orchestrate a service such that there are no visible application effects.

If you have further questions, please post them and we can re-open.

Comment by Jeffrey Yemin [ 17/Oct/19 ]
  • heartbeatFrequency: decreasing this value will allow the server monitors to determine more quickly that a server is unavailable. Note, though, that in an active application, application threads will hit failures more often than this interval and will mark the server unavailable before the server monitor gets around to finding out.
  • heartbeatConnectTimeout, heartbeatSocketTimeout: these control how fast the server monitor will fail in the face of network errors. This matters more if your server doesn't come down cleanly, though. If you bring the mongos process down in an orderly fashion, the server should promptly notify the client that the socket is no good, and the client doesn't have to wait for timeouts.
  • connectTimeout, socketTimeout: similar to the above, but these apply to operations initiated by your application (see the configuration sketch below).
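
For illustration, a minimal sketch of how some of these can be set with the MongoClientSettings builder (the connection string and timeout values below are hypothetical examples, not recommendations; if I recall correctly, heartbeatConnectTimeout and heartbeatSocketTimeout are exposed on the legacy MongoClientOptions builder rather than here):

{code:java}
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;

import java.util.concurrent.TimeUnit;

public class TimeoutTuningSketch {
    public static void main(String[] args) {
        MongoClientSettings settings = MongoClientSettings.builder()
                // Hypothetical SRV hostname
                .applyConnectionString(new ConnectionString("mongodb+srv://mongos.example.internal/appdb"))
                // heartbeatFrequency: how often the server monitors re-check each host
                .applyToServerSettings(builder ->
                        builder.heartbeatFrequency(5, TimeUnit.SECONDS))
                // connectTimeout and the socketTimeout analogue (readTimeout) for
                // connections used by application operations
                .applyToSocketSettings(builder ->
                        builder.connectTimeout(5, TimeUnit.SECONDS)
                               .readTimeout(10, TimeUnit.SECONDS))
                .build();

        MongoClient client = MongoClients.create(settings);
        System.out.println(client.getDatabase("appdb").getName());
    }
}
{code}

The same knobs are also available as connection string options (heartbeatFrequencyMS, connectTimeoutMS, socketTimeoutMS) if that is easier to manage in your deployment.
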
Comment by Ben Picolo [ 17/Oct/19 ]

Which of the adjustable timeouts and server monitor frequencies would help out here?

 

The second part you mention may be the missing piece of the puzzle here, but we'll have to figure out whether there's a strategy for us to disallow new connections efficiently. I'll look into that path, and I appreciate the response on this.

 

Unfortunately, I don't believe we get tailored control over the timings for SRV records in Kubernetes (that's a path we were looking into as well).

Comment by Jeffrey Yemin [ 17/Oct/19 ]

bpicolo@squarespace.com, the driver does handle application shutdown. Though there is a window during which one or more application threads may get exceptions, the window is fairly short, and can be controlled by the client through the setting of various timeouts and server monitor frequencies.

The problem you seem to be having is due to the host being removed from DNS entirely prior to shutting the mongos process down. I can think of a few things you could do to improve your situation:

  1. Delay DNS removal for 60 seconds after updating the SRV record to exclude the mongos. If you do that, you won't get any application errors, and the driver will have time to update its list of mongos servers.
  2. Alternatively, shut down the mongos process before making any DNS changes. The driver will detect that the mongos process has closed its connections, and that mongos will no longer be selected for any operations.

 

 

Comment by Louis Plissonneau (Inactive) [ 15/Oct/19 ]

andrey.belik, if you manually kill/remove the pod, Kubernetes will spin up a new one almost immediately

when mongos crashes on the pod, the automation agent will try to restart mongos processes

The liveness probe (every 30 seconds) and readiness probe (every 5 seconds) will detect the loss, but they have a failure threshold (to prevent over-reacting), so it will take at least 3 minutes for Kubernetes to react (we need 6 consecutive liveness failures, and it's longer for the readiness probe)

 

Thinking about this, it's about time we revisited the liveness probe

Comment by Ben Picolo [ 14/Oct/19 ]

@Andrey - worth clarifying: as far as I can tell, the driver currently handles neither case (clean or unclean application shutdown).

Comment by Andrey Belik (Inactive) [ 14/Oct/19 ]

louis.plissonneau, please confirm whether I am correct here. All mongos processes are fronted by a Service that exposes SRV records.

When a mongos is terminated, the K8S controller updates DNS pretty much immediately (but it is an eventual consistency model).

When a mongos crashes, it will be detected by K8S, which could take longer (a few seconds), and then it will be taken out of DNS and a new one provisioned.

 

Comment by Ben Picolo [ 10/Oct/19 ]

I'll check whether that would be a factor for us - I'm not sure what sort of SLA we have in place. Let me consult some folk in my organization.

Comment by Jeffrey Yemin [ 10/Oct/19 ]

bpicolo@squarespace.com

No problem with opening a ticket directly here, but just be advised that there is no SLA in place when you do it this way.

 

Comment by Ben Picolo [ 10/Oct/19 ]

I am not - we thought that this board might be the best first point of discussion, but I'm happy to redirect wherever would be best.

Comment by Jeffrey Yemin [ 10/Oct/19 ]

It was not a bot.

Changed it back to what you intended.

Are you in contact with our technical support organization on this already by any chance?

Comment by Ben Picolo [ 10/Oct/19 ]

@jeff.yemin - I see you or a bot version of you tweaked some wording for me (thanks!). I want to note that "shared" was intentional, though. The sharding isn't new in this case; the shared MongoS fleet is.

Comment by Ben Picolo [ 10/Oct/19 ]

I don't appear to have permission to edit my ticket, but here's the link I had intended for the DefaultDnsSrvRecordMonitorFactory:

https://github.com/mongodb/mongo-java-driver/blob/f0124e36f5d7bbf8442570d1304f73ca6f5b16a1/driver-core/src/main/com/mongodb/internal/connection/DefaultDnsSrvRecordMonitorFactory.java#L28

 

Also worth mentioning - we're currently using the latest 3.x driver.
