[DOCS-10952] Azure -> Atlas timeouts Created: 27/Oct/17  Updated: 27/Oct/23  Resolved: 05/Feb/18

Status: Closed
Project: Documentation
Component/s: Atlas
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Justin Costello Assignee: Ravind Kumar (Inactive)
Resolution: Gone away Votes: 0
Labels: azure, production-notes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Epic Link: DOCSP-1743

 Description   

A number of customers have experienced timeouts when connecting from applications hosted on Azure to their Atlas clusters. It would be useful to update our docs with steps customers can take to reduce or eliminate these timeouts.

Possible reason for timeouts

Given that you are hosting your application on Azure, and given the nature of the error you are observing, note that the idle timeout on the Azure load balancer is 240 seconds (4 minutes) by default. The load balancer can therefore silently drop idle connections if the TCP keepalive time on your Azure systems is greater than this value, because no keepalive probe is sent before the idle timeout expires.
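The failure condition above comes down to a simple comparison. The sketch below assumes a Linux application host, where the idle time before the first keepalive probe is exposed as the `net.ipv4.tcp_keepalive_time` sysctl (the Linux default is 7200 seconds). It checks whether that setting exceeds the 240-second load balancer idle timeout. The helper names are hypothetical and only illustrate the check.

```python
from pathlib import Path

# Default Azure load balancer idle timeout, in seconds (4 minutes).
AZURE_LB_IDLE_TIMEOUT_S = 240

# Linux exposes the idle time before the first keepalive probe here (seconds).
KEEPALIVE_SYSCTL = Path("/proc/sys/net/ipv4/tcp_keepalive_time")


def keepalive_exceeds_lb_timeout(keepalive_s: int,
                                 lb_idle_timeout_s: int = AZURE_LB_IDLE_TIMEOUT_S) -> bool:
    """Return True if an idle connection could be dropped by the load
    balancer before the first keepalive probe is sent.

    Equality is treated as unsafe too, since the probe and the timeout would race.
    """
    return keepalive_s >= lb_idle_timeout_s


def read_linux_keepalive_time(path: Path = KEEPALIVE_SYSCTL) -> int:
    """Read the host-wide keepalive idle time (Linux only)."""
    return int(path.read_text().strip())


if __name__ == "__main__":
    try:
        current = read_linux_keepalive_time()
    except (OSError, ValueError):
        print("Could not read tcp_keepalive_time (not a Linux host?)")
    else:
        verdict = "at risk" if keepalive_exceeds_lb_timeout(current) else "OK"
        print(f"tcp_keepalive_time={current}s vs LB idle timeout "
              f"{AZURE_LB_IDLE_TIMEOUT_S}s: {verdict}")
```

With the Linux default of 7200 seconds, the check reports the host as at risk, which is consistent with the silent drops described above.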

Some recommendations we typically make to customers:

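  • Setting the driver connection option maxIdleTimeMS to 120000 (120 seconds, well below the 240-second load balancer idle timeout) should reduce or eliminate the timeouts.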
  • Consider placing both the app server(s) and the Atlas cluster in the same Azure region, which also reduces network latency between them.
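maxIdleTimeMS is a standard MongoDB connection-string option: pooled connections that stay idle longer than the given number of milliseconds are closed by the driver. At 120000 ms, the driver discards idle connections well before the load balancer's 240-second timeout can drop them silently. The stdlib-only sketch below adds the option to an existing connection string; the function name and example URIs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# 120 seconds: comfortably below the 240-second Azure load balancer idle timeout.
RECOMMENDED_MAX_IDLE_TIME_MS = 120_000


def with_max_idle_time(uri: str, max_idle_ms: int = RECOMMENDED_MAX_IDLE_TIME_MS) -> str:
    """Return `uri` with maxIdleTimeMS set, replacing any existing value.

    Other options are kept exactly as written (they are not re-encoded).
    """
    parts = urlsplit(uri)
    # Connection-string option names are case-insensitive, so drop any
    # existing maxIdleTimeMS regardless of case before appending ours.
    kept = [
        opt for opt in parts.query.split("&")
        if opt and opt.split("=", 1)[0].lower() != "maxidletimems"
    ]
    kept.append(f"maxIdleTimeMS={max_idle_ms}")
    # Options must follow a "/" after the host list.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "&".join(kept), parts.fragment))
```

For example, `with_max_idle_time("mongodb+srv://cluster0.example.mongodb.net")` returns `"mongodb+srv://cluster0.example.mongodb.net/?maxIdleTimeMS=120000"`. Drivers can usually also take the option directly; PyMongo, for instance, accepts `maxIdleTimeMS=120000` as a `MongoClient` keyword argument. Editing the URI is shown here only because it works the same way across drivers.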


 Comments   
Comment by Ravind Kumar (Inactive) [ 05/Feb/18 ]

Resolved via MMS-4434

If this is confirmed to still be an issue, please create a new ticket and link it to the related HELP or MMS-SUPPORT ticket.

Comment by Ravind Kumar (Inactive) [ 01/Feb/18 ]

I'm not sure we should recommend these outside a context that involves a TSE or other support personnel, given the restrictions of Atlas. Because users have limited control over their hosts, some of these settings can't really be applied (e.g. our existing prod notes for tcp_keepalive_time indicate that a restart of the mongod/mongos is required, which Atlas customers cannot control).

That said, I have seen at least one other ticket related to "Production Notes on Azure for Atlas", so it seems this needs to be a more public-facing document.

I do think we need a TSE or two to vet these procedures in the context of what Atlas users can do on their own versus what they need support to assist with.

NOTE: Per MMS-4434, it looks like we may have pushed a fix on our end that resolves these issues. I am waiting to confirm which latency mitigations are still required.

Comment by Shannon Bradshaw (Inactive) [ 01/Feb/18 ]

ravind.kumar, can you clarify why you are suggesting moving this to TS-WRITING? Given that these are recommendations we regularly make to customers and are in line with other recommendations in the Azure section of the Atlas prod notes, I'm not clear why we would not simply fold them in.

Comment by Ravind Kumar (Inactive) [ 26/Jan/18 ]

justin.costello, would you object to us moving this to TS-WRITING? Production Notes tend to be difficult to maintain, and these recommendations carry a lot of customer-specific nuance that we can't capture succinctly. They might make more sense as a knowledge base article whose contributions are managed by the TSE team.

Comment by Justin Costello [ 27/Oct/17 ]

susan.hannon, luke.phillippi and andy.walsh, any updates or improvements to this ticket are welcome. I'm not sure where the best location for this would be in the Atlas docs.

Generated at Thu Feb 08 08:01:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.