[SERVER-17667] Slow performance against Secondary when Primary is down (with auth)
Created: 19/Mar/15  Updated: 26/Mar/15  Resolved: 26/Mar/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.7
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Alex Lerner
Assignee: Randolph Tan
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-15375 initShardVersion triggers inline RS r... Closed
Operating System: ALL

 Description   
  • This was reported in a Windows environment.
  • auth is used.
  • Data is accessed via mongos.
  • readPreference is primaryPreferred.
    When the primary goes down, queries are served by the secondary, but with a 1-second delay.

Developer Notes:
1. The app holds a persistent, authenticated connection to mongos, so it is not creating fresh connections to mongos.
2. Topology
App -> mongos (on Computer A) -> primary (on Computer B) and secondary (on Computer A)
3. The mongos tries to connect to the primary (which is down as part of the test case), receives an ICMP Port Unreachable message, and then tries to connect to the secondary. The 1-second delay is largely the ICMP Port Unreachable timeout.



 Comments   
Comment by Randolph Tan [ 25/Mar/15 ]

Here is the behavior I observed on Windows in my experiments:

v2.6.7
The delay is also observed with auth off.
Once the delay occurs, subsequent queries also experience it.

master (almost equivalent to v3.0.1)
The delay happens only once; subsequent queries do not attempt to contact the downed primary, so the delay does not recur.

Looking further, I found that the delay in 2.6.7 comes from ShardingConnectionHook::onCreate trying to initialize the shard version of the connection. That code path was removed entirely by SERVER-15375, so the delay no longer occurs in the newer mongos.
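
The difference between the two versions can be illustrated with a toy model. This is a sketch under stated assumptions, not mongos code: `ToyRouter`, `run`, and the fixed 1-second `DOWN_HOST_PENALTY_S` are hypothetical, chosen only to mirror the observable behavior reported above. It does not model the ShardingConnectionHook::onCreate code path itself, and it ignores how a real router would later notice the primary coming back.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical cost of one failed attempt to reach the downed primary, taken
# from the roughly 1-second delay reported in this issue (not measured here).
DOWN_HOST_PENALTY_S = 1.0


@dataclass
class ToyRouter:
    """Toy router using primaryPreferred semantics (hypothetical, not mongos).

    remember_down=False mimics the 2.6.7 observation: every query retries the
    downed primary and pays the penalty. remember_down=True mimics the
    master/3.0.1 observation: only the first query pays it.
    """
    primary_up: bool
    remember_down: bool
    known_down: set = field(default_factory=set)
    total_delay_s: float = 0.0

    def query(self) -> str:
        if "primary" not in self.known_down:
            if self.primary_up:
                return "primary"
            # Failed connection attempt to the downed primary costs the penalty.
            self.total_delay_s += DOWN_HOST_PENALTY_S
            if self.remember_down:
                # Skip the primary on later queries (recovery is not modeled).
                self.known_down.add("primary")
        # primaryPreferred falls back to the secondary when the primary is unavailable.
        return "secondary"


def run(remember_down: bool, n_queries: int) -> tuple[list[str], float]:
    """Issue n_queries against a downed primary; return who served them and total delay."""
    router = ToyRouter(primary_up=False, remember_down=remember_down)
    served = [router.query() for _ in range(n_queries)]
    return served, router.total_delay_s


if __name__ == "__main__":
    print(run(remember_down=False, n_queries=5))  # 2.6.7-like behavior
    print(run(remember_down=True, n_queries=5))   # master-like behavior
```

With five queries against a downed primary, every query is served by the secondary in both settings, but the 2.6.7-like setting accumulates 5.0 seconds of simulated delay while the master-like setting accumulates 1.0.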

Comment by Andy Schwerin [ 19/Mar/15 ]

alex.lerner, you report this as affecting 2.6.7. Do you know whether it also affects 2.6.9 or 3.0.1? I want to make sure we diagnose the correct root cause.

Generated at Thu Feb 08 03:45:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.