[SERVER-11843] Removal of replica-set members doesn't work for sharded DB when configServers are down Created: 25/Nov/13  Updated: 16/Dec/13  Resolved: 16/Dec/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Krishnachaitanya Thummuru Assignee: Unassigned
Resolution: Done Votes: 0
Labels: sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS


Attachments: Text File mongos.log    
Operating System: Linux
Steps To Reproduce:

Provided in description summary

Participants:

 Description   

We have a geo-redundant setup for a sharded database with the following configuration:

Site-1

Shard#1 - set01 - (Host1: member#1 - Primary DB, Host2: member#2 - Secondary DB)
Shard#2 - set02 - (Host3: member#1 - Primary DB, Host4: member#2 - Secondary DB)
Shard#3 - set03 - (Host5: member#1 - Primary DB, Host6: member#2 - Secondary DB)

Host7: Config Server1
Host8: Config Server2
Host9: Arbiter

Site-2

Shard#1 - set01 - (Host1: member#3 - Secondary DB, Host2: member#4 - Secondary DB)
Shard#2 - set02 - (Host3: member#3 - Secondary DB, Host4: member#4 - Secondary DB)
Shard#3 - set03 - (Host5: member#3 - Secondary DB, Host6: member#4 - Secondary DB)

Host7: Config Server3

Issue:
When an entire site goes down, we remove all of the failed members that belong to that site (here, Site-1).
However, when Site-1 is completely down, both of its config servers are down as well.
The issue is that the removal of the Site-1 members is not reflected in the config server metadata.
When we run sh.status(), we still see all the members of each replica set.
As a workaround, we brought the config servers back up and removed the members again; only then was the metadata updated.
Is this a limitation or a bug in MongoDB, that all config servers must be up and running in order to remove replica-set members?
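
To make the failure scenario concrete, the following is a small sketch that models the deployment listed above and computes what remains when one site is lost. The dictionary layout and the `survivors` helper are hypothetical names introduced only for illustration. The arbiter on Host9 is left out because the description does not say which replica set it serves.

```python
# Config servers and the site each one runs on, as listed in the description.
CONFIG_SERVERS = {
    "Config Server1": "Site-1",
    "Config Server2": "Site-1",
    "Config Server3": "Site-2",
}

# Every shard has members #1 and #2 on Site-1 and members #3 and #4 on Site-2.
SHARD_MEMBERS = {
    shard: {
        "member#1": "Site-1",
        "member#2": "Site-1",
        "member#3": "Site-2",
        "member#4": "Site-2",
    }
    for shard in ("set01", "set02", "set03")
}


def survivors(down_site):
    """Return the config servers and shard members not located on down_site."""
    configs = sorted(
        name for name, site in CONFIG_SERVERS.items() if site != down_site
    )
    members = {
        shard: sorted(m for m, site in placement.items() if site != down_site)
        for shard, placement in SHARD_MEMBERS.items()
    }
    return configs, members


if __name__ == "__main__":
    configs, members = survivors("Site-1")
    print(configs)
    print(members["set01"])
```

With Site-1 down, this model leaves one of the three config servers and two of the four members in each shard, which is the state in which the member removal was attempted.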



 Comments   
Comment by Krishnachaitanya Thummuru [ 16/Dec/13 ]

Thanks Stephen
This is related to metadata, and as per your comments it is expected behavior.
So, you can close this ticket.
However, we have another issue: when two out of the three config servers are down, we see degraded application performance.
We are working with our customer and will open a separate Jira ticket for this issue.

Comment by Stennie Steneker (Inactive) [ 16/Dec/13 ]

Hi Krishnachaitanya,

Please be advised that I'm closing this issue due to inactivity. As per Scott's earlier replies, it is expected that the sharding metadata on the config servers will not be updated if 1 or more config servers are not available.

If there are any specific errors to investigate we would need answers to the additional questions posed.

Thanks,
Stephen

Comment by Scott Hernandez (Inactive) [ 25/Nov/13 ]

Please provide answers to the following questions:

  • What is the client error? What kind of timeout?
  • What is the client write concern? W:Majority?
  • When did this happen?
  • Was the write possibly to the shard that was down?
  • Can you reproduce this or does it only happen randomly?
  • If it happens again, can you increase logging to capture more diagnostic information?
Comment by Krishnachaitanya Thummuru [ 25/Nov/13 ]

I have attached the mongos log for your analysis.
I was unable to attach the complete log due to the size constraint.

Comment by Krishnachaitanya Thummuru [ 25/Nov/13 ]

Hi Scott,

Thanks for the update.
We will try it out and update you with the results.
There is another issue we have observed: when two config servers were not available, sh.status() took approximately 3-4 seconds to return its output.
However, once we brought the other down config server back up, sh.status() returned in milliseconds, and the application-side transactions that had been timing out became stable.

Comment by Scott Hernandez (Inactive) [ 25/Nov/13 ]

The members list for each shard in sh.status is just the seed list, so as long as one of those members is still valid then each mongos will discover and use the correct (and up) members. By definition no writes to the config servers for shard metadata can be done if not all of them are online. When all the config servers are back online the shard member list will be updated automatically.

If you are seeing any errors please upload the logs where those errors occur.
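
The behavior described in this comment can be sketched as a toy model. This is not MongoDB's implementation; it is a minimal illustration that assumes only the two rules stated above: shard metadata writes need every config server online, and a stale seed list still works as long as one seed host is reachable. The names `ConfigServer`, `MetadataUnavailable`, `update_shard_members`, and `discover_members` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigServer:
    name: str
    online: bool = True
    # Maps shard name to the stored seed list of member hosts.
    shards: dict = field(default_factory=dict)


class MetadataUnavailable(Exception):
    """Raised when a metadata write cannot reach every config server."""


def update_shard_members(config_servers, shard, members):
    """Store the member list for a shard only if all config servers are online."""
    down = [c.name for c in config_servers if not c.online]
    if down:
        raise MetadataUnavailable("config servers offline: " + ", ".join(down))
    for c in config_servers:
        c.shards[shard] = list(members)


def discover_members(seed_list, live_members):
    """Return the live members if at least one seed host is still reachable."""
    if any(host in live_members for host in seed_list):
        return sorted(live_members)
    return []
```

In this model, removing members while two config servers are offline raises `MetadataUnavailable` and leaves the stored seed list unchanged, yet `discover_members` still returns the live members because one seed remains valid. Once all config servers are online again, the same update succeeds.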
