[SERVER-23398] Cannot start mongos with secondary config servers available, if they have never seen a primary Created: 29/Mar/16 Updated: 07/Mar/22 Resolved: 07/Mar/22
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Viacheslav Kulyk | Assignee: | Andrew Witten (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | neweng, sharding-nyc-subteam2 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Participants: | |
| Story Points: | 2 |
| Description |
Hi, mongos does not start if the primary config server is not available, even though a secondary config server is available. Expected: mongos reads the configuration from the secondary config server and starts OK. This behavior is critically bad for our customers, as they are not able to restart a node running both mongos and a config server when that node is network disconnected.
| Comments |
| Comment by Lamont Nelson [ 07/Mar/22 ] |

See andrew.witten's explanation in the comment below (07/Feb/22).
| Comment by Andrew Witten (Inactive) [ 07/Feb/22 ] |
I think this isn't happening anymore. It seems to me like config_rs_no_primary.js is testing this case. The relevant sequence of events from a run of that test: first, the expected config server node is elected primary. Then the other two are stopped, and as a result the primary no longer has a quorum and therefore steps down. Then we successfully bring up a new mongos. It seems like that test covers this case.
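For reference, a rough sketch of that flow using the standard jstests helpers (this is not the actual config_rs_no_primary.js source, just an illustration of the scenario described above):

```js
// Hedged sketch only: illustrates the flow described above, not the real test.
var st = new ShardingTest({shards: 1, config: 3});

// One config server node is elected primary while the cluster is healthy.
var configPrimary = st.configRS.getPrimary();

// Stop the other two config nodes; the remaining primary loses its quorum
// and steps down, leaving the config replica set with no primary.
st.configRS.getSecondaries().forEach(function(node) {
    st.configRS.stop(st.configRS.getNodeId(node));
});

// A brand-new mongos should still start even though the config replica set
// currently has no primary, which is what this ticket asked for.
var newMongos = MongoRunner.runMongos({configdb: st.configRS.getURL()});
assert.neq(null, newMongos, "mongos did not start without a config primary");

MongoRunner.stopMongos(newMongos);
st.stop();
```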
| Comment by Lamont Nelson [ 30/Nov/21 ] |

This ticket has been put into our backlog. We will check if this is still an issue.
| Comment by Kaloian Manassiev [ 15/Nov/21 ] |

Given that MongoS is allowed to be stale (since the shards have the authoritative versions), it should be possible to start a router without having seen a primary or done a majority write. Shards, on the other hand, cannot do this if they had an in-progress migration when they started, because they need to recover the latest shard version.
| Comment by Viacheslav Kulyk [ 23/Nov/16 ] |

Any updates about the possible fix date?
| Comment by Viacheslav Kulyk [ 30/May/16 ] |

Hi Andy, I am trying to reproduce that scenario with the setup we described (please see attached) and I can't. Also, the change Denis proposes is only applied if we configure Mongo to apply it, so it will not break any existing clients. Regards
| Comment by Andy Schwerin [ 27/May/16 ] |

That's not quite the point. When you bring the other nodes back up, you cannot know which node they will elect as primary. If they do not elect the former primary, it is possible that the new primary will be missing some writes from the old primary, and the old primary will be forced to roll them back. If somebody manages to read from the old primary before that occurs, they will have inconsistent routing tables.
| Comment by Denis Bystrov [ 17/May/16 ] |

I've got your point, but in our model there are no writes to the secondary while the network or the primary (majority) is down, so when the primary becomes available again the secondary node will be in sync with it and mongos will get consistent data.
| Comment by Andy Schwerin [ 17/May/16 ] |

Thanks for the patch, denis.bystrov@avid.com. The problem with the proposal that you and vkulyk submitted is what happens when the secondary config servers finally see a primary. It is possible when that happens that they will roll back some or all of their data, but neither mongos nor the shard servers have a mechanism to deal with config data rollback. As a result, the routing tables maintained by mongos and the shards may become inconsistent, leading to writes being accepted at the wrong nodes and loss of data. This patch is only sufficient if you somehow know that the config server you contact will never roll back data that is visible in the local read concern, and this cannot be established automatically.
| Comment by Denis Bystrov [ 17/May/16 ] |

Hey, here is our pull request.
| Comment by Ramon Fernandez Marina [ 17/May/16 ] |

Thanks for your submission, vkulyk. While we're not able to provide support in the SERVER project, I'd encourage you to read the Contributor Guide and submit a pull request via GitHub so we can consider your patch. Don't forget to sign the Contributor Agreement if you haven't done so already. Thanks,
| Comment by Viacheslav Kulyk [ 17/May/16 ] |

Hi, attached is a fix candidate for the issue, in the form of a diff against the source code (fix.txt). It contains changes that will allow us to read from a secondary node of the config replica set. It is made configurable, so it won't break existing behavior. Could you please review the code and let us know. Thank you
| Comment by Ramon Fernandez Marina [ 12/Apr/16 ] |

vkulyk, as per Andy's comment it is not clear at the moment how to implement this functionality using the existing technology, and a lot of discussion and design is needed. I would not expect this ticket to be part of a stable release in less than 12 months, possibly longer.
| Comment by Viacheslav Kulyk [ 12/Apr/16 ] |

I see that the issue was added to Planning Bucket A. Please advise, at least very roughly, how soon we can expect the issue to be solved and released. In 1/3/6/12 months?
| Comment by Viacheslav Kulyk [ 07/Apr/16 ] |

Thank you! Will be waiting for updates.
| Comment by Andy Schwerin [ 06/Apr/16 ] |

I was talking with spencer about this, and it might be possible to allow mongos nodes to read config data with local read concern, since the shard servers enforce the versioning protocol. It will take a great deal of further study to confirm, but it's an interesting possibility.
| Comment by Viacheslav Kulyk [ 01/Apr/16 ] |

I'd like to add: I hope that you will find a way to start mongos reading from the secondary config server. I believe it is possible, and it is critical for us. I also think it will make Mongo more powerful by being ready for the cross-data-center configuration I described earlier. Regards
| Comment by Viacheslav Kulyk [ 31/Mar/16 ] |

Hi Ramon, Thank you for continuing to investigate. I've attached a screenshot with the configuration example and case description. As for "Even if mongos was able to start no metadata operations would be allowed": I believe this is fine, as we do not care about re-chunking; we care about the ability to read and write documents. Let re-chunking happen after the connection to a majority of config servers is restored. Actually, this is what happens by design if the connection is lost but no reboot is done. I would expect mongos to do the same after a reboot: start, but do not allow metadata operations until the connection to a majority of config servers is restored. Thank you
| Comment by Ramon Fernandez Marina [ 30/Mar/16 ] |

vkulyk, this ticket remains open to investigate what the options going forward are. However, it would be helpful if you could elaborate on your use case. What's the end goal of removing a config server from the replica set? Even if mongos were able to start, no metadata operations would be allowed on the cluster. Also, what's the state of the data-bearing nodes in the cluster? Can they talk to the config server primary? If yes, then I'm concerned that the mongos that's talking to the secondary node may get an inconsistent view of the cluster. Can you please provide more information on what you're trying to do? Thanks,
| Comment by Viacheslav Kulyk [ 30/Mar/16 ] |

Is there a chance of having some fix in the next Mongo release? E.g. saving the latest snapshot on the secondary config server and using it if the majority is not available, or creating a snapshot based on the secondary's config data (as all the needed configuration is actually there) if the majority is not available?
| Comment by Spencer Brody (Inactive) [ 30/Mar/16 ] |

Unfortunately, a replica set config server is only available for reads if it has, at some point since it was started, been in contact with a majority of the config servers.
| Comment by Viacheslav Kulyk [ 30/Mar/16 ] |

Thank you for the clarification.
| Comment by Spencer Brody (Inactive) [ 30/Mar/16 ] |

This is due to the lack of a snapshot for serving read concern majority reads. If the only reachable config servers have never been in contact with a majority for the lifetime of the process, they will have no snapshot that they can confirm has been committed to a majority of the config servers, so all reads from the mongos will time out.
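To make the mechanism concrete, here is a hedged mongo shell illustration (the host, port, and timeout are placeholders, and the exact shell helpers vary by version): a majority read against a config secondary that has never seen a majority commit since startup times out, while the equivalent local read returns, which is roughly the relaxation discussed elsewhere in this ticket.

```js
// Illustrative sketch only; hostname, port, and timeout are made up.
var conn = new Mongo("config-secondary.example.net:27019");
var configDB = conn.getDB("config");
conn.setSecondaryOk();  // allow reads while connected directly to a secondary

// Times out: the node has no snapshot it can prove is majority-committed,
// because it has not contacted a majority of the set since it started.
configDB.runCommand({
    find: "chunks",
    readConcern: {level: "majority"},
    maxTimeMS: 5000
});

// Returns the locally visible data, but that data could later be rolled back,
// which is the risk raised against simply switching mongos to local reads.
configDB.runCommand({find: "chunks", readConcern: {level: "local"}});
```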
| Comment by Viacheslav Kulyk [ 30/Mar/16 ] |

Hi Thomas, Thank you for the quick response. I just noticed a typo in the case 2 description. I cannot edit, so here is the correction:

Also, I'd like to add that the issue is even wider than "when one node is network disconnected". If we have data centers A and B, the primary config server is in data center A, and the network connection between the data centers is lost, then any node with a mongos in data center B cannot be restarted, even though each B node has a secondary config server. Regards
| Comment by Kelsey Schubert [ 29/Mar/16 ] |

Hi vkulyk, Thank you for the report. We are able to reproduce this behavior and are investigating. Please continue to watch this ticket for updates. Kind regards,