[SERVER-5405] mongos does not send reads to secondaries after replica restart when using keyFiles Created: 26/Mar/12 Updated: 11/Jul/16 Resolved: 09/May/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.0.3 |
| Fix Version/s: | 2.0.6, 2.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kristina Chodorow (Inactive) | Assignee: | Greg Studer |
| Resolution: | Done | Votes: | 3 |
| Labels: | buildbot |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
|
It looks like certain types of errors cause secondaries to be taken out of circulation and never put back in, even if they are healthy. See the log at the end of https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/KLqbtxLNzUQ |
| Comments |
| Comment by auto [ 11/May/12 ] |
|
Author: Greg Studer (gregstuder) <greg@10gen.com> Message: Conflicts: client/dbclient_rs.cpp |
| Comment by auto [ 11/May/12 ] |
|
Author: Randolph Tan <randolph@10gen.com> Message: Authenticate connection to replica members with keyFile credentials when calling Conflicts: client/dbclient_rs.cpp |
| Comment by Randolph Tan [ 09/May/12 ] |
|
Buildbot failure was caused by test update in |
| Comment by Ian Whalen (Inactive) [ 09/May/12 ] |
|
I'm reopening this because it appears to have reemerged on master at http://buildbot.mongodb.org/builders/Linux%2064-bit/builds/4421/steps/test_9/logs/stdio - the error logs look identical to previous failures. |
| Comment by Andy Schwerin [ 02/May/12 ] |
|
I'm not planning another RC for 2.0.5. If I do one, it will be to fix a regression from 2.0.4, only. It can be targeted for 2.0.6, though. Just mark the two bugs as "backport: yes", without a specified 2.0.x target version, and we'll triage it in a few weeks for 2.0.6. -Andy |
| Comment by Eric Milkie [ 01/May/12 ] |
|
Windows 32-bit test now fixed. |
| Comment by auto [ 24/Apr/12 ] |
|
Author: Greg Studer (gregstuder) <greg@10gen.com> Message: |
| Comment by Eric Milkie [ 23/Apr/12 ] |
|
Also http://buildbot.mongodb.org/builders/Linux%2032-bit%20debug/builds/1627/steps/test_9/logs/stdio |
| Comment by Randolph Tan [ 20/Apr/12 ] |
|
The test fails on Windows because the Windows build uses the shutdown command (as opposed to the kill method used in Linux builds) when stopMongod is called. Since the shutdown command requires admin auth, it will never succeed in the test setup. This patch adds an extra parameter that allows passing the admin user and password when calling stopMongod. I initially tried calling shutdown instead of using stopMongod (which is called by stopSet) in the test, but that was problematic because the test script would not wait for the mongod servers to shut down fully, which can make the test fail sporadically. I also think that adding this infrastructure will make it easier to write auth tests in the future. This is just a short-term fix. A better stopMongod implementation will be addressed in |
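The difference between the two stop paths can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyMongod`, `stop_mongod`, and `ShutdownRefused` are hypothetical names invented for illustration, not the shell's actual stopMongod implementation, and the credentials check is a simplification of admin auth.

```python
class ShutdownRefused(Exception):
    """Raised when the shutdown command is sent without valid admin credentials."""


class ToyMongod:
    """Hypothetical stand-in for a mongod that has an admin user configured."""

    def __init__(self, admin_user, admin_password):
        self._admin = (admin_user, admin_password)
        self.running = True

    def kill(self):
        # Signal-based stop (the Linux path in the comment): no auth involved.
        self.running = False

    def shutdown_command(self, user=None, password=None):
        # Command-based stop (the Windows path): needs admin credentials.
        if (user, password) != self._admin:
            raise ShutdownRefused("shutdown requires admin auth")
        self.running = False


def stop_mongod(server, use_shutdown_command, user=None, password=None):
    """Hypothetical analogue of stopMongod with the extra credentials parameter."""
    if use_shutdown_command:
        server.shutdown_command(user, password)
    else:
        server.kill()
    return not server.running
```

In this model, `stop_mongod(server, True)` raises `ShutdownRefused` and leaves the server running, while `stop_mongod(server, True, "admin", "pw")` succeeds for a server created with those credentials. The kill path works either way, which is consistent with the test passing on Linux builds.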
| Comment by auto [ 20/Apr/12 ] |
|
Author: Randolph Tan <randolph@10gen.com> Message: Updated test for |
| Comment by Randolph Tan [ 13/Apr/12 ] |
|
Based on the logs, after the test kills all the members of the replica set and tries to start them up again, the members detect an unclean shutdown and will not start up. |
| Comment by Eric Milkie [ 13/Apr/12 ] |
|
The above commit broke the Windows 32-bit build; slaveok_routing.js is not passing there. |
| Comment by auto [ 12/Apr/12 ] |
|
Author: Randolph Tan <randolph@10gen.com> Message: Authenticate connection to replica members with keyFile credentials when calling |
| Comment by Randolph Tan [ 12/Apr/12 ] |
|
Detailed Cause: Fix: |
| Comment by Randolph Tan [ 10/Apr/12 ] |
|
Updated the test so that it reproduces the bug. The bug did not manifest in the earlier test because mongod allows access to the server, even with auth on, when the connection is local and the server has no admin user. The new test script therefore adds an admin user to the replica shard, so that the behavior is the same as when connecting remotely. |
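The access rule described in this comment can be written as a small predicate. This is a minimal sketch, assuming the rule is exactly as stated above; `access_allowed` and its parameters are hypothetical names for illustration, not server code.

```python
def access_allowed(auth_enabled, is_local, admin_user_exists, authenticated):
    """Toy model of the access rule described in the comment (hypothetical)."""
    if not auth_enabled:
        return True
    if authenticated:
        return True
    # With auth on, an unauthenticated connection is only let through when it
    # is local and no admin user has been created yet.
    return is_local and not admin_user_exists
```

Under this model the earlier test (local connection, no admin user) gets `True` without authenticating, which hides the bug. Once an admin user exists, the same unauthenticated local connection gets `False`, matching what a remote mongos would see.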
| Comment by Randolph Tan [ 09/Apr/12 ] |
|
Update: Bug reproduced. For the bug to manifest, mongos should be running on a different machine from the sharded replica set. Caused by: ReplicaSetMonitor never authenticates the connection when it calls replSetGetStatus (which requires admin privileges) to refresh the replica connection states. |
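The failure mode can be sketched with a toy monitor. Everything here is an assumption made for illustration: `ToyMonitor`, `FakeMember`, `FakeConnection`, and `Unauthorized` are hypothetical names, and the real ReplicaSetMonitor in client/dbclient_rs.cpp is not reproduced. The sketch only shows why a member that was marked down stays down when the status call is never authenticated.

```python
class Unauthorized(Exception):
    """Raised when a privileged command is sent on an unauthenticated connection."""


class FakeConnection:
    def __init__(self, member):
        self._member = member
        self._authed = False

    def authenticate(self, key):
        self._authed = key == self._member.key
        return self._authed

    def repl_set_get_status(self):
        # Stands in for replSetGetStatus, which requires admin privileges.
        if not self._authed:
            raise Unauthorized("unauthorized")
        return {"ok": 1, "healthy": self._member.healthy}


class FakeMember:
    """Toy replica set member started with a shared key (hypothetical)."""

    def __init__(self, key):
        self.key = key
        self.healthy = True

    def connect(self):
        return FakeConnection(self)


class ToyMonitor:
    def __init__(self, members, key=None):
        self._members = members
        self._key = key  # None models the bug: the monitor never authenticates
        self.usable = {name: True for name in members}

    def mark_down(self, name):
        self.usable[name] = False

    def refresh(self):
        for name, member in self._members.items():
            conn = member.connect()
            if self._key is not None:
                conn.authenticate(self._key)
            try:
                status = conn.repl_set_get_status()
            except Unauthorized:
                continue  # state cannot be refreshed, so a down member stays down
            self.usable[name] = status["healthy"]
```

With `key=None`, a member that has been marked down stays unusable after `refresh()` even though it is healthy. Passing the shared key lets `refresh()` restore it, which is the behavior the fix is aiming for.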
| Comment by Randolph Tan [ 04/Apr/12 ] |
|
Status update: Unable to reproduce. Attached test used. To make this run in v2.0.3, you need to copy the js files from shell directory and cpp files (except bench.cpp) from scripting directory and rebuild the mongo shell binary. Test summary: Previous incarnations of test: |
| Comment by Randolph Tan [ 03/Apr/12 ] |
|
Needs |