[SERVER-18671] SecondaryPreferred can end up using unversioned connections Created: 27/May/15 Updated: 25/Jan/17 Resolved: 20/Jan/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Admin, Sharding |
| Affects Version/s: | 3.0.3 |
| Fix Version/s: | 3.0.10, 3.2.3, 3.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marcin Lipiec | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 4 |
| Labels: | code-and-test | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Backport Completed: | |||||||||||||
| Steps To Reproduce: | Run test.js after applying repro.diff. You should be able to see the log near the end of the test. |
| Sprint: | Sharding E (01/08/16), Sharding F (01/29/16), Sharding 11 (03/11/16) | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
When mongos tries to set up the version for the connection to be used for queries, it checks whether the primary is down with this: https://github.com/mongodb/mongo/blob/r3.1.5/src/mongo/client/parallel.cpp#L574
However, if you look at the implementation of isFailed:
It can return true if the _master is not initialized (when the replica set connection has not yet talked to the master), which makes mongos treat the primary as down and skip setting the version. The reason this was fine in v2.6 is that mongos used to eagerly call setShardVersion on every connection it created, so by the time the above codepath was reached, _master was guaranteed to be set unless an error had occurred. This is no longer true in v3.0 as
Original description from user:
|
| Comments |
| Comment by Githook User [ 23/Feb/16 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com> Message: (cherry picked from commit 1d611a8c7ee346929a4186f524c21007ef7a279d) |
| Comment by Quentin Schroeder [ 09/Feb/16 ] |
|
We are also running into this issue and would love to see the fix backported to 3.0. I'll keep an eye on this ticket to see what decisions are made. |
| Comment by Ramon Fernandez Marina [ 08/Feb/16 ] |
|
pperekalov, we're assessing whether a backport to 3.0 is doable safely. Any updates will be posted on this ticket. |
| Comment by Pavel Perekalov [ 05/Feb/16 ] |
|
Can you please tell us when this issue will be backported to 3.0? |
| Comment by Githook User [ 29/Jan/16 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com> Message: |
| Comment by Githook User [ 20/Jan/16 ] |
|
Author: Randolph Tan (renctan) <randolph@10gen.com> Message: |
| Comment by Ramon Fernandez Marina [ 02/Jul/15 ] |
|
marcin.lipiec@nokaut.pl, this is to let you know that I've finally been able to reproduce these messages and we're investigating. |
| Comment by Ramon Fernandez Marina [ 20/Jun/15 ] |
|
Thanks for the update marcin.lipiec@nokaut.pl. I'm still trying to reproduce this issue, but no luck so far. Have you tried removing the read preference to eliminate secondary reads? |
| Comment by Marcin Lipiec [ 18/Jun/15 ] |
|
We also tried with version 3.0.4 of mongos and the same messages constantly appeared in logs. |
| Comment by Piotr Duda [ 08/Jun/15 ] |
|
Any ideas about this issue? |
| Comment by Marcin Lipiec [ 01/Jun/15 ] |
|
Ramon, is there any news, or are there any ideas? |
| Comment by Marcin Lipiec [ 28/May/15 ] |
|
Actually, we have two kinds of applications:
|
| Comment by Ramon Fernandez Marina [ 27/May/15 ] |
|
Thanks for uploading the logs, marcin.lipiec@nokaut.pl. I assume that the connections to the mongos are not doing any operations that alter collection metadata, but can you elaborate on what kind of operations these are? In particular, I'm interested in knowing whether you have any read preference set, as this warning depends on a specific read preference. Thanks, |
| Comment by Marcin Lipiec [ 27/May/15 ] |
|
Yes, the warnings appeared on step 4. I have attached the logs you asked for to the issue. |
| Comment by Ramon Fernandez Marina [ 27/May/15 ] |
|
marcin.lipiec@nokaut.pl, if I understand correctly you completed steps 1 to 3 successfully, but on step 4 you started seeing the warnings above, is that correct? Also, can you please send us the full logs for the mongos you upgraded from the time it started running with 3.0.3 until it was rolled back to 2.6.9? Thanks, |
| Comment by Marcin Lipiec [ 27/May/15 ] |
|
We successfully upgraded the cluster's metadata with mongos --upgrade. Then we proceeded to upgrade our remaining mongos processes. Problems appeared after upgrading one of them. After the mongos restart, endpoints started connecting to the upgraded instance and then those errors appeared. The errors were seen until we downgraded to 2.6.9 again (on version 3.0.3 they appeared consistently in the logs). We're sure that the network was available throughout the process. |
| Comment by Ramon Fernandez Marina [ 27/May/15 ] |
|
marcin.lipiec@nokaut.pl, this warning means that the primary was temporarily unavailable and mongos decided to skip the shard version handshake (since it can't do it without a primary) and proceed to talk with the secondary (this is only possible if the query/command has the right read preference). This means that reads can potentially be stale, depending on how up to date the secondaries are. In your case the warning is printed for multiple shards, so this may indicate a temporary network problem on the machine running this 3.0.3 mongos. Can you elaborate on the step of the upgrade process at which this warning appeared, and whether there were warnings/errors in your system log about network unavailability? Thanks, |
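The bypass described in this comment, together with the isFailed behaviour from the description, can be illustrated with a small model. This is a sketch in Python, not the server's C++ code: every class and function name below is hypothetical, and it assumes that a replica set connection which has not yet contacted the primary is reported as failed, just like one whose primary is really down.

```python
from typing import Optional


class PrimaryConn:
    """Hypothetical handle for an established connection to the primary."""

    def __init__(self) -> None:
        self.failed = False


class ReplicaSetConn:
    """Hypothetical stand-in for a replica set connection held by mongos."""

    def __init__(self) -> None:
        # None means the connection has not talked to the primary yet.
        self.master: Optional[PrimaryConn] = None
        self.versioned = False

    def connect_to_primary(self) -> None:
        self.master = PrimaryConn()

    def is_failed(self) -> bool:
        # Assumption: "never contacted" and "primary is down" look the same here.
        return self.master is None or self.master.failed


def setup_version(conn: ReplicaSetConn, secondary_read: bool) -> str:
    """Model of the version setup step for a query connection."""
    if secondary_read and conn.is_failed():
        # Primary looks down: skip the handshake and read from a secondary.
        return "bypassed"
    conn.versioned = True
    return "versioned"


def demo() -> dict:
    # Lazy connection (v3.0-style in this model): primary never contacted.
    lazy = ReplicaSetConn()
    # Eager connection (v2.6-style in this model): primary contacted up front.
    eager = ReplicaSetConn()
    eager.connect_to_primary()
    return {
        "lazy": setup_version(lazy, secondary_read=True),
        "eager": setup_version(eager, secondary_read=True),
    }
```

In this model the lazily created connection ends up unversioned even though the primary is reachable, which matches the reports of the warning appearing while the network was available.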