[SERVER-2471] Issue with slaveok failover for mongos Created: 02/Feb/11 Updated: 12/Jul/16 Resolved: 02/Mar/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Sharding, Stability |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | Greg Studer |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Linux 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:44 UTC 2011 x86_64 GNU/Linux |
||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
Despite being set as slaveok, mongos seems unable to use slaves for queries once the primary in a replica set goes down. A test that reproduces the issue is attached; if the test fails, an error is thrown from the final two lines ( coll.findOne() ). When the test is run with load('shard_shutdown.js') and the shell is left open, further calls to coll.findOne() cause different connection timeout errors, which I assume are related. Duplicated on multiple systems (Ubuntu Linux) but not reproducible everywhere, so it seems to be system-dependent. |
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 06/Mar/11 ] |
|
Was there a commit for this? |
| Comment by Greg Studer [ 07/Feb/11 ] |
|
Sequence of events: 1. Primary server in shard replica set goes down. You can get the same effect with any command where slaveok is not true (for example, turning slaveok off then on again). The error resets the thread-local connection, and new connections are not allowed when the replica set is down. |
| Comment by Greg Studer [ 04/Feb/11 ] |
|
getLastError is called by default when the output of the previous command is undefined (actually, the variable name "db" is hardcoded in; if you use another variable for your db, you won't get this behavior). The query gets a cursor, but it seems like there is an issue populating the result variable from that cursor. Looking into it. |
| Comment by Eliot Horowitz (Inactive) [ 04/Feb/11 ] |
|
the shell shouldn't call getLastError for a findOne() ... |
| Comment by Greg Studer [ 03/Feb/11 ] |
|
Error: Thu Feb 3 12:55:41 uncaught exception: getlasterror failed: { "
On subsequent requests to coll.findOne(): dbclient error communicating with server: ubuntu:31100
Think I've managed to track down what's happening: the query returns ok, but the subsequent default mongo shell call to getLastError fails to use slaveok. This causes the assertion error and somehow breaks the connection for further queries. Hardcoding the slaveok flag in ClientInfo::getLastError seems to fix this, but I'm not sure it's the best solution. |
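The failure mode described in this comment can be illustrated with a toy model. This is only a sketch in plain JavaScript: ToyReplicaSet, ToyShell, forwardSlaveOk, and demo are hypothetical names made up for illustration, not part of mongos, the mongo shell, or ClientInfo. It assumes that a query carrying slaveok can be served by a secondary, and that the implicit getlasterror follow-up fails when it does not carry the flag and the primary is down.

```javascript
// Toy model only: none of these names come from the MongoDB code base.
class ToyReplicaSet {
  constructor() {
    this.primaryUp = true;
  }

  // Run a command; only commands flagged slaveOk may be served by a secondary.
  run(command, { slaveOk = false } = {}) {
    if (this.primaryUp) {
      return { ok: 1, servedBy: "primary", command };
    }
    if (slaveOk) {
      return { ok: 1, servedBy: "secondary", command };
    }
    throw new Error(command + " failed: no primary available");
  }
}

class ToyShell {
  constructor(replicaSet, { forwardSlaveOk }) {
    this.replicaSet = replicaSet;
    this.slaveOk = true; // the user has enabled slaveok
    this.forwardSlaveOk = forwardSlaveOk; // hypothetical switch standing in for the fix
  }

  findOne() {
    // The query itself carries the slaveok flag.
    const result = this.replicaSet.run("query", { slaveOk: this.slaveOk });
    // Implicit follow-up: carries slaveok only if the switch is on.
    this.replicaSet.run("getlasterror", {
      slaveOk: this.forwardSlaveOk && this.slaveOk,
    });
    return result;
  }
}

// Take the primary down, then issue one findOne() and report what happened.
function demo(forwardSlaveOk) {
  const replicaSet = new ToyReplicaSet();
  const shell = new ToyShell(replicaSet, { forwardSlaveOk });
  replicaSet.primaryUp = false;
  try {
    return shell.findOne().servedBy;
  } catch (err) {
    return "error: " + err.message;
  }
}

console.log(demo(false)); // follow-up without slaveok
console.log(demo(true)); // follow-up with slaveok
```

In this model the query succeeds in both runs; only the follow-up call decides whether findOne() throws, which mirrors the observation that the query returns ok and the error comes from getlasterror. The model does not cover the connection reset seen on later calls.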
| Comment by Eliot Horowitz (Inactive) [ 03/Feb/11 ] |
|
Can you send output when you run this? |
| Comment by Greg Studer [ 03/Feb/11 ] |
|
Seems like a race condition.... shutting down, then waiting, then querying sometimes works. |