[JAVA-231] Failed to retrieve any result when using SlaveOK while all slaves are down Created: 11/Dec/10 Updated: 17/Mar/11 Resolved: 16/Feb/11 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | Cluster Management |
| Affects Version/s: | 2.3 |
| Fix Version/s: | 2.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Joseph Wang | Assignee: | Antoine Girbal |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
[joseph.wang@lpsdb1.la2 ~]$ uname -a |
||
| Attachments: |
|
| Description |
|
Java driver 2.3. We have 3 mongo servers. Each has 48G and 24 CPUs. The servers run as a replica set.

[joseph.wang@lpsdb1.la2 ~]$ /usr/local/mongodb-linux-x86_64-1.6.3/bin/mongo localhost:4110
, , { "_id" : 2, "name" : "mongo-prod-mem3.lps.la2.estalea.net:4110", "health" : 1, "state" : 2, "uptime" : 11178, "lastHeartbeat" : "Sat Dec 11 2010 13:04:26 GMT-0800 (PST)" } ],

The connection pool was connected to all three servers. We manually brought down two slaves to see whether we could still get query results (as part of fault tolerance testing). MongoConnection.java shows our singleton connection pool code. As you can see, we set slaveOk at the query level.

    if (db != null) {
        cur = coll.find(dbQuery).addOption(
        DBObject dbObject = db.getLastError();
        if (enable_debug) {
            log.debug("BaseTableQueryEngine: Run query " + dbQuery.toString());
            log.debug("BaseTableQueryEngine: Found " + cur.count() + " in " + (System.currentTimeMillis() - fStart));
        }
        while (cur.hasNext()) {
            long time = (timeout - (System.currentTimeMillis() - fStart));
        }
        if (enable_debug) {
            log.debug("BaseTableQueryEngine: tuples " + tuples.size() + " in time " + (System.currentTimeMillis() - fStart));
        }
    } |
| Comments |
| Comment by Antoine Girbal [ 16/Feb/11 ] |
|
I tested this case and was able to read fine from the last slave, with 2 servers of the replica set down.
| Comment by Antoine Girbal [ 13/Dec/10 ] |
|
jar from trunk |
| Comment by Antoine Girbal [ 13/Dec/10 ] |
|
This is most likely related to a bug in which the Java driver ignored the slaveOk option when looking for a master.
| Comment by Scott Hernandez (Inactive) [ 12/Dec/10 ] |
|
SlaveOk means that queries can be sent to a slave, not that they must be, IMO.
| Comment by Joseph Wang [ 12/Dec/10 ] |
|
If there is a way to determine that all slaves are down, I don't mind reissuing the query without slave_ok so that it goes to the primary/master.
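A minimal client-side sketch of the retry described above: run the query with slave_ok first and, if that attempt fails, reissue it without slave_ok so it can reach the primary/master. `SlaveOkRetry` and `queryWithFallback` are hypothetical names, not part of the Java driver, and the sketch assumes a failed slave read surfaces as a `RuntimeException`; the real failure signal may differ.

```java
import java.util.function.Function;

// Hypothetical helper, not part of the MongoDB Java driver.
class SlaveOkRetry {

    /**
     * Runs a query with slaveOk = true; if that attempt throws, reissues it
     * with slaveOk = false so it can be served by the primary/master.
     *
     * Assumption: a failed slave read surfaces as a RuntimeException.
     *
     * @param runQuery callback that executes the query; its Boolean argument
     *                 says whether slave_ok should be set for this attempt
     */
    static <T> T queryWithFallback(Function<Boolean, T> runQuery) {
        try {
            return runQuery.apply(true);
        } catch (RuntimeException slaveFailure) {
            // No slave could serve the read: retry against the master.
            return runQuery.apply(false);
        }
    }
}
```

The callback is where the application would set or omit the slave_ok option on its own cursor. The drawback is that every read pays for one failed attempt while all slaves are down.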
| Comment by Eliot Horowitz (Inactive) [ 12/Dec/10 ] |
|
Correct - slave_ok means reads hit slaves, and writes hit the master. The correct thing is probably to read from the master if all slaves are down. The only issue is if you run queries that are really slow and assume they are going to run on a slave.
| Comment by Joseph Wang [ 12/Dec/10 ] |
|
My understanding from the 2.2 driver fix was that SlaveOK meant reads query a slave, while writes/updates still hit the master. When all slaves are down, we need an option to specify hitting the master.
| Comment by Eliot Horowitz (Inactive) [ 12/Dec/10 ] |
|
It's a tad unclear.
| Comment by Joseph Wang [ 11/Dec/10 ] |
|
Yes, that will be desirable. If no slave is available, query from the primary/master even if SLAVE_OK is set at the query/db/collection level. |
| Comment by Scott Hernandez (Inactive) [ 11/Dec/10 ] |
|
It seems like if the non-master pool is empty then the master should be used, yes? |
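The fallback discussed in this thread (prefer a slave when slave_ok is set, but use the master when the non-master pool is empty) can be sketched as follows. This is an illustration under stated assumptions, not the driver's actual implementation: `ReadTargetSelector`, `Node`, and `pickReadTarget` are hypothetical names, and the host names in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these types exist in the MongoDB Java driver.
class ReadTargetSelector {

    /** Simplified view of one replica set member. */
    static final class Node {
        final String host;
        final boolean isMaster;
        final boolean isUp;

        Node(String host, boolean isMaster, boolean isUp) {
            this.host = host;
            this.isMaster = isMaster;
            this.isUp = isUp;
        }
    }

    /**
     * Chooses the member a read should go to.
     * With slaveOk set, a live slave is preferred; if no slave is up,
     * the live master is used instead. Without slaveOk, only the master
     * is eligible. Returns empty when no eligible member is up.
     */
    static Optional<Node> pickReadTarget(List<Node> members, boolean slaveOk) {
        Node master = null;
        List<Node> liveSlaves = new ArrayList<>();
        for (Node n : members) {
            if (!n.isUp) {
                continue; // skip members that are down
            }
            if (n.isMaster) {
                master = n;
            } else {
                liveSlaves.add(n);
            }
        }
        if (slaveOk && !liveSlaves.isEmpty()) {
            // Simplest policy: take the first live slave.
            return Optional.of(liveSlaves.get(0));
        }
        // Non-master pool is empty (or slaveOk is off): fall back to the master.
        return Optional.ofNullable(master);
    }
}
```

For a three-member set where `mem1:4110` is the master and both slaves are down, `pickReadTarget(members, true)` returns the `mem1:4110` node rather than an empty result. As noted in the thread, the trade-off is that slow queries assumed to run on a slave may then land on the master.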