[SERVER-8872] error 13388 shard version not ok in Client::Context Created: 06/Mar/13 Updated: 10/Dec/14 Resolved: 15/Aug/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 2.2.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kay Agahd | Assignee: | Randolph Tan |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Linux 64 Bit |
| Operating System: | Linux |
| Description |
|
When we run a slow query, we encounter the "13388 shard version not ok in Client::Context" error even when we flush the router config just before sending the query. Does this mean that Mongo can't execute a long-running query because the config changed in the meantime? How can we cope with this? |
| Comments |
| Comment by Kay Agahd [ 14/Jun/14 ] |
|
I already do so, thank you. |
| Comment by Daniel Pasette (Inactive) [ 14/Jun/14 ] |
|
The issue is only resolved because it is a duplicate of |
| Comment by Kay Agahd [ 14/Jun/14 ] |
|
Why is the status of this ticket "resolved" if the ticket |
| Comment by Randolph Tan [ 02/Aug/13 ] |
|
This assert happens when the connection was able to establish the correct shardVersion, but the shardVersion was then bumped up because of a migration. Slow queries are susceptible to this error because the check is done every time we reacquire the lock after a yield. I have also attached a related ticket ( |
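The mechanism described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not MongoDB's actual code: `Shard`, `run_query`, `StaleShardVersion`, and the `yield_every` parameter are hypothetical names invented for the illustration. The query captures the shard version once, then re-checks it each time it "reacquires the lock" after a yield, so a query that yields more often has more chances to observe a version bump.

```python
class StaleShardVersion(Exception):
    """Hypothetical stand-in for assertion 13388 (shard version not ok)."""
    code = 13388


class Shard:
    """Toy shard whose version is bumped by a migration."""

    def __init__(self):
        self.version = 1

    def migrate_chunk(self):
        # Assumption for the sketch: every migration increments the version.
        self.version += 1


def run_query(shard, docs, yield_every, on_yield=None):
    """Scan docs, yielding after every `yield_every` documents.

    The version seen at the start is compared with the current one each
    time the simulated lock is reacquired after a yield.
    """
    expected = shard.version
    results = []
    for i, doc in enumerate(docs, start=1):
        results.append(doc)
        if i % yield_every == 0:
            if on_yield is not None:
                on_yield(i)  # other work (e.g. a migration) runs during the yield
            if shard.version != expected:
                raise StaleShardVersion(
                    f"shard version not ok: expected {expected}, found {shard.version}"
                )
    return results
```

In this model a query short enough to finish before its first yield never performs the check, which matches the observation that slow queries are the ones that hit the error.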
| Comment by Randolph Tan [ 02/Apr/13 ] |
|
Attached truncated logs (last 100k lines) from running the test on the master branch: failed_dd_master.log. The binaries were built with --dd, and I was able to reproduce the error after running the script. |
| Comment by Randolph Tan [ 01/Apr/13 ] |
|
Hi, we were able to reproduce the bug, so we might not need the logs anymore. Attaching the test script. |
| Comment by Kay Agahd [ 01/Apr/13 ] |
|
We are running 3 mongos, 3 config servers, and 3 shards (each one consisting of 3 mongod's). Hardware and configuration of the mongod's are identical. They are running on dedicated servers (no virtualisation). Tomorrow, I'll set up a fourth mongos with log level 3 in order to reproduce it and send you the logs. |
| Comment by Randolph Tan [ 01/Apr/13 ] |
|
Thanks for the report. Would you be able to provide a mongos log with log level 3 and mongod logs with log level 1? Can you also share the setup of the environment: how many mongos and how many shards (are they replica sets)? Thanks! |
| Comment by Kay Agahd [ 01/Apr/13 ] |
|
Yes, I'm pretty sure that there were active migrations. Must I stop the balancer when executing longer running queries? |
| Comment by Daniel Pasette (Inactive) [ 01/Apr/13 ] |
|
Sorry for the delayed response. Can you tell me if there are a bunch of migrations active in your cluster when you get this error? There was some work done on how commands are run in a sharded cluster in 2.2, and I'd like to follow up. This error is not expected behavior. |