[SERVER-7298] thousands of "waiting till out of critical section" Created: 09/Oct/12 Updated: 08/Mar/13 Resolved: 12/Oct/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kay Agahd | Assignee: | Tad Marshall |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | crash, replicaset, sharding |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | linux 64 bit |
| Issue Links: | |
| Operating System: | Linux |
| Participants: | |
| Description |
|
We are running MongoDB v2.2.0 on 64-bit Linux, with 3 shards of 3 nodes each. This has already happened to other nodes at another time. Could it be related to v2.2.0 or to authentication? We have been using both for a short time, and we have encountered this error only since then. Should we downgrade or disable authentication? |
| Comments |
| Comment by Ian Whalen (Inactive) [ 13/Nov/12 ] |
|
Klebert, the issue in question, |
| Comment by Klébert Hodin [ 13/Nov/12 ] |
|
We don't use authentication and had the same issue after upgrading our whole cluster from 2.0.6 to 2.2.1. |
| Comment by alex giamas [ 14/Oct/12 ] |
|
Thanks all for the answers, added 7034 to my watch list. |
| Comment by Tad Marshall [ 12/Oct/12 ] |
|
|
| Comment by Kay Agahd [ 12/Oct/12 ] |
|
Yes, OK, please follow up in the support ticket so this one can be closed. |
| Comment by Tad Marshall [ 12/Oct/12 ] |
|
We don't have any indication at this point that this is related to authentication. We think that the fundamental problem is the lack of a timeout on the connection to the config server, making it possible for a single non-responsive config server to "hang" multiple mongod processes. That issue ( Alex, you can add yourself as a "watcher" of agahd, we can follow up in the SUPPORT ticket you created, so we can close this one unless you have more that you want to add here. Tad |
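As a rough illustration of the failure mode described above (a client without a timeout blocking indefinitely on a peer that never answers), the following Python sketch shows how a socket timeout bounds the wait. This is not MongoDB code: `read_with_timeout` and `make_silent_server` are hypothetical names, and the local listening socket only stands in for a non-responsive config server.

```python
import socket


def read_with_timeout(host, port, timeout_s):
    """Return bytes read from the peer, or None if it stays silent past timeout_s."""
    # create_connection applies timeout_s to the connect and to later recv calls.
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        try:
            return conn.recv(1024)
        except socket.timeout:
            # Without a timeout, recv would block here for as long as the peer is silent.
            return None


def make_silent_server():
    """Hypothetical stand-in for a non-responsive server: it listens but never replies."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    srv.listen(1)
    return srv


if __name__ == "__main__":
    srv = make_silent_server()
    host, port = srv.getsockname()
    # The connection is established by the OS, but no data ever arrives.
    print(read_with_timeout(host, port, 0.5))
    srv.close()
```

With a timeout, the caller gets control back and can retry or report an error instead of hanging along with every other process that is waiting on the same silent server.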
| Comment by Kay Agahd [ 12/Oct/12 ] |
|
Alex: on our side it was just a guess that it's related to authentication. Maybe Tad can confirm that. |
| Comment by alex giamas [ 12/Oct/12 ] |
|
Regardless of the solution, could you post whether it's related to authentication or not? "blackholed hosts" would hint towards a yes, but we need to make sure. If it's not, that would keep those of us not using authentication from putting it on our "blockers list" for the upgrade. |
| Comment by Kay Agahd [ 09/Oct/12 ] |
|
Thanks Tad! Your explanation and the related issues reflect what we've experienced. The whole system seemed to be down even though only 1 mongod node was affected. I've created a private JIRA ticket in order to send you our confidential logs: Yes, we are in MMS. Our group name is idealo. Thanks! |
| Comment by Tad Marshall [ 09/Oct/12 ] |
|
Hi agahd, This may be related to Can you post a full log to this ticket so that we can compare symptoms with the cases we have seen? Are your servers in MMS? Can you post a link? Tad |