[SERVER-3739] mongos: "too many attempts to update config, failing" Created: 01/Sep/11 Updated: 11/Jul/16 Resolved: 29/Oct/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Theo Hultberg | Assignee: | Greg Studer |
| Resolution: | Done | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Sharded cluster with three shards, three config servers, connecting through mongos |
| Operating System: | ALL |
| Participants: |
| Description |
|
Not sure what's happening here, but mongos seems to be very confused about which databases exist. It threw errors at the application with the message "too many attempts to update config, failing", and at the same time this can be found in the mongos log:
Then it died. Running "show dbs" in the mongo console while connected to the mongos clearly shows that the database in question exists. This is not the first problem we've encountered where mongos is confused about which databases exist, and frankly we're getting scared of using sharding because it's so easily corrupted. I haven't found or heard of any way to fix the problem other than cleaning the whole cluster and starting over.

If you're wondering about the date in the database name, we use an application-side partitioning scheme, mostly because we need to remove old data, but also partly because it's so easy to get a corrupted sharding config, and in such a case we want as little of our active data in that database as possible.

This may be related to
This is some more context from the mongos logs:
|
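For readers unfamiliar with the application-side partitioning mentioned in the description (a date embedded in the database name, so that old data can be removed by dropping whole databases), the following is a minimal sketch of the idea. The actual naming convention is not given in this report; the `<prefix>_YYYYMMDD` format, the `events` prefix, and both function names are assumptions made purely for illustration.

```python
from datetime import date, datetime, timedelta


def partition_name(prefix: str, day: date) -> str:
    """Build a per-day database name (hypothetical format: <prefix>_YYYYMMDD)."""
    return f"{prefix}_{day:%Y%m%d}"


def expired_partitions(names, prefix, today, keep_days):
    """Return the database names whose date suffix falls before the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        head, _, suffix = name.rpartition("_")
        if head != prefix:
            continue  # not one of our partitioned databases
        try:
            day = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue  # suffix is not a date; leave this database alone
        if day < cutoff:
            expired.append(name)
    return expired
```

Under such a scheme the application would periodically drop every database returned by `expired_partitions`, which is the kind of frequent drop activity discussed later in the comments.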
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 25/Nov/11 ] | ||||||||||||
|
@jonas - can you try 2.0.2-rc1? | ||||||||||||
| Comment by Jonas Kramer [ 25/Nov/11 ] | ||||||||||||
|
I just stumbled upon this problem. I upgraded mongos to 2.0.1 (mongod was already at 2.0.1), but it still occurs:
| ||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 16/Nov/11 ] | ||||||||||||
|
You can have a 2.0.1 mongos with 1.8.3 mongod - though we recommend upgrading that as well. | ||||||||||||
| Comment by beier cai [ 15/Nov/11 ] | ||||||||||||
|
Can we upgrade mongos to 2.0.1 while still running mongod on 1.8.3? | ||||||||||||
| Comment by Alan Shang [ 09/Nov/11 ] | ||||||||||||
|
@Michael We had to restart 1.8.4 mongos a few times daily, day or night, and were forced to upgrade to 2.0.1 yesterday. So far so good. | ||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 29/Oct/11 ] | ||||||||||||
|
For all those who saw this - please try 2.0.1 and let us know if you see it again. | ||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 29/Oct/11 ] | ||||||||||||
|
This is the same as | ||||||||||||
| Comment by Michael Schurter [ 28/Oct/11 ] | ||||||||||||
|
@alans We've only used 2.0.0 and have this problem. We may upgrade to 2.0.1, but none of the resolved issues look related. | ||||||||||||
| Comment by Alan Shang [ 28/Oct/11 ] | ||||||||||||
|
PHP threw the same error, "too many attempts to update config, failing", on 1.8.3 and 1.8.4. It only happened on one of the web servers. Mongos didn't die, but queries got stuck. Restarting mongos returned it to a normal state. This looks like a bug that still exists in 1.8.4 and 2.0.0. Has any progress been made? | ||||||||||||
| Comment by Michael Schurter [ 24/Oct/11 ] | ||||||||||||
|
Fixed it by running:
on the troubled mongos. | ||||||||||||
| Comment by Michael Schurter [ 24/Oct/11 ] | ||||||||||||
|
Seeing lots of these lines in a mongos with more verbose logging turned on:
FWIW our shard key is _id which is a uuid, so the distribution doesn't really seem to ever change (after initial balancing obviously). | ||||||||||||
| Comment by Michael Schurter [ 24/Oct/11 ] | ||||||||||||
|
CPU load was a bit high on one of our sharded mongods at this point, but lock% was below 50%, so it seems like it should. lock% on config servers seems low. One of the config servers is doing around 1k queries/s. The others look bored. The main mongods (2 shards of 2 servers + 1 arbiter) are doing around 3k-4k updates per sec and slightly higher queries per second. Disk usage is nonexistent; all data fits in RAM. | ||||||||||||
| Comment by Michael Schurter [ 24/Oct/11 ] | ||||||||||||
|
Just started getting these with pymongo 2.0.1 and the previously mentioned setup:
| ||||||||||||
| Comment by Michael Schurter [ 20/Oct/11 ] | ||||||||||||
|
We're seeing this in 2.0.0 with 2 mongoses, 3 mongod config servers, and 2 shards (which are replica sets with 2 mongods and 1 arbiter apiece). We're seeing this with the Java client driver 2.6.5, which does the bulk of our updates (~3000/s average but between 2k-5k/s, with another 3k queries per second). We haven't seen this using pymongo 2.0.1, but it has a far lower update rate (double-digit updates per second, thousands of reads per second). | ||||||||||||
| Comment by Theo Hultberg [ 18/Oct/11 ] | ||||||||||||
|
Yes, we've seen it in 1.8.3, more or less every day since reporting it. | ||||||||||||
| Comment by Mathias Stearn [ 17/Oct/11 ] | ||||||||||||
|
Have you seen this with 1.8.3 or later? | ||||||||||||
| Comment by Eliot Horowitz (Inactive) [ 02/Sep/11 ] | ||||||||||||
|
If you're doing a lot of drops, 2.0.0 is going to have a bunch of things that make it a lot better. | ||||||||||||
| Comment by Theo Hultberg [ 01/Sep/11 ] | ||||||||||||
|
Oops, I wrote 1.8.1, but meant 1.8.2 (we've run all versions from 1.8.0 to 1.8.2, including a few RCs, just to get around sharding issues). We have upgraded to 1.8.3 because of | ||||||||||||
| Comment by Mathias Stearn [ 01/Sep/11 ] | ||||||||||||
|
There were some race conditions in 1.8.1 mongos that were fixed in 1.8.2. I would suggest upgrading ASAP. 1.8.3 is the latest 1.8 release. If you still see this in 1.8.3 please post back with a mongoexport of the config.chunks collection. If there is any private data you can create a case in Community Private that is only visible to you and 10gen employees. |
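The export of `config.chunks` requested above could be scripted along these lines. This is only a sketch: the host address and output file name are placeholders, and the helper function is hypothetical. It relies solely on the standard `mongoexport` options `--host`, `--db`, `--collection`, and `--out`.

```python
def mongoexport_chunks_command(host: str, out_path: str) -> list:
    """Build the argument list for exporting the config.chunks collection.

    `host` should point at a mongos (or a config server); both arguments
    are placeholders chosen by the caller.
    """
    return [
        "mongoexport",
        "--host", host,
        "--db", "config",
        "--collection", "chunks",
        "--out", out_path,
    ]


# Example (not executed here; requires mongoexport on PATH):
#   import subprocess
#   subprocess.run(mongoexport_chunks_command("localhost:27017", "chunks.json"), check=True)
```

Building the command as a list rather than a single string avoids shell quoting issues when the output path contains spaces.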