[SERVER-5433] Stale config and unable to move chunks Created: 28/Mar/12  Updated: 15/Aug/12  Resolved: 04/Apr/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Preetham Derangula Assignee: Greg Studer
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: Linux
Participants:

 Description   

History
Our production sharded cluster was running 7 shards, 7 routers (it was silly to have 7 routers; that was a mistake) and 3 config servers. The main router ran out of file descriptors, so it was shut down and the limit was increased. The database could not then acquire its lock, so it was repaired and restarted. After this, one of the config servers was holding a lot of data in its moveChunk folder, exhausted all of its disk space, and its processes became unresponsive. I don't know why it was holding all that data in the moveChunk folder - a mystery to me. Once the whole cluster was restarted, many writes complained of stale config, retried, and exhausted all the file descriptors, and then the router became totally unresponsive and was throwing socket exceptions. The idea of running the cluster on one config server rather than three was considered, tested in the staging environment, and applied to the production cluster. Now I see stale config warnings, then "unable to transfer data" errors, and finally "out of file descriptors" errors. I am running 2.0.1. Do you think any of these problems would go away if I upgraded to 2.1.0, or would I be in a worse condition? Any idea why all these errors are happening?



 Comments   
Comment by Greg Studer [ 04/Apr/12 ]

Closing for now, feel free to open new question tickets or use the groups for new issues.

Comment by Greg Studer [ 03/Apr/12 ]

Generally mongoses are placed on the app server - this distributes the (small) mongos load evenly as you add new app servers. Generally shards are optimized for low latency and high i/o, and react poorly to competing for resources with other processes, so ideally they would be standalone as well.

Config servers need very low latency, but don't store much data, so they should be on centrally located machines with high bandwidth.
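As a rough way to verify the layout after rearranging things, the config database on a mongos records which hosts run the shards and (assuming the config.mongos collection is populated in this 2.0.x version) which hosts run routers. A minimal mongo-shell sketch:

var cfg = db.getSiblingDB("config");   // run from a mongos shell
cfg.shards.find();                     // shard name -> host:port
cfg.mongos.find();                     // mongos routers that have registered with the config servers (assumed present in 2.0.x)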

Comment by Preetham Derangula [ 02/Apr/12 ]

>"Config servers only hold metadata - and they shouldn't be handling migrations (unless your config server is also a shard - not recommended). How exactly is your cluster set up, and on what hardware? You'll need to make sure that your ulimits are correctly set for fds and processes on every server, and the config servers (you should use three) should ideally be on dedicated machines. All the mongoses will need to be able to communicate with all the config servers and shards, and all shards will need to be able to communicate with each other and the config servers"
I am planning to redesign the topology of the sharded cluster. You mentioned that a config server should ideally be a dedicated machine. How about the router - does it need to be a dedicated machine too, or is it OK to run a shard on the router machine?
Thanks,
Preetham

Comment by Greg Studer [ 30/Mar/12 ]

> If all the config files are deleted and the whole shard network is started, would it function normally?
No, there would be all sorts of problems. Without the config data, there's no way to tell where the data for sharded collections is located, and, by default, it would just look on a single shard. I don't think the problem is the config data here though - you're out of space and there's network weirdness.

Comment by Preetham Derangula [ 30/Mar/12 ]

Thanks for the advice!
One more question on thrashing the config server and recreating the config servers. If this approach is implemented, do all the indexes and sharded collections have to be recreated, or is there anything else that needs to be done? If all the config files are deleted and the whole cluster is restarted, would it function normally? As you said previously, the config server holds metadata; I am just wondering what config information would be lost and what I would need so that the existing data on the shards remains functional.
mms.10gen.com - I will try this out and inform our support team. Thanks for that!

Comment by Greg Studer [ 30/Mar/12 ]

[Balancer] balancer move failed: { cause: { assertion: "Can't take a write lock while out of disk space", assertionCode: 14031, errmsg: "db assertion failure", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: db assertion failure", ok: 0.0 } from: shard6 to: shard5 chunk: { id: "audit.tsauditentry-UID"23850964410"", lastmod: Timestamp 83000|1, ns: "audit.tsauditentry", min: { UID: "23850964410" }, max: { UID: "23851209616" }, shard: "shard6" }

This error is pretty clear - you're out of disk space on one of the machines, and therefore writes, and indirectly the balancer, have been stopped. The balancer has probably been stopped for a while for other reasons, which is why data has built up differently on different shards.

db.printShardingStatus() failing with a socket exception still indicates there are connectivity problems between machines. This may be why balancing stopped and your data built up in the first place.

To get back to normal, you need to increase the disk space on the "full" machine and track down the connectivity problem (the mongos/mongod logs can help you here). Then you need to let the data in the sharded collection balance. Be aware that the balancing rate can (and will) be slower than the rate of inserts if you're pushing lots of data into MongoDB, so you'll need to monitor your system to determine whether this is occurring. MMS (mms.10gen.com) is useful for these kinds of tasks.
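A minimal mongo-shell sketch of the kind of checks described above, run against a mongos and assuming the collection and shard names from this ticket ("audit.tsauditentry", "shard6"):

var cfg = db.getSiblingDB("config");
cfg.settings.find({ _id: "balancer" });                          // is the balancer explicitly stopped?
cfg.locks.find({ _id: "balancer" });                             // who currently holds the balancer lock
cfg.chunks.count({ ns: "audit.tsauditentry" });                  // total chunks in the collection
cfg.chunks.count({ ns: "audit.tsauditentry", shard: "shard6" }); // chunks still sitting on the full shard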

Comment by Preetham Derangula [ 30/Mar/12 ]

Now, I have checked the disk space allotted on all the machines. One of the machines has a lot more data on it than the others, and I see a related problem in the log.
********
[Balancer] balancer move failed: { cause: { assertion: "Can't take a write lock while out of disk space", assertionCode: 14031, errmsg: "db assertion failure", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: db assertion failure", ok: 0.0 } from: shard6 to: shard5 chunk: { id: "audit.tsauditentry-UID"23850964410"", lastmod: Timestamp 83000|1, ns: "audit.tsauditentry", min: { UID: "23850964410" }, max: { UID: "23851209616" }, shard: "shard6" }
**********
Going back to your comment
"Also, mongodb won't prevent you from running out of space even with balancing on - it will try to spread load across the cluster, but it won't be perfect and you'll need to ensure you have enough room for the data" - this situation is entirely possible.
But when I look at db.stats(), I see a lot of data concentrated on that one machine. I was not expecting perfect balancing across the shards, but I was expecting approximate balancing. Following is the output of the db.stats() command. [I was unable to get db.printShardingStatus() output, as it was throwing 'uncaught exception: error { "$err" : "socket exception", "code" : 11002 }'.]

db.stats ----> output
If you look at the "LRCHB00363" machine's section below, it has a disproportionate share of the data in the cluster. It was the machine running one of the config servers that we had shut down. Do you think there is any theory that can explain its disproportionate share of data?
mongos> db.stats()
{
    "raw" : {
        "LRCHB00319:40001" : { "db" : "audit", "collections" : 9, "objects" : 11558178, "avgObjSize" : 1007.4102402645123, "dataSize" : 11643826876, "storageSize" : 12771815424, "numExtents" : 125, "indexes" : 20, "indexSize" : 1948577904, "fileSize" : 19251855360, "nsSizeMB" : 16, "ok" : 1 },
        "LRCHB00365:40002" : { "db" : "audit", "collections" : 9, "objects" : 11281959, "avgObjSize" : 1014.4145299588484, "dataSize" : 11444583136, "storageSize" : 12058107904, "numExtents" : 114, "indexes" : 20, "indexSize" : 1906635024, "fileSize" : 17105420288, "nsSizeMB" : 16, "ok" : 1 },
        "LRCHB00366:40003" : { "db" : "audit", "collections" : 9, "objects" : 11946664, "avgObjSize" : 1018.4887031224783, "dataSize" : 12167542324, "storageSize" : 14119792640, "numExtents" : 126, "indexes" : 20, "indexSize" : 2023649936, "fileSize" : 19251855360, "nsSizeMB" : 16, "ok" : 1 },
        "LRCHB00362:40004" : { "db" : "audit", "collections" : 9, "objects" : 11802688, "avgObjSize" : 1013.5105755570256, "dataSize" : 11962149108, "storageSize" : 13794045952, "numExtents" : 130, "indexes" : 20, "indexSize" : 1976548000, "fileSize" : 19251855360, "nsSizeMB" : 16, "ok" : 1 },
        "LRCHB00363:40005" : { "db" : "audit", "collections" : 9, "objects" : 13500900, "avgObjSize" : 1261.4095741765364, "dataSize" : 17030164520, "storageSize" : 48476135168, "numExtents" : 117, "indexes" : 20, "indexSize" : 3094215376, "fileSize" : 53594816512, "nsSizeMB" : 16, "ok" : 1 },
        "LRCHB00364:40006" : { "db" : "audit", "collections" : 9, "objects" : 12624161, "avgObjSize" : 1073.0492692544083, "dataSize" : 13546346736, "storageSize" : 15748235264, "numExtents" : 135, "indexes" : 20, "indexSize" : 2303767872, "fileSize" : 21398290432, "nsSizeMB" : 16, "ok" : 1 },
        "LRCHB00374:40007" : { "db" : "audit", "collections" : 9, "objects" : 11531961, "avgObjSize" : 1007.7132022905731, "dataSize" : 11620909348, "storageSize" : 12899643392, "numExtents" : 124, "indexes" : 20, "indexSize" : 1950859008, "fileSize" : 19251855360, "nsSizeMB" : 16, "ok" : 1 }
    },
    "objects" : 84246511,
    "avgObjSize" : 1061.3557877548187,
    "dataSize" : 89415522048,
    "storageSize" : 129867775744,
    "numExtents" : 871,
    "indexes" : 140,
    "indexSize" : 15204253120,
    "fileSize" : 169105948672,
    "ok" : 1
}

Thanks a lot,
Preetham
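A small mongo-shell sketch (run from the mongos against the "audit" database) that summarizes the per-shard sizes from the db.stats().raw output shown above, which makes the LRCHB00363 imbalance easier to see:

var s = db.stats();
for (var shard in s.raw) {
    var r = s.raw[shard];
    print(shard + "  dataSize: " + (r.dataSize / 1e9).toFixed(1) + " GB, storageSize: " + (r.storageSize / 1e9).toFixed(1) + " GB");
}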

Comment by Greg Studer [ 30/Mar/12 ]

Also, while 2.1.0 is a dev build and not recommended for production, the latest stable mongodb is 2.0.4, so I'd use that for new systems.

Comment by Greg Studer [ 30/Mar/12 ]

> We had to turnoff config's on other machines as router was throwing stale config exceptions and we wanted to get rid of the config machine thats being complained about.

That's not what the stale config exceptions mean - they're normal in most cases (though they shouldn't be showing up in your app) and refer to the shards, not to the config servers. Basically they mean that the mongos needs to refresh its config info, which happens from time to time.
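If the stale config retries keep piling up, the routing metadata on a mongos can also be refreshed manually; a minimal sketch, assuming flushRouterConfig is available on this 2.0.x mongos (restarting the mongos has the same effect):

// Run against the mongos, not against a shard.
db.adminCommand({ flushRouterConfig: 1 })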

> Question: Is it possible to thrash config and start from scratch without losing the data? Is that a right direction?
You won't lose the data directly, but you'll almost certainly overwrite it, since the mongos uses that info for directing traffic.

I think the next step here is to restart all your mongos and mongod processes, and then run db.printShardingStatus() from the mongos. I don't know what hardware you're running on either, but if the mongod shards and config server are competing for resources on a limited system this could cause strange issues.
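One way to narrow down the connectivity side from a single shell session is to ping each process directly; a rough sketch, using host:port values that appear elsewhere in this ticket purely as examples:

var hosts = ["LRCHB00363:40005", "LRCHB00364:40006", "LRCHB00319:27019"];  // substitute your own shard/config hosts
hosts.forEach(function (h) {
    try {
        print(h + " -> " + tojson(new Mongo(h).getDB("admin").runCommand({ ping: 1 })));
    } catch (e) {
        print(h + " -> connection failed: " + e);
    }
});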

Comment by Preetham Derangula [ 30/Mar/12 ]

We initially had a shard, a router and a config server running on the same machine, all on Linux boxes; each machine is a VM. After seeing some problems on the router, which I explained earlier in the issue, we turned off the config servers except on the router machine. Now the router, a shard and a config server run on one machine, and the other machines have only shards running. We had to turn off the config servers on the other machines because the router was throwing stale config exceptions and we wanted to get rid of the config server it was complaining about.
Other info: Initially we had problems with the router because it had less hard disk space allotted, and its shard ran out of disk space sooner than the other machines. We have extended the space now and it is no longer an issue.
Question: Is it possible to thrash the config and start from scratch without losing the data? Is that the right direction?

The following is the ulimit -a output, and it is similar on each machine.
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 73728
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 73728
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Comment by Greg Studer [ 30/Mar/12 ]

Also, mongodb won't prevent you from running out of space even with balancing on - it will try to spread load across the cluster, but it won't be perfect and you'll need to ensure you have enough room for the data.

Comment by Greg Studer [ 30/Mar/12 ]

I don't think upgrading will solve your problem, it seems like there's a configuration issue.

> After this one of the config servers was holding lot of data in the moveChunk folder and has exhausted all the space

Config servers only hold metadata - and they shouldn't be handling migrations (unless your config server is also a shard - not recommended). How exactly is your cluster set up, and on what hardware? You'll need to make sure that your ulimits are correctly set for fds and processes on every server, and the config servers (you should use three) should ideally be on dedicated machines. All the mongoses will need to be able to communicate with all the config servers and shards, and all shards will need to be able to communicate with each other and the config servers.
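A quick way to see how close each process is to its file-descriptor budget is serverStatus, which every mongod and mongos supports; a small sketch to run against each one:

var c = db.serverStatus().connections;
// "available" roughly tracks the process's open-files limit, minus connections already in use.
print("connections - current: " + c.current + ", available: " + c.available);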

Comment by Preetham Derangula [ 30/Mar/12 ]

All the servers have a limit of 65,000+. That shouldn't be an issue.

Comment by Scott Hernandez (Inactive) [ 30/Mar/12 ]

Please check each server and make sure your ulimit -n is at least a few thousand. That is most likely the issue.
http://www.mongodb.org/display/DOCS/Too+Many+Open+Files

You will need to restart each instance to have changes applied to the process.

Comment by Preetham Derangula [ 30/Mar/12 ]

Some more history of the problem: I have 7 sharded collections, and one of the collections receives data randomly. This collection is fairly big (65 GB) compared with the other collections. This happens once I restart the router and shards. After some time I get the "out of file descriptors" error and none of the collections will accept any writes.
Now, as I tail the logs, I see the following. It is complaining that one of the shards is running out of disk space and locks can't be acquired. I think this might be due to the failure of balancing.

Fri Mar 30 04:45:31 [Balancer] chose [shard6] to [shard5] { id: "audit.tsauditentry-UID"23850964410"", lastmod: Timestamp 83000|1, ns: "audit.tsauditentry", min: { UID: "23850964410" }, max: { UID: "23851209616" }, shard: "shard6" }
Fri Mar 30 04:45:31 [Balancer] moving chunk ns: audit.tsauditentry moving ( ns:audit.tsauditentry at: shard6:LRCHB00364:40006 lastmod: 83|1 min: { UID: "23850964410" } max: { UID: "23851209616" } ) shard6:LRCHB00364:40006 -> shard5:LRCHB00363:40005
Fri Mar 30 04:45:31 [Balancer] moveChunk result: { cause: { assertion: "Can't take a write lock while out of disk space", assertionCode: 14031, errmsg: "db assertion failure", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: db assertion failure", ok: 0.0 }
Fri Mar 30 04:45:31 [Balancer] balancer move failed: { cause: { assertion: "Can't take a write lock while out of disk space", assertionCode: 14031, errmsg: "db assertion failure", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: db assertion failure", ok: 0.0 } from: shard6 to: shard5 chunk: { id: "audit.tsauditentry-UID"23850964410"", lastmod: Timestamp 83000|1, ns: "audit.tsauditentry", min: { UID: "23850964410" }, max: { UID: "23851209616" }, shard: "shard6" }
Fri Mar 30 04:45:31 [Balancer] distributed lock 'balancer/lrchb00319:27019:1332972211:1804289383' unlocked.
Fri Mar 30 04:45:41 [Balancer] distributed lock 'balancer/lrchb00319:27019:1332972211:1804289383' acquired, ts : 4f7580c51936b41879c27c1b
Fri Mar 30 04:45:41 [Balancer] chose [shard6] to [shard5] { id: "audit.tsauditentry-UID"23850964410"", lastmod: Timestamp 83000|1, ns: "audit.tsauditentry", min: { UID: "23850964410" }, max: { UID: "23851209616" }, shard: "shard6" }
Fri Mar 30 04:45:41 [Balancer] moving chunk ns: audit.tsauditentry moving ( ns:audit.tsauditentry at: shard6:LRCHB00364:40006 lastmod: 83|1 min: { UID: "23850964410" } max: { UID: "23851209616" } ) shard6:LRCHB00364:40006 -> shard5:LRCHB00363:40005
Fri Mar 30 04:45:41 [Balancer] moveChunk result: { cause: { assertion: "Can't take a write lock while out of disk space", assertionCode: 14031, errmsg: "db assertion failure", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: db assertion failure", ok: 0.0 }
Fri Mar 30 04:45:41 [Balancer] balancer move failed: { cause: { assertion: "Can't take a write lock while out of disk space", assertionCode: 14031, errmsg: "db assertion failure", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the data transfer: db assertion failure", ok: 0.0 } from: shard6 to: shard5 chunk: { id: "audit.tsauditentry-UID"23850964410"", lastmod: Timestamp 83000|1, ns: "audit.tsauditentry", min: { UID: "23850964410" }, max: { UID: "23851209616" }, shard: "shard6" }
Fri Mar 30 04:45:41 [Balancer] distributed lock 'balancer/lrchb00319:27019:1332972211:1804289383' unlocked.
Fri Mar 30 04:45:51 [Balancer] distributed lock 'balancer/lrchb00319:27019:1332972211:1804289383' acquired, ts : 4f7580cf1936b41879c27c1c
Fri Mar 30 04:45:51 [Balancer] chose [shard6] to [shard5] { id: "audit.tsauditentry-UID"23850964410"", lastmod: Timestamp 83000|1, ns: "audit.tsauditentry", min: { UID: "23850964410" }, max: { UID: "23851209616" }, shard: "shard6" }
Fri Mar 30 04:45:51 [Balancer] moving chunk ns: audit.tsauditentry moving ( ns:audit.tsauditentry at: shard6:LRCHB00364:40006 lastmod: 83|1 min: { UID: "23850964410" } max: { UID: "23851209616" } ) shard6:LRCHB00364:40006 -> shard5:LRCHB00363:40005
Fri Mar 30 04:45:51 [Balancer] moveChunk result: { cause: { assertion: "Can't take a write lock while out of disk space", assertionCode: 14031, errmsg: "db assertion failure", ok: 0.0 }, errmsg: "moveChunk failed to engage TO-shard in the d

Comment by Eliot Horowitz (Inactive) [ 30/Mar/12 ]

Can you attach the logs or at least a sample of the errors?
