[SERVER-7850] Config server floods log with: CMD fsync: sync:1 lock:0 Created: 05/Dec/12 Updated: 15/Feb/13 Resolved: 09/Jan/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 2.2.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | James Blackburn | Assignee: | Andre de Frere |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Operating System: | Linux | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
We have a sharded cluster of 5 mongod's. They are started like this:
These are connected to by 96 nodes running mongos. There are three config servers are run like:
We see the config server log file flooded with:
The config server has continuous 30% usage with no load on the cluster. It could be that each mongos is trying to balance, and this causes the config server to be permanently loaded. Disabling balancing reduces the load considerably to 5-10% CPU (cluster still not loaded). However we still see this fsync command every 30s or so from every connection. Is this expected? Can we enable balancing on only one mongos? |
| Comments |
| Comment by James Blackburn [ 07/Jan/13 ] |
|
Ok, I've voted on |
| Comment by Andre de Frere [ 10/Dec/12 ] |
|
Hi James, There is currently a ticket to reduce the amount of logging that the config servers do around the fsync command. You can vote and comment on that issue at You can reduce the impact of this by rotating the logs periodically, or by using the --syslog command line option to handle the log files in the same manner as the systems syslog. High CPU on the config server should not translate into reduced performance for accessing data via the mongos. The mongos maintains a cached copy of the config database which it will use in lieu of directly accessing the config database. The CPU load you are seeing is likely brought about by connection handling for the number of mongos nodes, rather than load on the cluster. Balancing operations do not take place on the config servers themselves, so increased load on the config server should not impact the performance on balancing and migration. The config data is generally small, and therefore rapid fsync operations should not have undue effects on other users on that node - that being said, isolating the config servers to their own hosts would eliminate the possibility of both effects on other users, and the potential of config servers being brought down due external circumstances. |
| Comment by James Blackburn [ 07/Dec/12 ] |
|
Hi Andre, Thanks for the reply. From my POV the problems this causes are: 1) Excessive log files: 300M Other worries:
|
| Comment by Andre de Frere [ 07/Dec/12 ] |
|
Hi James, Seeing the config server logs contain mostly fsync commands is expected behaviour. This is due to the way that updates to the config database are persisted. The config servers will incur some load, even with apparently little else going on in the cluster. This is because every mongos will need to connect to the server during the normal course of operations in order to update their cached config. This load will be increased by the number of mongos nodes in your environment. If the balancing is switched off in the environment, this means that the chance of get a staleConfigDetected (and therefore requiring new config) will be reduced to approximately zero. This means that no new config will be required, meaning that the load on the config server will be greatly reduced. When balancing starts, the mongos responsible will take out a distributed lock, preventing any of the other mongos nodes from starting balancing operations. You do not need to enable balancing on only one node, this is already accounted for. The additional load that you will see on one config server (over the other two) will be due to the way the mongos processes connect to the config servers. They will use the first config server in the - Could you let us know what information you were looking for in the config server logs? |
| Comment by James Blackburn [ 06/Dec/12 ] |
|
Can I provide any more information to aid in debugging this? |