[SERVER-2114] Don't use select timeouts for fast coarse timing Created: 18/Nov/10 Updated: 02/Aug/18 Resolved: 22/Jun/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Roger Binns | Assignee: | Andrew Morrow (Inactive) |
| Resolution: | Done | Votes: | 36 |
| Labels: | polish, pull-request |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | 64 bit Ubuntu 10.10 |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Linux |
| Sprint: | Platforms 16 (06/24/16) |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
The mongod process causes wakeups 100 times per second, as can be seen by calling strace on the process. This is done by having a one-hundredth-of-a-second timeout on a select call on the listening socket. These wakeups increase power consumption and utilization on virtual machines. It should be noted that this happens all the time, even if there are no connections or any other activity happening. Powertop will consistently show mongod as a top offender. An explanation on the mailing list was given: > One main reason is we need a somewhat coarse timer that is as fast as possible. Linux has had a system call for getting a timer value for a number of years - clock_gettime. Except on ancient 32 bit hardware, this call is implemented using the VDSO mechanism - a page the kernel maps into process memory. clock_gettime is implemented without there being a system call at all! Heck, they even added "a somewhat coarse timer that is as fast as possible"! http://lwn.net/Articles/342018/ You should be able to replace the timeouts and the variable they were updating with calls to clock_gettime. |
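To illustrate the suggestion above, here is a minimal sketch of reading a coarse millisecond timestamp on demand instead of maintaining one through select timeouts. The helper name `coarseMonotonicMillis` is an assumption made for this example and is not taken from the MongoDB source; `CLOCK_MONOTONIC_COARSE` is Linux-specific.

```cpp
#include <cstdint>
#include <time.h>  // clock_gettime, CLOCK_MONOTONIC_COARSE (Linux-specific)

// Hypothetical helper (name invented for this sketch): milliseconds read
// from the coarse monotonic clock. Returns -1 if the clock is unavailable,
// for example on a kernel that predates CLOCK_MONOTONIC_COARSE.
inline std::int64_t coarseMonotonicMillis() {
    timespec ts{};
    if (clock_gettime(CLOCK_MONOTONIC_COARSE, &ts) != 0) {
        return -1;
    }
    return static_cast<std::int64_t>(ts.tv_sec) * 1000 + ts.tv_nsec / 1000000;
}
```

A caller that only needs "roughly now" could call this at the point of use, so no periodic wakeup is required just to keep a counter current.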
| Comments |
| Comment by Andrew Morrow (Inactive) [ 22/Jun/16 ] |
|
Hi watchers of this ticket - I'm closing this ticket as 'done'. As of https://github.com/mongodb/mongo/commit/710159c9602a6738e6455cfb26bc2d70a0454ae2, the select timeout in the Listener class is no longer used for fast coarse timing. The Listener does still have a select timeout, currently at 250ms. However, that select timeout is there not for timekeeping, but to work around the fact that closing a listening socket is not guaranteed to unblock select, which currently frustrates the shutdown logic for the Listener. We expect this to be a temporary situation, ultimately resulting in the listen path not having any timeout. This also represents an order of magnitude reduction in wakeups. Note that we have not yet implemented If the primary reason for following this ticket was interest in reduced power/battery consumption in mongod or mongos, please add yourself as a watcher on Also note that the patches posted here are unlikely to apply correctly after the recent changes to the listener. |
| Comment by Andy Schwerin [ 15/Jun/15 ] |
|
I think the best way to move forward is to do a little more work abstracting the use of timers so that we can better select a time source based on the facilities available on the (virtual) hardware. renctan recently introduced a TickSource interface to be used for doing elapsed time measurement (rather than reading the calendar time). If we got consistent about using that for elapsed time measurement, we could choose an implementation at start-up, and hang it off the ServiceContext. The question I've still got to answer is whether we need a slow-but-precise timer on systems where the fastest clock device isn't very precise. I suspect the answer is usually "no". That we rarely, if ever, need a precise measurement of time if the cost of that measurement is more than a few microseconds. I'm hoping to spend a little time thinking about this later in the summer, but my plate is pretty full right now. I know that acm wants to kill the elapsed time tracker, too, because it's an extra job for the networking code that he'd rather it not have. |
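As a rough picture of the abstraction discussed above, the sketch below defines a small elapsed-time interface with one implementation chosen at start-up. The names `ElapsedTickSource`, `SteadyClockTickSource`, and `makeTickSource` are hypothetical and are not MongoDB's TickSource; the example only assumes the C++ standard library.

```cpp
#include <chrono>
#include <cstdint>
#include <memory>

// Hypothetical interface for elapsed-time measurement (not calendar time).
class ElapsedTickSource {
public:
    virtual ~ElapsedTickSource() = default;
    virtual std::int64_t ticks() const = 0;           // monotonically non-decreasing
    virtual std::int64_t ticksPerSecond() const = 0;  // resolution of ticks()
};

// One possible implementation backed by std::chrono::steady_clock.
class SteadyClockTickSource final : public ElapsedTickSource {
public:
    std::int64_t ticks() const override {
        using namespace std::chrono;
        return duration_cast<nanoseconds>(
                   steady_clock::now().time_since_epoch())
            .count();
    }
    std::int64_t ticksPerSecond() const override {
        return 1000000000;  // ticks are nanoseconds
    }
};

// Chosen once at start-up. A real server could return a different
// implementation here depending on which clocks the platform offers.
inline std::unique_ptr<ElapsedTickSource> makeTickSource() {
    return std::make_unique<SteadyClockTickSource>();
}
```

Callers would hold the interface pointer and compute durations as a difference of two `ticks()` readings divided by `ticksPerSecond()`, which keeps the choice of clock in one place.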
| Comment by Ben McCann [ 11/Jun/15 ] |
|
The issue that's been holding up the PR is that Mongo is still supporting RHEL 5 and its 2.6.18 kernel, but CLOCK_MONOTONIC_COARSE wasn't introduced until 2.6.32. Mongo is planning to continue support for RHEL 5 in at least the next major release series - 3.2, so that's not likely to change soon. Any thoughts on moving forward? Should we work the current way for RHEL 5 and the new way for everyone else or would that be too messy? |
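One way to "work the new way for everyone else" without breaking older kernels might be a probe at start-up, sketched below. The helper name `pickMonotonicClock` is an assumption for this example. It relies on `clock_gettime` failing for a clock ID the running kernel does not know, and on the `#ifdef` for build environments whose headers lack the constant.

```cpp
#include <time.h>  // clock_gettime, clockid_t

// Hypothetical helper: pick the cheapest usable monotonic clock once,
// at start-up, and reuse the returned ID for every later reading.
inline clockid_t pickMonotonicClock() {
#ifdef CLOCK_MONOTONIC_COARSE
    timespec probe{};
    // On kernels without the coarse clock this call fails, and we fall
    // through to the portable clock below.
    if (clock_gettime(CLOCK_MONOTONIC_COARSE, &probe) == 0) {
        return CLOCK_MONOTONIC_COARSE;
    }
#endif
    return CLOCK_MONOTONIC;
}
```

This only selects a clock; whether `CLOCK_MONOTONIC` is cheap enough on an old kernel to replace the select-based counter would still need measuring.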
| Comment by Mark Callaghan [ 07/Feb/15 ] |
|
A random anecdote, but in the past mongod has been the largest CPU consumer on my Ubuntu VM when some random binary depended on it but it wasn't doing anything. |
| Comment by Greg White [ 06/Feb/15 ] |
|
That's a shame, because it makes using mongo as part of a dev stack on a notebook particularly painful. On my laptop, it's the single largest power draw, even when idle. |
| Comment by Andrew Morrow (Inactive) [ 06/Feb/15 ] |
|
Hi All - I know there is a lot of interest in this ticket. We are holding off on changes in this area because we are currently evaluating the overall server network stack with an eye towards a significant refactoring or rewrite, much as has happened with other critical server subsystems during the 2.4, 2.6, and 3.0 release cycles. It is our expectation that the issue identified in this ticket would likely be tackled as part of such a refactoring, currently planned for the upcoming development cycle. Thanks, |
| Comment by Greg White [ 28/Jan/15 ] |
|
3.0.0-rc6 has a serious regression in power usage. It's now using ~4x the power that 2.8-rc5 used - wakeups went from ~60 a second to > 210 wakeups/sec. Edit: That's using WiredTiger. Using the mmap storage engine, things are between 2-3x worse at ~140 wakeups/sec. |
| Comment by Vladimir Zoubritsky [ 25/Jun/14 ] |
|
Applying the mongo-timer-fix patch with the patches from https://jira.mongodb.org/browse/SERVER-9580 has helped us reduce CPU usage on development machines. We have packaged the patches for Ubuntu at https://build.opensuse.org/package/show/home:dottedmag:mongodb/mongodb. |
| Comment by Ben McCann [ 27/Feb/14 ] |
|
I've run into this being a problem as well. It would be really great to get this patch reviewed. It's a pretty small limited change. |
| Comment by Timo Lindemann [ 26/Feb/14 ] |
|
I have been using Calvin's Patch since 2.4.6. My scenario is that I am a developer running mongodb on my work laptop. (I asked Calvin for an update to the patch since it failed to apply in git master.) The patch makes wakeups go from 200 per second to 20 per second for MongoDB, measured with powertop, and that is idle mongodb, just started up for like ten minutes, no ops at all, with version 2.4.6. I realize my use case of mongodb eating my battery is perhaps only of minor importance to the grand scheme of things, but I humbly ask for another consideration to please include this patch. Maybe it'd be possible to make a compiler option out of it? What exactly is holding the bug up and unresolved for that long of a time? Thanks for consideration. |
| Comment by Axel Kittenberger [ 25/Feb/14 ] |
|
Yay! Hope it gets in there. Fingers crossed. I just measured the estimated battery runtime on my notebook: it's 5% longer without mongodb running, compared with running it with no connected clients. Otherwise, is there a foolproof way to replace the mongod that came with Debian (jessie here) with one compiled with your patch? |
| Comment by Calvin Owens [ 25/Feb/14 ] |
|
Updating the patch to apply to the current git HEAD - I've heard from a couple developers who say they use it. |
| Comment by Roger Binns [ 01/May/13 ] |
|
Note that your estimate is only taking into account when MongoDB is running on bare metal. When running virtualized/hypervisor then the host also has to dedicate cycles to this timer and wakeups and those cycles can't be used for useful work either. |
| Comment by Calvin Owens [ 28/Apr/13 ] |
|
Update: https://http.snarkywidgets.com/mongo-timer-fix-v3.patch |
| Comment by Calvin Owens [ 21/Apr/13 ] |
|
Also, some crude estimation using my desktop predicts that this fix would save you about 36.5 kWh/year per server, or about $5 at the average US price. In a large datacenter, that could be quite significant. YMMV. |
| Comment by Calvin Owens [ 21/Apr/13 ] |
|
I've cooked up a fix for this, tested on Linux. Unfortunately, I don't have access to an OSX, Windows, or Solaris machine I can test this on. Can anybody here test on any of those platforms? Patch (against current master): https://http.snarkywidgets.com/mongo-timer-fix-v1.patch Remaining questions:
I'm going to wait to submit a pull request until testing has been done on at least OSX and Windows. (NOTE: I did try to test this on my iMac, but the build has been broken on 32-bit OSX for a long time, apparently) |
| Comment by Roger Binns [ 30/Dec/12 ] |
|
There is no debate over whether this can be fixed technically and how to fix it. Commenters and voters believe it should be fixed. 10gen has chosen not to work on it yet. If you fix it then that would be great! |
| Comment by Calvin Owens [ 30/Dec/12 ] |
|
That's my point - why not use clock_gettime() or one of his friends? It was alluded to in the original post that an efficient alternative to repeatedly calling select() didn't exist. getElapsedTimeMillis() could just call clock_gettime(CLOCK_MONOTONIC_RAW), which would perpetuate tiny latency increases, but would allow select() to block forever waiting for connections - a big win in terms of power consumption and CPU usage. As a bonus, you'd get a more accurate timer, since it would have nanosecond precision and wouldn't drift on the long side, as one based on select() will do over time. Here's a benchmark of alternative timer methods: http://stackoverflow.com/questions/6498972/faster-equivalent-of-gettimeofday/13096917#13096917 |
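A minimal sketch of the idea in the comment above: compute elapsed milliseconds from `clock_gettime(CLOCK_MONOTONIC_RAW)` at the moment they are needed, rather than incrementing a counter on every select() timeout. The class name `OnDemandElapsedTimer` is hypothetical and does not reproduce the getElapsedTimeMillis() code in the server; `CLOCK_MONOTONIC_RAW` is Linux-specific.

```cpp
#include <cstdint>
#include <time.h>  // clock_gettime, CLOCK_MONOTONIC_RAW (Linux-specific)

// Hypothetical stand-in for a counter that is bumped by select() timeouts.
// Elapsed time is derived from the clock when asked for, so nothing has
// to wake up periodically to keep it current.
class OnDemandElapsedTimer {
public:
    OnDemandElapsedTimer() : _startNanos(nowNanos()) {}

    // Milliseconds since this object was constructed.
    std::int64_t elapsedMillis() const {
        return (nowNanos() - _startNanos) / 1000000;
    }

private:
    static std::int64_t nowNanos() {
        timespec ts{};
        clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
        return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
    }

    std::int64_t _startNanos;
};
```

With elapsed time available this way, the accept loop could in principle pass a null timeout to select() and block until a connection arrives, which is the power saving the comment is after.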
| Comment by Roger Binns [ 30/Dec/12 ] |
|
Each timer call is not inefficient by itself. The problem is the quantity of them - 200 per second even if not a single client is connected. This causes greatly increased power consumption - https://lesswatts.org/documentation/silicon-power-mgmnt/ and, for virtualized environments, means that the host also has to do work. In addition (on Linux) there is no need to run such a frequent timer since clock_gettime() can be used to get a timer value on demand. Run powertop to see just how bad mongodb is. An idle instance should have zero wakeups per second, while a connected instance should have wakeups proportional to the amount of connection traffic. |
| Comment by Calvin Owens [ 30/Dec/12 ] |
|
I would like to fix this. Could somebody clarify for me: Thoughts? |
| Comment by HRJ [ 19/Dec/12 ] |
|
My virtual machine is waking up 400 times/second. Inside the virtual machine, mongodb is waking up 200 times/second. Please solve this for the love of our environment. |
| Comment by Vitaly A. Sorokin [ 08/Oct/12 ] |
|
+1 for this bug. affects both my macbook and linode VM. it has been there for two years! just wondering... |
| Comment by Axel Kittenberger [ 08/Jul/12 ] |
|
If only this could be fixed! clock() returning the kernel jiffies sure isn't slower than doing a timeout every 10ms. Or just increase the timeouts and increment the timer counter by larger values (even calculated ones). On my MacBook Air, mongodb is always a top CPU user and pushes it into fan usage. That mongodb is an all-powerful database is cool, but it would be even cooler if it were scalable as well, meaning able to scale down and not be an unnecessary power drain on mobile devices when nothing is going on. |
| Comment by Teemu Ikonen [ 18/Jun/12 ] |
|
This issue makes laptop development use with MongoDB a bit harder. Mongod constantly takes a few % of CPU and causes increased battery drain. |