[SERVER-2886] Many commands being executed for 1271310319ms, mongo Created: 01/Apr/11 Updated: 05/Oct/12 Resolved: 25/Jun/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 1.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | aleksey | Assignee: | Aaron Staple |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Debian Lenny 64-bit, 6 Gb RAM |
||
| Issue Links: |
|
||||||||||||||||||||
| Operating System: | Linux | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Description |
|
Server hosts two mongod instances: 1. First mongod contains single 256 Mb capped collections with 100-300 inserts/sec and one index (no replica set, no sharding). It has been working fine for about year, but since day before yesterday both instances began to lot of log messages like this: query admin.$cmd ntoreturn:1 command: { getlasterror: 1 } reslen:83 1271310319ms or in case of replica set query admin.$cmd ntoreturn:1 command: { replSetHeartbeat: "vsrv", v: 12, pv: 1, checkEmpty: false, from: "213.xxx.xxx.xxx:27017" }reslen:132 1271310319ms Servers with other replica set member doesn't produce this messages. Maybe this issue connected with the fact that kernel began to log this messages to syslog: |
| Comments |
| Comment by Aaron Staple [ 25/Jun/12 ] | |||||
|
We've seen some strange logs in the past where the clock is moving forward but reported op times are always the same large value. This could potentially be caused by certain types of flapping of the clock value but is still suspicious. Nonetheless, use of a non monotonic clock seems like the most likely cause at this point. So I'm going to close this ticket and mark as a duplicate of | |||||
| Comment by Kay Agahd [ 10/May/12 ] | |||||
|
You could fix it by using monotonic clock which is better suited than current time to measure time intervals. Monotonic clock is never negativ and often faster than calling gettimeofday. | |||||
| Comment by Randolph Tan [ 09/May/12 ] | |||||
|
This probably has to do with our use of gettimeofday for measuring time. It appears that it is possible for the time returned from gettimeofday be backwards and this will give you negative time difference. Here's a short program below that simulates CurOp::elapsedMillis and shows how -10msec can interpreted into 14days
| |||||
| Comment by aleksey [ 08/Apr/11 ] | |||||
|
After we have allocated more RAM to this server, problem seems to be disappeared. It's strange, because database is not very large and this server hosts only this db and few workers, that doesn't consume much RAM: vsrv:SECONDARY> db.stats() vsrv:SECONDARY> db.serverStatus().mem | |||||
| Comment by Scott Hernandez (Inactive) [ 02/Apr/11 ] | |||||
|
If you have OS/hardware problems that would most likely be the cause. Are there any other messages coming from the os? |