[SERVER-12410] the lockTime value is bigger than the totalTime value in globalLock metrics of serverStatus() Created: 20/Jan/14 Updated: 20/May/15 Resolved: 23/May/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics |
| Affects Version/s: | 2.4.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Jianfeng Xu | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
OS: RHEL 5.5 x64 |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Operating System: | Linux | ||||
| Participants: | |||||
| Description |
|
In a three-node replica set with one primary and two secondaries (only a fragment of the config survived the export): { "_id" : 5, "host" : "10.136.24.24:27032" } ] I'm using db.serverStatus().globalLock to show the lock time on all secondary nodes. On node 10.136.24.24 the lockTime value is bigger than the totalTime value, while node 10.136.24.38 is correct. So what's the problem? Any ideas? Thanks! NODE 10.136.24.24: (serverStatus output not captured) NODE 10.136.24.38: (serverStatus output not captured) |
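For reference, a minimal Node.js sketch of the invariant under discussion: in globalLock, lockTime (microseconds spent holding the global lock) should never exceed totalTime (microseconds since startup). The field names match serverStatus() output; the sample values below are illustrative, not taken from the reporter's nodes.

```javascript
// Sanity-check the globalLock invariant from a serverStatus()-style document.
// In a healthy server, lockTime <= totalTime (both reported in microseconds).
function checkGlobalLock(globalLock) {
  const { totalTime, lockTime } = globalLock;
  return {
    consistent: lockTime <= totalTime,
    ratio: lockTime / totalTime, // fraction of uptime spent holding the lock
  };
}

// Healthy node: lock held for a small fraction of uptime
console.log(checkGlobalLock({ totalTime: 3600e6, lockTime: 120e6 }));

// Anomalous node, as reported in this ticket: lockTime exceeds totalTime
console.log(checkGlobalLock({ totalTime: 100e6, lockTime: 150e6 }));
```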
| Comments |
| Comment by Ramon Fernandez Marina [ 23/May/14 ] |
|
Hi Jianfeng, I haven't heard back from you for some time, so I'm going to mark this ticket as resolved. If this is still an issue for you, feel free to re-open and provide additional information. Regards, |
| Comment by Ramon Fernandez Marina [ 15/May/14 ] |
|
Hi Jianfeng, we haven't heard from you in over two weeks. Have you had a chance to replace ntpdate with an NTP daemon and observe what happens? If you have, can you please confirm whether the behavior you describe goes away? |
| Comment by Ramon Fernandez Marina [ 29/Apr/14 ] |
|
Hi Jianfeng, I can get the totalTime counter to be lower than the lockTime counter by setting the clock backwards a few seconds using the date command, but over time the totalTime counter grows bigger again – so I'm wondering whether the counter discrepancy you're observing persists over time, or if it goes away after a few minutes. In other words, if ntpdate makes the clock jump backwards (you can use "ntpdate -s" to log what ntpdate is doing to syslog) then I'd say what you're seeing is expected. If this is an issue for you I'd recommend you use an NTP daemon to slew the clock gradually and avoid these sudden jumps. |
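The clock-jump effect described above can be sketched as follows. This is a simplified model, assuming totalTime is derived from the settable wall clock relative to the startup timestamp while lockTime accumulates measured lock intervals; the variable names are illustrative, not MongoDB internals.

```javascript
// Simulate a backwards clock step (as ntpdate can cause) making
// totalTime temporarily smaller than the accumulated lockTime.
function simulateClockStep() {
  let wallClock = 1000000;       // seconds; settable, so it can jump backwards
  const startTime = wallClock;   // recorded once at server startup
  let lockTime = 0;              // accumulated from measured lock intervals

  wallClock += 600;              // server runs for 600s...
  lockTime += 240;               // ...holding the global lock for 240s of it

  wallClock -= 500;              // ntpdate steps the clock back 500s

  const totalTime = wallClock - startTime; // now only 100s: less than lockTime
  return { totalTime, lockTime };
}

console.log(simulateClockStep());
```

Once real time advances past the pre-jump reading, totalTime overtakes lockTime again, which matches the "goes away after a few minutes" behaviour described above.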
| Comment by Jianfeng Xu [ 09/Apr/14 ] |
|
Hi Stephen, We use ntpdate to synchronise the clocks against an NTP server on all the replica set nodes every three hours, like this: (the crontab entry did not survive the export) And all the replica set nodes run on physical machines. Thanks! |
| Comment by Thomas Rueckstiess [ 07/Apr/14 ] |
|
Hi Jianfeng, Is this still an issue for you? If so, can you answer Stephen's questions above? We'd like to find out what could be causing the difference you are seeing and our current hypothesis is that it could be related to clock skew or small clock adjustments. Thanks, |
| Comment by Stennie Steneker (Inactive) [ 18/Mar/14 ] |
|
Hi Jianfeng, I'm wondering if perhaps the odd times are due to clock skew/adjustments. A few questions (the list did not survive the export):
Thanks, |
| Comment by Jianfeng Xu [ 11/Feb/14 ] |
|
Hi Dan |
| Comment by Daniel Pasette (Inactive) [ 21/Jan/14 ] |
|
That output looks normal, and the global lock time reported in the output is consistent (it is reported under the "." key of the locks section). I can't think of any known bugs related to this symptom. If everything is functioning normally aside from this stat, I don't think it's anything to worry about. The stats are only reset by a restart of the server – please re-post if you see this behavior again. |
| Comment by Jianfeng Xu [ 21/Jan/14 ] |
|
Please look at the attached file locks.txt. Thanks. |
| Comment by Daniel Pasette (Inactive) [ 20/Jan/14 ] |
|
Can you attach the results of db.serverStatus().locks as well? |