Core Server
  1. Core Server
  2. SERVER-4709

Strange timing of operations reported in logs

    Details

    • Type: Question Question
    • Status: Closed Closed
    • Priority: Major - P3 Major - P3
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.2
    • Fix Version/s: None
    • Component/s: Logging
    • Labels:
      None
    • Backport:
      No
    • # Replies:
      12
    • Last comment by Customer:
      true

      Description

      While checking logs for SERVER-4100 causes (this case is unrelated, mentioned for information purposes only), I got some strange reconds on the primary:

      Sat Jan 14 02:15:37 [initandlisten] connection accepted from 10.2.1.200:48055 #108356
      Sat Jan 14 02:15:48 [conn108356] update models_stats.show_log query:

      { _id: ObjectId('4ebf7eb76b8dda061480d539') }

      update: { $unset:

      { h: 1 }

      , $set:

      { e: null, s: 1321172663 }

      } keyUpdates:1 1271310315ms
      Sat Jan 14 02:15:53 [conn108356] end connection 10.2.1.200:48055

      This part of log shows the update running for something around 14 days, but entire connection existed less than a minute, looks like there's something amiss.
      We have several (in the tens range) of such queries in logs with the same time (1271310315ms) on different connections with different operations.

      1. test.c
        2 kB
        Ben Becker

        Issue Links

          Activity

          Hide
          Ben Becker
          added a comment -

          It's worth noting that mongo does not use a monotonic timer for most stats, and thus any change in system time will affect the results. This can happen due to a Xen live migration, system suspend, NTP time change, user-initiated time change, etc...

          If this issue is on-going (and you have the time), could you try building the attached test.c file and let it run while the issue appears? It runs gettimeofday() and clock_gettime() every ~50ms, and shouldn't use too much of the CPU. Please post back any output from this program, or if you observe another invalid time while the program does not report anything.

          To build and run (assuming test.c is in the current directory):

          gcc ./test -otest -lrt && ./test
          

          Thanks for the report; I'll see if we can move towards safer, monotonic timing functions in the future.

          Show
          Ben Becker
          added a comment - It's worth noting that mongo does not use a monotonic timer for most stats, and thus any change in system time will affect the results. This can happen due to a Xen live migration, system suspend, NTP time change, user-initiated time change, etc... If this issue is on-going (and you have the time), could you try building the attached test.c file and let it run while the issue appears? It runs gettimeofday() and clock_gettime() every ~50ms, and shouldn't use too much of the CPU. Please post back any output from this program, or if you observe another invalid time while the program does not report anything. To build and run (assuming test.c is in the current directory): gcc ./test -otest -lrt && ./test Thanks for the report; I'll see if we can move towards safer, monotonic timing functions in the future.
          Hide
          Aristarkh Zagorodnikov
          added a comment -

          Will do next week.

          Show
          Aristarkh Zagorodnikov
          added a comment - Will do next week.
          Hide
          Aristarkh Zagorodnikov
          added a comment -

          I tend to blame virtualization, because we ran the same workload on physical hardware, I checked logs and there was no similar problems.
          I also would like to not that this effect was demonstrated twice, when system was under very high (unusually, to the point of can't-type-in-SSH) load.

          Show
          Aristarkh Zagorodnikov
          added a comment - I tend to blame virtualization, because we ran the same workload on physical hardware, I checked logs and there was no similar problems. I also would like to not that this effect was demonstrated twice, when system was under very high (unusually, to the point of can't-type-in-SSH) load.
          Hide
          Ben Becker
          added a comment -

          SERVER-4740 may help with strange time results.

          Show
          Ben Becker
          added a comment - SERVER-4740 may help with strange time results.
          Hide
          Ben Becker
          added a comment -

          I'm going to close this issue for now. Please reopen if you encounter this issue again.

          Show
          Ben Becker
          added a comment - I'm going to close this issue for now. Please reopen if you encounter this issue again.

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                2 years, 12 weeks, 6 days ago
                Date of 1st Reply: