  1. Core Server
  2. SERVER-41327

Investigate a query failure with CursorKilled along with failure in obtaining lock for stats collection

Details

    • Improvement
    • Status: Closed
    • Major - P3
    • Resolution: Works as Designed
    • Query 2019-07-01, Query 2019-07-15, Query 2019-07-29, Query 2019-08-12, Query 2019-08-26

    Description

      When a long-running query is executed, MongoDB tries to gather storage statistics for it. If the lock request for that operation times out, it logs the following message:

      [conn4650] Timed out obtaining lock while trying to gather storage statistics for a slow operation

      Then, we can see that the cursor is killed:

      Error in $cursor stage :: caused by :: operation was interrupted" errName:CursorKilled errCode:237 reslen:292

      That query ran for 257292 ms and was then killed because it could not acquire a global lock to gather statistics (500 ms timeout).

      I think that when the lock request times out, MongoDB should skip gathering the statistics instead of killing the cursor.

      Edit:
      We already skip storage statistics collection when the MODE_IS global lock cannot be obtained. In that case an error is reported, but the query proceeds as normal.
      This ticket should determine the root cause of the query failing with CursorKilled. The failure to obtain the MODE_IS global lock may hint at the state of the system at the time.
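
      The skip-on-timeout behavior described in the edit above can be illustrated with a minimal sketch. This is not MongoDB's implementation: the function names, the returned dictionary, and the use of a plain Python mutex in place of the MODE_IS global lock are all assumptions made for illustration. Only the 500 ms timeout and the log message come from this ticket.

```python
import logging
import threading

logger = logging.getLogger("slow_op")

# Mirrors the 500 ms timeout mentioned in the ticket.
STATS_LOCK_TIMEOUT_SECS = 0.5


def gather_storage_stats(global_lock, collect_stats, timeout=STATS_LOCK_TIMEOUT_SECS):
    """Return storage stats, or None if the lock is not acquired in time.

    `global_lock` is a threading.Lock standing in for the global lock
    (a simplification: the real lock is a shared intent lock, not a mutex).
    `collect_stats` is a hypothetical callable that reads the statistics.
    """
    if not global_lock.acquire(timeout=timeout):
        # Report the problem, but do not raise: the caller's query must survive.
        logger.warning(
            "Timed out obtaining lock while trying to gather storage "
            "statistics for a slow operation"
        )
        return None
    try:
        return collect_stats()
    finally:
        global_lock.release()


def finish_slow_query(result, global_lock, collect_stats, timeout=STATS_LOCK_TIMEOUT_SECS):
    """Attach storage stats to a slow query's result when they are available."""
    stats = gather_storage_stats(global_lock, collect_stats, timeout)
    # The query result is returned whether or not stats were gathered.
    return {"result": result, "storage_stats": stats}
```

      In this sketch a lock timeout only costs the statistics; the query result is returned either way. That matches the behavior stated in the edit, which is why the CursorKilled error needs a separate explanation.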

            People

              anton.korshunov@mongodb.com Anton Korshunov
              miguel.nieto@mongodb.com Miguel Angel Nieto
              Votes: 0
              Watchers: 7
