  1. Core Server
  2. SERVER-41327

Investigate a query failure with CursorKilled along with failure in obtaining lock for stats collection

    • Type: Improvement
    • Resolution: Works as Designed
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query 2019-07-01, Query 2019-07-15, Query 2019-07-29, Query 2019-08-12, Query 2019-08-26

      When a long-running query is executed, MongoDB tries to gather storage statistics for it. If the lock request for that operation times out, it fails with the following message:

      [conn4650] Timed out obtaining lock while trying to gather storage statistics for a slow operation

      Then, we can see that the cursor is killed:

      Error in $cursor stage :: caused by :: operation was interrupted" errName:CursorKilled errCode:237 reslen:292

      The query ran for 257292 ms and was then killed because it could not acquire the global lock needed to gather statistics (500 ms timeout).
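
When checking logs for this pattern, it can help to pull the error fields out of each line. The following is only a sketch: the function names are hypothetical, and the only inputs it relies on are the two log excerpts quoted above.

```python
import re

# Matches the key:value pairs seen in the quoted excerpt (errName, errCode, reslen).
FIELD_RE = re.compile(r"\b(errName|errCode|reslen):(\S+)")

# Message text copied from the first excerpt in this ticket.
STATS_TIMEOUT_MSG = (
    "Timed out obtaining lock while trying to gather storage statistics"
)


def parse_error_fields(line):
    """Return the errName/errCode/reslen fields found in a log line as a dict."""
    return dict(FIELD_RE.findall(line))


def is_stats_lock_timeout(line):
    """True if the line reports the storage-statistics lock timeout."""
    return STATS_TIMEOUT_MSG in line


# The second excerpt from this ticket, verbatim (including the stray quote).
excerpt = (
    'Error in $cursor stage :: caused by :: operation was interrupted" '
    "errName:CursorKilled errCode:237 reslen:292"
)
fields = parse_error_fields(excerpt)
```

Here `fields["errName"]` identifies the failure, and `is_stats_lock_timeout` can flag the accompanying message so the two events can be compared by timestamp and connection.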

      I think that when the lock request times out, MongoDB should skip the statistics gathering instead of killing the cursor.

      Edit:
      We already skip storage statistics collection when the MODE_IS global lock cannot be obtained. In that case an error is reported, but the query proceeds as normal.
      This ticket should determine the root cause of the CursorKilled failure. The failure to obtain the MODE_IS global lock may be a hint about the state of the system at the time.
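
The skip-on-timeout behaviour described in the edit can be illustrated with a small sketch. This is not the server's implementation (which is C++); it is a Python analogy that assumes a plain `threading.Lock` in place of the global lock, and every name in it is hypothetical. Only the 500 ms timeout and the log message come from this ticket.

```python
import logging
import threading

log = logging.getLogger("slow_op")

# Mirrors the 500 ms timeout mentioned in the ticket.
STATS_LOCK_TIMEOUT_S = 0.5


def gather_storage_stats(global_lock, collect):
    """Collect stats under the lock, or return None if the lock times out."""
    if not global_lock.acquire(timeout=STATS_LOCK_TIMEOUT_S):
        # Report the failure, but do not raise: the caller carries on.
        log.warning(
            "Timed out obtaining lock while trying to gather "
            "storage statistics for a slow operation"
        )
        return None
    try:
        return collect()
    finally:
        global_lock.release()


def finish_slow_operation(result, global_lock, collect):
    """Attach stats when available; the query result is returned either way."""
    stats = gather_storage_stats(global_lock, collect)
    return {"result": result, "storage_stats": stats}
```

In this model a lock timeout only costs the statistics, never the result, which is why the CursorKilled error needs a separate explanation.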

            Assignee:
            anton.korshunov@mongodb.com Anton Korshunov
            Reporter:
            miguel.nieto@mongodb.com Miguel Angel Nieto
            Votes:
            0
            Watchers:
            9
