  1. Core Server
  2. SERVER-33660

Once getMores include lsid, sharded aggregations with $mergeCursors can hang

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.7.3
    • Component/s: Aggregation Framework
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Sprint:
      Query 2018-03-12

      Description

      A deadlock is induced on the SessionCatalog:

      1. The operation performing the merging half of the pipeline checks out the Session for that lsid.
      2. That operation includes a $mergeCursors, which performs getMores on remote hosts, one of which is the same host performing the $mergeCursors.
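      3. Once getMores include the lsid, the getMore sent to that same host attempts to check out the same Session and blocks on a mutex in the SessionCatalog, because the merging operation still holds the Session while it waits for that getMore to return.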
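
The three steps above can be modeled with a small toy, sketched here in Python rather than the server's own code. The `SessionCatalog` class, its `check_out`/`check_in` methods, and the `"lsid-1"` value are hypothetical stand-ins, not the server's actual API. A timeout is used so the example terminates instead of hanging the way the real operation would.

```python
import threading


class SessionCatalog:
    """Toy stand-in (assumption): one exclusive slot per lsid."""

    def __init__(self):
        self._guard = threading.Lock()
        self._slots = {}

    def _slot(self, lsid):
        # Create the per-lsid lock lazily, under a guard.
        with self._guard:
            return self._slots.setdefault(lsid, threading.Lock())

    def check_out(self, lsid, timeout):
        # True if the Session was checked out within `timeout` seconds.
        return self._slot(lsid).acquire(timeout=timeout)

    def check_in(self, lsid):
        self._slot(lsid).release()


def get_more(catalog, lsid, result):
    # Step 3: the getMore lands on the same host and carries the same lsid.
    if catalog.check_out(lsid, timeout=0.2):
        result.append("batch")
        catalog.check_in(lsid)
    else:
        result.append("blocked")


def merging_aggregation(catalog, lsid):
    # Step 1: the merging half checks out the Session for this lsid.
    if not catalog.check_out(lsid, timeout=0.2):
        raise RuntimeError("could not check out session")
    try:
        result = []
        # Step 2: $mergeCursors issues a getMore that is served locally.
        worker = threading.Thread(target=get_more, args=(catalog, lsid, result))
        worker.start()
        worker.join()  # the merger waits while still holding the Session
        return result[0]
    finally:
        catalog.check_in(lsid)


outcome = merging_aggregation(SessionCatalog(), "lsid-1")
print(outcome)
```

In this model the getMore never obtains the Session while the merger is waiting on it. Without the timeout, both sides would wait on each other indefinitely.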

      As a short-term fix, we should do the following:

      1. Only check out the Session if the operation includes a transaction number.
      2. Ban aggregations with a transaction number on mongos.
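
A minimal sketch of the two short-term rules, written in Python for illustration only. The function names and the dictionary representation of a command are assumptions, not the server's implementation. The `lsid` and `txnNumber` keys follow the command field names.

```python
def should_check_out_session(command):
    # Rule 1: only check out the Session when a transaction number is present.
    # A getMore that carries just an lsid therefore never touches the catalog.
    return "txnNumber" in command


def validate_on_mongos(command):
    # Rule 2: mongos rejects aggregations that carry a transaction number,
    # so the merging half never holds a Session while issuing getMores.
    if "aggregate" in command and "txnNumber" in command:
        raise ValueError("aggregate with txnNumber is not allowed on mongos")


# Hypothetical commands used only to exercise the two rules.
local_get_more = {"getMore": 123, "collection": "coll", "lsid": "lsid-1"}
retryable_write = {"insert": "coll", "lsid": "lsid-1", "txnNumber": 5}
banned_aggregate = {"aggregate": "coll", "lsid": "lsid-1", "txnNumber": 5}

print(should_check_out_session(local_get_more))
print(should_check_out_session(retryable_write))
```

Under these rules the getMore from the deadlock scenario skips the checkout entirely, and the only aggregations that would still hold a Session are refused up front.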

      As a long-term fix, we will investigate not using getMores over the network for what are really local reads. If that proves difficult, we will have to re-evaluate.

              People

              Assignee:
              charlie.swanson Charlie Swanson
              Reporter:
              charlie.swanson Charlie Swanson