Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-37283

View graph cycle on expressive lookup secondary read

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: 4.1.8
    • Component/s: Aggregation Framework
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.0
    • Steps To Reproduce:
      Hide

      The following patch is based on this commit and will make this issue easier to reproduce:

      diff --git a/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js b/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js
      index 4745ef3..cc1e14e 100644
      --- a/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js
      +++ b/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js
      @@ -15,7 +15,7 @@ var $config = (function() {
           const prefix = 'view_catalog_cycle_lookup_';
       
           var data = {
      -        viewList: ['viewA', 'viewB', 'viewC', 'viewD', 'viewE'].map(viewName => prefix + viewName),
      +        viewList: ['viewA', 'viewB', 'viewC'].map(viewName => prefix + viewName),
               getRandomView: function(viewList) {
                   return viewList[Random.randInt(viewList.length)];
               },
      @@ -138,8 +138,8 @@ var $config = (function() {
           }
       
           return {
      -        threadCount: 20,
      -        iterations: 100,
      +        threadCount: 60,
      +        iterations: 150,
               data: data,
               states: states,
               startState: 'readFromView',
      diff --git a/src/mongo/db/commands/run_aggregate.cpp b/src/mongo/db/commands/run_aggregate.cpp
      index 958c33f..ee6b10c 100644
      --- a/src/mongo/db/commands/run_aggregate.cpp
      +++ b/src/mongo/db/commands/run_aggregate.cpp
      @@ -244,6 +244,7 @@ StatusWith<StringMap<ExpressionContext::ResolvedNamespace>> resolveInvolvedNames
               } else if (viewCatalog->lookup(opCtx, involvedNs.ns())) {
                   // If 'involvedNs' refers to a view namespace, then we resolve its definition.
                   auto resolvedView = viewCatalog->resolveView(opCtx, involvedNs);
      +            sleepmillis(100);
                   if (!resolvedView.isOK()) {
                       return {ErrorCodes::FailedToParse,
                               str::stream() << "Failed to resolve view '" << involvedNs.ns() << "': " 
      

      Run resmoke with the following arguments (this may take a few runs to trigger):

      python buildscripts/resmoke.py --suite=concurrency_replication_causal_consistency jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js
      

      The following patch includes an additional change which will trigger an invariant if resolveInvolvedNamespaces() encounters a invalid ViewCatalog mid-resolution. This confirms that the MODE_IS lock is not protecting the ViewCatalog from change:

      diff --git a/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js b/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js
      index 4745ef3..cc1e14e 100644
      --- a/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js
      +++ b/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js
      @@ -15,7 +15,7 @@ var $config = (function() {
           const prefix = 'view_catalog_cycle_lookup_';
       
           var data = {
      -        viewList: ['viewA', 'viewB', 'viewC', 'viewD', 'viewE'].map(viewName => prefix + viewName),
      +        viewList: ['viewA', 'viewB', 'viewC'].map(viewName => prefix + viewName),
               getRandomView: function(viewList) {
                   return viewList[Random.randInt(viewList.length)];
               },
      @@ -138,8 +138,8 @@ var $config = (function() {
           }
       
           return {
      -        threadCount: 20,
      -        iterations: 100,
      +        threadCount: 60,
           return {
      -        threadCount: 20,
      -        iterations: 100,
      +        threadCount: 60,
      +        iterations: 150,
               data: data,
               states: states,
               startState: 'readFromView',
      diff --git a/src/mongo/db/commands/run_aggregate.cpp b/src/mongo/db/commands/run_aggregate.cpp
      index 958c33f..ea5530c 100644
      --- a/src/mongo/db/commands/run_aggregate.cpp
      +++ b/src/mongo/db/commands/run_aggregate.cpp
      @@ -220,6 +220,7 @@ StatusWith<StringMap<ExpressionContext::ResolvedNamespace>> resolveInvolvedNames
           std::deque<NamespaceString> involvedNamespacesQueue(pipelineInvolvedNamespaces.begin(),
                                                               pipelineInvolvedNamespaces.end());
           StringMap<ExpressionContext::ResolvedNamespace> resolvedNamespaces;
      +    bool allowViewCatalogReload = true;
       
           while (!involvedNamespacesQueue.empty()) {
               auto involvedNs = std::move(involvedNamespacesQueue.front());
      @@ -243,7 +244,9 @@ StatusWith<StringMap<ExpressionContext::ResolvedNamespace>> resolveInvolvedNames
                   resolvedNamespaces[involvedNs.coll()] = {involvedNs, std::vector<BSONObj>{}};
               } else if (viewCatalog->lookup(opCtx, involvedNs.ns())) {
                   // If 'involvedNs' refers to a view namespace, then we resolve its definition.
      -            auto resolvedView = viewCatalog->resolveView(opCtx, involvedNs);
      +            auto resolvedView = viewCatalog->resolveView(opCtx, involvedNs, allowViewCatalogReload);
      +            allowViewCatalogReload = false;
      +            sleepmillis(100);
                   if (!resolvedView.isOK()) {
                       return {ErrorCodes::FailedToParse,
                               str::stream() << "Failed to resolve view '" << involvedNs.ns() << "': "
      diff --git a/src/mongo/db/views/view_catalog.cpp b/src/mongo/db/views/view_catalog.cpp
      index eb1e6dc..e8bf1a4 100644
      --- a/src/mongo/db/views/view_catalog.cpp
      +++ b/src/mongo/db/views/view_catalog.cpp
      @@ -430,7 +430,8 @@ std::shared_ptr<ViewDefinition> ViewCatalog::lookup(OperationContext* opCtx, Str
       }
       
       StatusWith<ResolvedView> ViewCatalog::resolveView(OperationContext* opCtx,
      -                                                  const NamespaceString& nss) {
      +                                                  const NamespaceString& nss,
      +                                                  bool allowReload) {
           stdx::unique_lock<stdx::mutex> lock(_mutex);
       
           // Keep looping until the resolution completes. If the catalog is invalidated during the
      @@ -461,6 +462,7 @@ StatusWith<ResolvedView> ViewCatalog::resolveView(OperationContext* opCtx,
       
                   // If the catalog has been invalidated, bail and restart.
                   if (!_valid.load()) {
      +                invariant(allowReload);
                       uassertStatusOK(_reloadIfNeeded_inlock(opCtx));
                       break;
                   }
      diff --git a/src/mongo/db/views/view_catalog.h b/src/mongo/db/views/view_catalog.h
      index 14bdf4d..d84245a 100644
      --- a/src/mongo/db/views/view_catalog.h
      +++ b/src/mongo/db/views/view_catalog.h
      @@ -115,7 +115,9 @@ public:
            * fully-resolved view definition containing the backing namespace, the resolved pipeline and
            * the collation to use for the operation.
            */
      -    StatusWith<ResolvedView> resolveView(OperationContext* opCtx, const NamespaceString& nss);
      +    StatusWith<ResolvedView> resolveView(OperationContext* opCtx,
      +                                         const NamespaceString& nss,
      +                                         bool allowReload = true);
       
           /**
            * Reload the views catalog if marked invalid. No-op if already valid. Does only minimal 
      

      Show
      The following patch is based on this commit and will make this issue easier to reproduce: diff --git a/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js b/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js index 4745ef3..cc1e14e 100644 --- a/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js +++ b/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js @@ -15,7 +15,7 @@ var $config = (function() { const prefix = 'view_catalog_cycle_lookup_'; var data = { - viewList: ['viewA', 'viewB', 'viewC', 'viewD', 'viewE'].map(viewName => prefix + viewName), + viewList: ['viewA', 'viewB', 'viewC'].map(viewName => prefix + viewName), getRandomView: function(viewList) { return viewList[Random.randInt(viewList.length)]; }, @@ -138,8 +138,8 @@ var $config = (function() { } return { - threadCount: 20, - iterations: 100, + threadCount: 60, + iterations: 150, data: data, states: states, startState: 'readFromView', diff --git a/src/mongo/db/commands/run_aggregate.cpp b/src/mongo/db/commands/run_aggregate.cpp index 958c33f..ee6b10c 100644 --- a/src/mongo/db/commands/run_aggregate.cpp +++ b/src/mongo/db/commands/run_aggregate.cpp @@ -244,6 +244,7 @@ StatusWith<StringMap<ExpressionContext::ResolvedNamespace>> resolveInvolvedNames } else if (viewCatalog->lookup(opCtx, involvedNs.ns())) { // If 'involvedNs' refers to a view namespace, then we resolve its definition. auto resolvedView = viewCatalog->resolveView(opCtx, involvedNs); + sleepmillis(100); if (!resolvedView.isOK()) { return {ErrorCodes::FailedToParse, str::stream() << "Failed to resolve view '" << involvedNs.ns() << "': " Run resmoke with the following arguments (this may take a few runs to trigger): python buildscripts/resmoke.py --suite=concurrency_replication_causal_consistency jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js The following patch includes an additional change which will trigger an invariant if resolveInvolvedNamespaces() encounters a invalid ViewCatalog mid-resolution. This confirms that the MODE_IS lock is not protecting the ViewCatalog from change: diff --git a/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js b/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js index 4745ef3..cc1e14e 100644 --- a/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js +++ b/jstests/concurrency/fsm_workloads/view_catalog_cycle_lookup.js @@ -15,7 +15,7 @@ var $config = (function() { const prefix = 'view_catalog_cycle_lookup_'; var data = { - viewList: ['viewA', 'viewB', 'viewC', 'viewD', 'viewE'].map(viewName => prefix + viewName), + viewList: ['viewA', 'viewB', 'viewC'].map(viewName => prefix + viewName), getRandomView: function(viewList) { return viewList[Random.randInt(viewList.length)]; }, @@ -138,8 +138,8 @@ var $config = (function() { } return { - threadCount: 20, - iterations: 100, + threadCount: 60, return { - threadCount: 20, - iterations: 100, + threadCount: 60, + iterations: 150, data: data, states: states, startState: 'readFromView', diff --git a/src/mongo/db/commands/run_aggregate.cpp b/src/mongo/db/commands/run_aggregate.cpp index 958c33f..ea5530c 100644 --- a/src/mongo/db/commands/run_aggregate.cpp +++ b/src/mongo/db/commands/run_aggregate.cpp @@ -220,6 +220,7 @@ StatusWith<StringMap<ExpressionContext::ResolvedNamespace>> resolveInvolvedNames std::deque<NamespaceString> involvedNamespacesQueue(pipelineInvolvedNamespaces.begin(), pipelineInvolvedNamespaces.end()); StringMap<ExpressionContext::ResolvedNamespace> resolvedNamespaces; + bool allowViewCatalogReload = true; while (!involvedNamespacesQueue.empty()) { auto involvedNs = std::move(involvedNamespacesQueue.front()); @@ -243,7 +244,9 @@ StatusWith<StringMap<ExpressionContext::ResolvedNamespace>> resolveInvolvedNames resolvedNamespaces[involvedNs.coll()] = {involvedNs, std::vector<BSONObj>{}}; } else if (viewCatalog->lookup(opCtx, involvedNs.ns())) { // If 'involvedNs' refers to a view namespace, then we resolve its definition. - auto resolvedView = viewCatalog->resolveView(opCtx, involvedNs); + auto resolvedView = viewCatalog->resolveView(opCtx, involvedNs, allowViewCatalogReload); + allowViewCatalogReload = false; + sleepmillis(100); if (!resolvedView.isOK()) { return {ErrorCodes::FailedToParse, str::stream() << "Failed to resolve view '" << involvedNs.ns() << "': " diff --git a/src/mongo/db/views/view_catalog.cpp b/src/mongo/db/views/view_catalog.cpp index eb1e6dc..e8bf1a4 100644 --- a/src/mongo/db/views/view_catalog.cpp +++ b/src/mongo/db/views/view_catalog.cpp @@ -430,7 +430,8 @@ std::shared_ptr<ViewDefinition> ViewCatalog::lookup(OperationContext* opCtx, Str } StatusWith<ResolvedView> ViewCatalog::resolveView(OperationContext* opCtx, - const NamespaceString& nss) { + const NamespaceString& nss, + bool allowReload) { stdx::unique_lock<stdx::mutex> lock(_mutex); // Keep looping until the resolution completes. If the catalog is invalidated during the @@ -461,6 +462,7 @@ StatusWith<ResolvedView> ViewCatalog::resolveView(OperationContext* opCtx, // If the catalog has been invalidated, bail and restart. if (!_valid.load()) { + invariant(allowReload); uassertStatusOK(_reloadIfNeeded_inlock(opCtx)); break; } diff --git a/src/mongo/db/views/view_catalog.h b/src/mongo/db/views/view_catalog.h index 14bdf4d..d84245a 100644 --- a/src/mongo/db/views/view_catalog.h +++ b/src/mongo/db/views/view_catalog.h @@ -115,7 +115,9 @@ public: * fully-resolved view definition containing the backing namespace, the resolved pipeline and * the collation to use for the operation. */ - StatusWith<ResolvedView> resolveView(OperationContext* opCtx, const NamespaceString& nss); + StatusWith<ResolvedView> resolveView(OperationContext* opCtx, + const NamespaceString& nss, + bool allowReload = true); /** * Reload the views catalog if marked invalid. No-op if already valid. Does only minimal
    • Sprint:
      Query 2018-12-17, Query 2018-12-31, Query 2019-01-14, Query 2019-01-28, Query 2019-02-11, Query 2019-02-25
    • Linked BF Score:
      58

      Description

      It is possible for the view_catalog_cycle_lookup.js FSM test to fail with MaxSubPipelineDepthExceeded when run in a suite that performs secondary reads. This is caused when the aggregate command generates an invalid pipeline containing a view cycle and can happen when the view catalog changes while resolving pipeline namespaces.

      When the aggregate command calls resolveInvolvedNamespaces() it holds a MODE_IS database lock which is meant to protect against ViewCatalog change, as collmod & collection drop/create require a database MODE_X lock. On secondaries however view catalog changes are replicated as inserts to the system.views collection and obtain a MODE_IX lock rather than MODE_X. This allows for view definition change mid-resolution and can result in an invalid view graph, one that may contain a cycle.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              charlie.swanson Charlie Swanson
              Reporter:
              james.wahlin James Wahlin
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: