Inefficient resolution of convoy of database metadata routing refreshes


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.3.0-rc0, 8.3.0-rc1, 8.3.0-rc2
    • Component/s: None
    • Catalog and Routing
    • ALL
    • v8.3

      This issue only impacts 8.3 since it is the first version where SPM-3729 is enabled.

      Before SPM-3729
      Former primary shards targeted by a stale router would 1) install a version on the DSS if they didn't have one, and 2) send that version back to the router. Note that this version was potentially stale (i.e. it indicated that some other shard was the primary, and certainly not the shard itself), but it was still useful as a hint to the routers about which shard could be the new primary. This behavior allowed all the stale requests to be resolved with potentially a single refresh: every request returned the same wanted version to the router, the router performed just one refresh against the CSRS, and then it resolved all the requests with the new routing information.

      After SPM-3729
      With the new implementation some things changed: shards no longer keep a stale view of which shard they believe is the primary. After ensuring that they are not the primary shard, they return a stale db error to the router, but without a wanted version. This causes the router to invalidate its cache as part of error handling. With this new flow, we can no longer count on a single refresh resolving all the requests; instead, the number of refreshes depends on the specific interleaving of routing invalidations and routing metadata refreshes from the CSRS.
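
      The difference between the two flows can be illustrated with a toy model. This is only a sketch, not server code: the function name, the event names ("error", "use"), the integer versions, and the rule that a wanted version no newer than the cached one does not invalidate the cache are all assumptions made for illustration.

```python
def count_refreshes(events, carries_wanted_version, cached=1, wanted=2, csrs_version=2):
    """Count CSRS refreshes a (hypothetical) router performs for a sequence of events.

    events: list of "error" (a stale db error arrives from a shard) or
            "use" (the router needs routing info and refreshes if its cache is invalid).
    carries_wanted_version: True models the pre-SPM-3729 flow, False the post-SPM-3729 flow.
    """
    valid = True      # whether the cached database entry is currently usable
    refreshes = 0
    for event in events:
        if event == "error":
            if carries_wanted_version:
                # Assumed rule: only invalidate if the cache is older than the wanted version.
                if cached < wanted:
                    valid = False
            else:
                # No wanted version: the error handler always invalidates the cache.
                valid = False
        elif event == "use":
            if not valid:
                cached = csrs_version   # refresh from the CSRS
                valid = True
                refreshes += 1
        else:
            raise ValueError(f"unknown event: {event}")
    return refreshes


# Three stale requests, in two different interleavings.
batched = ["error", "error", "error", "use"]
interleaved = ["error", "use", "error", "use", "error", "use"]

for name, events in (("batched", batched), ("interleaved", interleaved)):
    before = count_refreshes(events, carries_wanted_version=True)
    after = count_refreshes(events, carries_wanted_version=False)
    print(f"{name}: with wanted version={before}, without wanted version={after}")
```

      In this model the flow with a wanted version performs one refresh for both interleavings, while the flow without it performs one refresh when the errors arrive together and one refresh per error when errors and refreshes alternate.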

      Note that this inefficient resolution only takes place after a movePrimary or after a database is dropped and recreated.

      Thanks to kaloian.manassiev@mongodb.com for pointing out the difference in behavior of the CatalogCache between collections and databases.

            Assignee:
            Unassigned
            Reporter:
            Sergi Mateo Bellido