SERVER-95330

Validation checks to local catalog for unsharded collections are too strict

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: 8.1.0-rc0, 7.0.14, 8.0.0
    • Component/s: Sharding
    • Catalog and Routing
    • ALL
    • Steps to reproduce:

      1. Apply the attached repro on commit 35e4c053add
      2. Run the test until the failure happens:
      python bazel-mongo/buildscripts/resmoke.py run --suites=sharding ./jstests/sharding/test_snapshot_reads_in_rollbacks.js --log=file --repeat=100

      NOTE: the repro brute-forces the scenario, so it may take a long time to trigger the failure.

    • CAR Team 2024-10-28
    • 0
    • 1

      SERVER-84723 added extra validations to guarantee that any operation on an unsharded collection with atClusterTime, or inside a transaction, runs against the same incarnation of the collection. In practice, this was done by asserting that the collection catalog that was stashed is the same as the one present in the latest catalog.

      This had an unintended consequence. During a rollback, a new collection catalog is installed. The check internally compares collection pointers, so if an operation with a specific atClusterTime comes in after the rollback and the minimum visible timestamp is higher, the collection is opened (that is, recovered from WiredTiger). The validation then fails, because the opened collection is seen as different from the one in the latest catalog.
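      The failure mode described above can be illustrated with a toy model. This is a minimal sketch in Python, not the server's C++ code: `Collection`, `open_from_storage`, and `same_incarnation_by_pointer` are hypothetical names, and Python object identity (`is`) stands in for the pointer comparison.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(eq=False)  # eq=False: instances compare by identity, like raw pointers
class Collection:
    """Toy stand-in for an in-memory collection instance (hypothetical)."""
    namespace: str
    coll_uuid: UUID


def open_from_storage(latest: Collection) -> Collection:
    """Model re-opening a collection from the storage engine for an older
    read timestamp: same namespace and UUID, but a brand-new object."""
    return Collection(latest.namespace, latest.coll_uuid)


def same_incarnation_by_pointer(stashed: Collection, latest: Collection) -> bool:
    """The strict check: object identity stands in for pointer equality."""
    return stashed is latest


# The catalog installed by the rollback holds this instance.
latest = Collection("test.coll", uuid4())

# A read whose atClusterTime is below the minimum visible timestamp cannot
# reuse the latest instance, so the collection is opened again.
stashed = open_from_storage(latest)

pointer_check_passes = same_incarnation_by_pointer(stashed, latest)
print(pointer_check_passes)  # False, although both objects carry the same UUID
```

      In this model the two objects describe the same collection, yet the identity check rejects them, which mirrors the spurious validation failure.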

      From a user perspective, this might cause some reads with atClusterTime, or a resharding operation, to fail after a rollback.

      To keep the same guarantee while being less strict, we could compare the UUIDs of the two collections instead: if they are the same, we can assume they are the same incarnation of the collection.
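      The proposed relaxation can be sketched the same way. This is a hedged illustration in Python rather than the actual fix: `Collection` and `same_incarnation_by_uuid` are hypothetical names, and the sketch assumes that a dropped and recreated collection receives a new UUID while a reopened one keeps its UUID.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(eq=False)  # distinct objects stay distinct, as with raw pointers
class Collection:
    """Toy stand-in for an in-memory collection instance (hypothetical)."""
    namespace: str
    coll_uuid: UUID


def same_incarnation_by_uuid(stashed: Collection, latest: Collection) -> bool:
    """The relaxed check: two instances match if their UUIDs match."""
    return stashed.coll_uuid == latest.coll_uuid


latest = Collection("test.coll", uuid4())

# Reopened from storage after a rollback: separate object, same UUID.
reopened = Collection(latest.namespace, latest.coll_uuid)

# Dropped and recreated under the same name: assumed to get a fresh UUID.
recreated = Collection(latest.namespace, uuid4())

reopened_ok = same_incarnation_by_uuid(reopened, latest)
recreated_ok = same_incarnation_by_uuid(recreated, latest)
print(reopened_ok, recreated_ok)  # True False
```

      Under these assumptions the UUID check accepts the reopened instance and still rejects a different incarnation that reuses the namespace.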

        1. bf-34498-repro.patch
          6 kB
          Marcos José Grillo Ramirez

            Assignee:
            paolo.polato@mongodb.com Paolo Polato
            Reporter:
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            Votes:
            0
            Watchers:
            8
