  1. Compass
  2. COMPASS-9090

Investigate changes in SERVER-98399: Time-series collections with mixed-schema data should fail validation if only the top-level mixed-schema flag is set

    • Type: Investigation
    • Resolution: Done
    • Priority: Major - P3
    • No version
    • Affects Version/s: None
    • Component/s: None
    • None
    • Not Needed
    • Developer Tools

      Original Downstream Change Summary
      • validate over a time-series collection containing mixed-schema buckets will report the error for CA-85 / SERVER-91194 more aggressively: It will report an error if the mixed-schema flag has not been lost, but is still at risk of being lost on a future initial sync / movePrimary / etc. operation.
      • insert and update operations over a time-series collection's internal system.buckets namespace will similarly reject mixed-schema buckets if the mixed-schema flag could be lost in the future (CannotInsertTimeseriesBucketsWithMixedSchema error code).

      Those changes are expected to have little or no downstream impact, as those cases should already be handled as part of the initial response to SERVER-91194; this ticket only expands the cases where the pre-existing errors are reported.

      Description of Linked Ticket

      Summary

      If the top-level timeseriesBucketsMayHaveMixedSchemaData catalog flag is true, but the alternative flag in the WiredTiger configuration string (introduced in SERVER-91195) is not set, collection validation should fail. Otherwise, certain upgrade paths can cause a replica set to appear to be in a healthy state with respect to query correctness and collection validation of time series collections, yet nevertheless exhibit problems in the future when a time series collection is later cloned.
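
      The rule in the summary can be sketched as a small decision function. This is only an illustrative model in plain JavaScript, not the server's implementation: the function name classifyMixedSchemaFlag, its two boolean inputs, and the returned labels are all hypothetical.

```javascript
// Hypothetical model of the rule described in the summary; not server code.
// topLevelFlag:     value of the top-level timeseriesBucketsMayHaveMixedSchemaData catalog flag.
// configStringFlag: whether the flag is also recorded in the WiredTiger configuration string.
function classifyMixedSchemaFlag(topLevelFlag, configStringFlag) {
  if (configStringFlag) {
    // The configuration string is copied on clone, so the flag survives.
    return "flag-durable";
  }
  if (topLevelFlag) {
    // Only the top-level flag is set: it would be lost on the next clone,
    // which is the case this ticket wants validation to reject.
    return "flag-at-risk";
  }
  // Neither location records the flag.
  return "no-flag";
}

console.log(classifyMixedSchemaFlag(true, false)); // flag-at-risk
console.log(classifyMixedSchemaFlag(true, true));  // flag-durable
```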

      Details

      SERVER-91194 described an issue where the value of the timeseriesBucketsMayHaveMixedSchemaData top-level catalog flag could be lost when the collection is cloned by various procedures (such as initial sync, mongodump, etc.). This could result in queries over time series collections missing matching documents.

      To fix this, SERVER-91195 provided a method to ensure that this flag is preserved when the collection is cloned: The value is stored as part of the collection options, inside the WiredTiger configuration string (md.options.storageEngine.wiredTiger.configString).

      Specifically, the flag is set in the WiredTiger configuration string in the following case:

      • When the FCV is upgraded from 5.0 to 6.0 using a MongoDB binary with SERVER-91195 fixed (i.e. MongoDB ≥ 6.0.17).

      However, other upgrade paths (such as 5.0.28 -> 6.0.16 -> 7.0.15, or 5.0.28 -> 6.0.16 -> 6.0.19) never set the flag on the WiredTiger configuration string.

      Additionally, the implementation of SERVER-91195 has the following properties:

      • If the top-level timeseriesBucketsMayHaveMixedSchemaData catalog flag is true, it is still respected; i.e. the replica set assumes that the time series collection may contain mixed schema data.
      • The top-level timeseriesBucketsMayHaveMixedSchemaData catalog flag is still lost when the collection is cloned. The assumption is that this is not problematic because the WiredTiger configuration string will be copied.

      Taken together, those properties imply that certain upgrade paths can result in a replica set which appears healthy from a user's point of view (queries over time series collections return correct results, and collection validation returns no errors), but which can eventually get into a corrupt state if the collection is cloned. For example:

      • Start up a MongoDB v5.0 replica set
      • Create a time series collection with a mixed-schema bucket
      • Upgrade to a MongoDB ≤v6.0.16 binary, then upgrade FCV to 6.0
        • At this point the time series collection has top-level timeseriesBucketsMayHaveMixedSchemaData=true, but the flag is not set in the WiredTiger configuration string.
      • Upgrade to any version that has SERVER-91195 fixed, e.g. v7.0.15, including any FCV upgrades.
        • The flag is still not set in the WiredTiger configuration string.
      • At this point, the cluster appears healthy to the user:
        • Queries over time series collections return correct results.
        • db.coll.validate() does not report any error, just a warning.
      • Add a new secondary to the replica set (going through initial sync). At this point:
        • The primary has top-level timeseriesBucketsMayHaveMixedSchemaData=true
        • The secondary has top-level timeseriesBucketsMayHaveMixedSchemaData=false
        • Neither the primary nor the secondary has the flag set in the WiredTiger configuration string.
        • The secondary believes that there are no mixed-schema buckets.
        • The secondary reports an error on db.coll.validate().
        • Queries that rely on the mixed schema flag (see SERVER-59505) may return incorrect results.
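
      The scenario above can be modeled in a few lines of plain JavaScript. This is a sketch under stated assumptions, not server code: the object fields (topLevelFlag, configStringFlag, hasMixedSchemaBuckets) and both function names are hypothetical stand-ins for the catalog state the ticket describes.

```javascript
// Illustrative model only; field and function names are hypothetical.
function cloneCollection(source) {
  return {
    // The top-level catalog flag is not carried over by a clone.
    topLevelFlag: false,
    // The WiredTiger configuration string is copied as-is.
    configStringFlag: source.configStringFlag,
    // The data is copied, mixed-schema buckets included.
    hasMixedSchemaBuckets: source.hasMixedSchemaBuckets,
  };
}

// A node assumes mixed-schema data may exist if either location records the flag.
function believesMixedSchemaPossible(node) {
  return node.topLevelFlag || node.configStringFlag;
}

// Upgrade path from the example: only the top-level flag is set on the primary.
const primary = { topLevelFlag: true, configStringFlag: false, hasMixedSchemaBuckets: true };
const secondary = cloneCollection(primary);
console.log(believesMixedSchemaPossible(primary));   // true
console.log(believesMixedSchemaPossible(secondary)); // false, although the buckets are mixed

// Same clone when the flag is also stored in the configuration string.
const fixedPrimary = { topLevelFlag: true, configStringFlag: true, hasMixedSchemaBuckets: true };
const fixedSecondary = cloneCollection(fixedPrimary);
console.log(believesMixedSchemaPossible(fixedSecondary)); // true
```

      The second case shows why either remediation works: once the flag is recorded in the configuration string, the clone no longer depends on the top-level flag.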

      A reproducer is attached.

      This ticket should propose a way to automatically set the mixed-schema flag in the WiredTiger configuration string (so that its value can no longer be lost), or flag those collections for manual intervention via collMod as per the SERVER-91194 CA. Both options ensure that a replica set which appears healthy will remain healthy after the collection is cloned.
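
      For the manual option, a minimal sketch of the command document follows. It assumes that the collMod option carries the same name as the catalog flag and that re-applying it is what the SERVER-91194 CA prescribes; the helper name buildMixedSchemaCollMod and the collection name "weather" are hypothetical, and the CA itself should be consulted for the authoritative steps.

```javascript
// Assumption: the collMod option is named after the catalog flag.
// buildMixedSchemaCollMod is a hypothetical helper, not part of any driver.
function buildMixedSchemaCollMod(collectionName) {
  return {
    collMod: collectionName,
    timeseriesBucketsMayHaveMixedSchemaData: true,
  };
}

// In mongosh the document would be passed to db.runCommand(...);
// it is printed here so the sketch runs without a server.
const command = buildMixedSchemaCollMod("weather");
console.log(JSON.stringify(command));
```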

            Assignee:
            rhys.howell@mongodb.com Rhys Howell
            Reporter:
            backlog-server-pm Backlog - Core Eng Program Management Team