  1. Core Server
  2. SERVER-95763

checkMetadataConsistency can trigger tripwire 9089900 during collection metadata validation on chunk metadata inconsistency

    • Catalog and Routing
    • Fully Compatible
    • ALL
    • CAR Team 2024-10-28

      SERVER-90899 extended checkMetadataConsistency so that the collection metadata consistency check across shards also detects inconsistencies in the top-level metadata object. As part of those changes, the last stage of the aggregation used by the check was changed to a $facet, and a tripwire assertion was added to validate that the aggregation returns exactly one result, which was assumed to hold given $facet's behavior.

      However, the method that runs the aggregation contains an exception handler for the case where the aggregation cannot run because of an inconsistency in the chunk metadata. The handler catches the exception and returns an empty result set, on the assumption that the caller will treat this as "no inconsistencies". The changes in SERVER-90899 do not take this into account, so the empty result set reaches the new tripwire assertion.
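
      The interaction between the exception handler and the tripwire can be modeled in a few lines of JavaScript. This is a simplified sketch, not the server's C++ code: the function names, the boolean parameter, and the `mismatchedMetadata` facet field are all hypothetical and exist only for illustration.

```javascript
// Simplified model of the failure mode. All names here are hypothetical;
// the real logic lives in src/mongo/db/s/metadata_consistency_util.cpp.

// Stand-in for the helper that runs the aggregation. When the chunk metadata
// is inconsistent, it swallows the error and returns an empty result set.
function runAggregationModel(chunkMetadataIsConsistent) {
  try {
    if (!chunkMetadataIsConsistent) {
      throw new Error("No chunks were found for the collection test.nochunks");
    }
    // A pipeline whose last stage is $facet yields exactly one document.
    return [{ mismatchedMetadata: [] }];
  } catch (e) {
    // Intended to be read by the caller as "no inconsistencies".
    return [];
  }
}

// Stand-in for the caller after SERVER-90899: it assumes exactly one result.
function checkAcrossShardsModel(chunkMetadataIsConsistent) {
  const results = runAggregationModel(chunkMetadataIsConsistent);
  if (results.length !== 1) {
    // Models tripwire assertion 9089900.
    throw new Error(
      "Expected collection metadata consistency check aggregation to return one document"
    );
  }
  return results[0].mismatchedMetadata;
}
```

      In this model, `checkAcrossShardsModel(true)` returns an empty list of inconsistencies, while `checkAcrossShardsModel(false)` throws, because the empty array produced by the handler violates the "exactly one document" assumption.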

      Thus, the tripwire assertion can be hit with the following steps:

      • Create a sharded cluster (1 mongos + 1 data shard + 1 config server is enough)
      • Create a sharded collection: sh.shardCollection("test.nochunks", {a:1})
      • Create a chunk metadata inconsistency: db.getSiblingDB("config").chunks.deleteMany({})
      • Restart the data shard in order to force it to refresh the collection metadata
      • Run the metadata consistency checker: db.getSiblingDB("admin").checkMetadataConsistency()

      This will cause the tripwire assertion to be reported in the data shard's log:

      {"t":{"$date":"2024-10-14T15:14:23.751+00:00"},"s":"I",  "c":"SH_REFR",  "id":4619903, "svc":"S", "ctx":"CatalogCache-0","msg":"Error refreshing cached collection","attr":{"namespace":"test.nochunks","durationMillis":2,"error":"ConflictingOperationInProgress: No chunks were found for the collection test.nochunks"}}
      {"t":{"$date":"2024-10-14T15:14:23.751+00:00"},"s":"I",  "c":"SH_REFR",  "id":4086500, "svc":"S", "ctx":"conn28","msg":"Collection refresh failed","attr":{"namespace":"test.nochunks","exception":"ConflictingOperationInProgress: No chunks were found for the collection test.nochunks"}}
      {"t":{"$date":"2024-10-14T15:14:23.752+00:00"},"s":"I",  "c":"SHARDING", "id":8739100, "svc":"S", "ctx":"conn28","msg":"Failed to refresh the routing information due to a potential metadata inconsistency","attr":{"namespace":"test.nochunks","error":"ConflictingOperationInProgress: No chunks were found for the collection test.nochunks"}}
      {"t":{"$date":"2024-10-14T15:14:23.752+00:00"},"s":"E",  "c":"ASSERT",   "id":4457000, "svc":"S", "ctx":"conn28","msg":"Tripwire assertion","attr":{"error":{"code":9089900,"codeName":"Location9089900","errmsg":"Expected collection metadata consistency check aggregation to return one document"},"location":"{fileName:\"src/mongo/db/s/metadata_consistency_util.cpp\", line:819, functionName:\"checkCollectionMetadataConsistencyAcrossShards\"}"}}
      

       

      Fix checkCollectionMetadataConsistencyAcrossShards so that it gracefully handles this case, and add test coverage for this situation.
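
      One possible shape for such handling is sketched below in JavaScript. The ticket does not describe the actual fix, so this is an assumption: the function name `handleAggregationResults` and the `inconsistencies` field are hypothetical, and the real change would be made in the server's C++ code.

```javascript
// Hypothetical sketch of graceful handling; not the actual server fix.
// `results` stands for the array of documents returned by the aggregation helper.
function handleAggregationResults(results) {
  // An empty result set means the helper could not run the aggregation
  // (for example, because of a chunk metadata inconsistency). Treat it as
  // "no inconsistencies found by this check" instead of asserting.
  if (results.length === 0) {
    return [];
  }
  // With a trailing $facet stage, any non-empty result should be one document.
  if (results.length !== 1) {
    throw new Error(
      "Expected collection metadata consistency check aggregation to return one document"
    );
  }
  return results[0].inconsistencies || [];
}
```

      With this shape, an empty result set yields an empty list rather than an assertion failure, and the "exactly one document" check still guards the case where the aggregation did run.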

       

      Issue discovered by pol.pinol@mongodb.com.

            Assignee:
            joan.mico@mongodb.com Joan Bruguera Micó (Inactive)
            Reporter:
            joan.mico@mongodb.com Joan Bruguera Micó (Inactive)
            Votes:
            0
            Watchers:
            3
