Layered table collection scan transiently skips records that are accessible via point lookup

    • Storage Engines - Transactions
    • 941.164
    • SE Transactions - 2026-04-10, SE Transactions - 2026-04-24
    • 5

      On the disagg secondary, a forward collection scan over a layered table (ingest + stable) transiently skips a record (rid=1) while a point lookup for the same record at the same snapshot timestamp succeeds.

      Observed behavior:

      • The dbHash command performs a collection scan and returns 10 documents on the primary but only 9 on the secondary (rid=1 missing)
      • find with sort: {_id: 1}, which uses the _id index and then fetches each document by RecordId, returns all 10 documents on both nodes
      • Both operations run at the same atClusterTime snapshot on the same node
      • The collection always has exactly 10 documents (each is inserted once and afterwards only updated). The 53 previous dbHash scans during the same test returned 10 documents on both nodes, so the skip is transient.
      • When rid=1 IS returned, its BSON bytes match between primary and secondary

      Primary collection scan (10 documents):

      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.613+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":0,"rid":1,"bsonSize":10075,"docHash":"c3b5842c569553d812e8b59171f7e3dd","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.613+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":1,"rid":2,"bsonSize":9539,"docHash":"1cf2d97abdb76102ef4c47ffc4cff691","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.613+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":2,"rid":3,"bsonSize":9539,"docHash":"9a8423a94bd3b7d290ee21e88d61058e","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.614+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":3,"rid":4,"bsonSize":10209,"docHash":"36f8a92c44b98c12f837a3c3a110ea75","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.614+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":4,"rid":5,"bsonSize":8601,"docHash":"5779546fba1187039ecaf6e2569c88d6","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.614+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":5,"rid":6,"bsonSize":9003,"docHash":"d2380a0bffecfa7deb463452c0f37c11","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.614+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":6,"rid":7,"bsonSize":9271,"docHash":"d7a23686aa8a0fc35e227f58845e7b46","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.615+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":7,"rid":8,"bsonSize":10075,"docHash":"6a1036c5ecc89b60495349fbecae0072","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.615+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":8,"rid":9,"bsonSize":9673,"docHash":"375eb2abf1bd4d97ef1af75bf2ef56a6","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.615+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn421","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":9,"rid":10,"bsonSize":9539,"docHash":"bbe0c493c5efe5c635a47b186ba252bd","outOfOrder":false}}
      [Disagg:j5:prim] {"t":{"$date":"2026-04-08T05:33:10.615+00:00"},"s":"I","c":"COMMAND","id":9999902,"ctx":"conn421","msg":"dbHash collection scan complete","attr":{"ns":"test1_fsmdb0.fsmcoll0","totalDocs":10}}
      

      Secondary collection scan (9 documents, rid=1 missing):

      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.806+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":0,"rid":2,"bsonSize":9539,"docHash":"1cf2d97abdb76102ef4c47ffc4cff691","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.806+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":1,"rid":3,"bsonSize":9539,"docHash":"9a8423a94bd3b7d290ee21e88d61058e","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.806+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":2,"rid":4,"bsonSize":10209,"docHash":"36f8a92c44b98c12f837a3c3a110ea75","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.807+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":3,"rid":5,"bsonSize":8601,"docHash":"5779546fba1187039ecaf6e2569c88d6","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.807+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":4,"rid":6,"bsonSize":9003,"docHash":"d2380a0bffecfa7deb463452c0f37c11","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.807+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":5,"rid":7,"bsonSize":9271,"docHash":"d7a23686aa8a0fc35e227f58845e7b46","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.807+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":6,"rid":8,"bsonSize":10075,"docHash":"6a1036c5ecc89b60495349fbecae0072","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.808+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":7,"rid":9,"bsonSize":9673,"docHash":"375eb2abf1bd4d97ef1af75bf2ef56a6","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.833+00:00"},"s":"I","c":"COMMAND","id":9999901,"ctx":"conn124","msg":"dbHash record","attr":{"ns":"test1_fsmdb0.fsmcoll0","idx":8,"rid":10,"bsonSize":9539,"docHash":"bbe0c493c5efe5c635a47b186ba252bd","outOfOrder":false}}
      [Disagg:j5:sec] {"t":{"$date":"2026-04-08T05:33:10.833+00:00"},"s":"I","c":"COMMAND","id":9999902,"ctx":"conn124","msg":"dbHash collection scan complete","attr":{"ns":"test1_fsmdb0.fsmcoll0","totalDocs":9}}
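      The RecordIds a scan skipped can be identified by diffing the rid values from the two log excerpts above. The following is a minimal Python sketch, not part of dbhash.cpp or the dbHash hook; scanned_rids and diff_rids are hypothetical helper names, and the parser assumes each non-blank line is a bracketed node prefix followed by one JSON log object, as in the excerpts.

```python
import json
import re

# Matches the "[Disagg:j5:prim] "-style node prefix (plus any leading indentation).
_NODE_PREFIX = re.compile(r"^\s*\[[^\]]*\]\s*")


def scanned_rids(log_lines):
    """Return the rid of every "dbHash record" entry, in log order."""
    rids = []
    for line in log_lines:
        if not line.strip():
            continue  # skip blank separator lines
        entry = json.loads(_NODE_PREFIX.sub("", line, count=1))
        if entry.get("msg") == "dbHash record":
            rids.append(entry["attr"]["rid"])
    return rids


def diff_rids(primary_rids, secondary_rids):
    """Return (missing_on_secondary, extra_on_secondary), each sorted."""
    primary, secondary = set(primary_rids), set(secondary_rids)
    return sorted(primary - secondary), sorted(secondary - primary)


# rid values from the excerpts above: 1..10 on the primary, 2..10 on the secondary.
print(diff_rids(range(1, 11), range(2, 11)))  # ([1], [])
```

      Comparing sets rather than sequences ignores ordering, which is acceptable here because the instrumented logs report outOfOrder: false for every record.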
      

      Reproducer:

      Suite: disagg_concurrency_replication_multi_stmt_txn
      Test: multi_statement_transaction_atomicity_isolation_metrics_test.js
      Reproduces in roughly 2 out of 10 runs
      Example failure (with instrumented dbhash.cpp logging): https://spruce.corp.mongodb.com/task/mongodb_mongo_master_enterprise_amazon_linux2023_arm64_all_feature_flags_extra_system_deps_disagg_concurrency_replication_multi_stmt_txn_3_linux_enterprise_patch_acfb88842b0de999f099e5b8fba4f965882021d3_69d5e13ca7a14400073ddb81_26_04_08_05_01_59/tests?execution=0&sorts=STATUS%3AASC

      Why the dbHash hook reports "No documents are missing":
      The dbHash command with includeReplicatedRecordIds=1 performs a collection scan, iterating forward over the layered table in RecordId order via __clayered_iterate. The subsequent find comparison in getCollectionDiffUsingSessions uses sort: {_id: 1}, which goes through the _id index and fetches each document by RecordId (via __clayered_search → __clayered_lookup). The point-lookup path searches the ingest and stable layers independently, so it finds rid=1; the forward-iteration path, which merges the two layers, does not. As a result, the hook sees differing hashes, but its find-based diff returns all 10 documents on both nodes and reports that no documents are missing.
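      The invariant the two read paths should share can be modeled in a few lines. This is a simplified Python sketch, not WiredTiger's __clayered_iterate or __clayered_search code: it assumes each layer is already resolved to the reader's snapshot and represented as a dict from RecordId to value, that an ingest entry shadows the stable entry with the same RecordId, and that a hypothetical TOMBSTONE marker in the ingest layer hides a stable record. scan_lookup_mismatches (also hypothetical) reports any RecordId that a point lookup can see but a forward scan did not return, which is the symptom observed for rid=1.

```python
# Hypothetical deletion marker: an ingest entry equal to this hides the stable record.
TOMBSTONE = object()


def point_lookup(ingest, stable, rid):
    """Check the ingest layer first, then fall back to the stable layer."""
    if rid in ingest:
        value = ingest[rid]
        return None if value is TOMBSTONE else value
    return stable.get(rid)


def forward_scan(ingest, stable):
    """Merge both layers in ascending RecordId order; ingest wins on ties."""
    i_keys, s_keys = sorted(ingest), sorted(stable)
    i = s = 0
    out = []
    while i < len(i_keys) or s < len(s_keys):
        if s == len(s_keys) or (i < len(i_keys) and i_keys[i] < s_keys[s]):
            rid, value = i_keys[i], ingest[i_keys[i]]  # next key exists only in ingest
            i += 1
        elif i == len(i_keys) or s_keys[s] < i_keys[i]:
            rid, value = s_keys[s], stable[s_keys[s]]  # next key exists only in stable
            s += 1
        else:
            rid, value = i_keys[i], ingest[i_keys[i]]  # key in both: ingest shadows stable
            i += 1
            s += 1
        if value is not TOMBSTONE:
            out.append((rid, value))
    return out


def scan_lookup_mismatches(ingest, stable, scanned):
    """RecordIds visible to a point lookup but absent from a scan's output."""
    scanned_rids = {rid for rid, _ in scanned}
    candidates = set(ingest) | set(stable)
    return sorted(
        rid for rid in candidates
        if point_lookup(ingest, stable, rid) is not None and rid not in scanned_rids
    )


stable = {1: "doc1-v1", 2: "doc2-v1", 3: "doc3-v1"}
ingest = {1: "doc1-v2", 3: TOMBSTONE, 4: "doc4-v1"}
scan = forward_scan(ingest, stable)
print(scan)  # [(1, 'doc1-v2'), (2, 'doc2-v1'), (4, 'doc4-v1')]
print(scan_lookup_mismatches(ingest, stable, scan))      # []
print(scan_lookup_mismatches(ingest, stable, scan[1:]))  # [1]
```

      A correct merge yields no mismatches, while a scan that drops the first record, as the secondary's did, is flagged immediately. Because the model ignores timestamps and cursor repositioning, it illustrates the agreement expected between the two paths rather than the cause of the skip.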

            Assignee:
            Chenhao Qu
            Reporter:
            Gregory Wlodarek
            Votes:
            0
            Watchers:
            3