Core Server / SERVER-21275

Document not found due to WT commit visibility issue

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 3.0.8, 3.2.0-rc4
    • Affects Version/s: 3.2.0-rc2
    • Component/s: Querying
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Repl C (11/20/15), QuInt D (12/14/15)

      Issue Status as of Dec 10, 2015

      When using the WiredTiger storage engine, a race condition may prevent locally committed documents from being immediately visible to subsequent read operations. This bug may have an impact on both server and application operations. Unless exposed by a replication problem, it is not possible to determine if a system has been impacted by this bug without significant downtime.

      Normally, after a write is committed by the storage engine, it is immediately visible to subsequent operations. A race condition in WiredTiger may prevent a write from becoming immediately visible to subsequent operations, which may result in various problems, primarily impacting replication:

      • User writes may not be immediately visible to subsequent read operations
      • Replica set members may diverge and contain different data
      • Replication threads may shut down the server with the error message “Fatal Assertion 16360” because of duplicate _id values (a unique index violation)
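      The failure mode can be pictured with a deterministic toy model. In this model a commit happens in two steps: first the write is stored, and then it is published to readers. A read that lands between those steps does not see a write that the writer already treats as committed. This is a hypothetical illustration only. ToyStore, commit() and publish() are invented names and do not describe how WiredTiger tracks visibility.

```python
# Hypothetical toy model (NOT WiredTiger's design): a commit is stored in one
# step and made visible in a second step, so a read scheduled between the two
# steps misses a write that the writer already considers committed.


class ToyStore:
    def __init__(self):
        self._data = {}          # _id -> (document, commit id)
        self._published = set()  # commit ids that readers are allowed to see
        self._next_commit = 1

    def commit(self, doc):
        """Step 1: store the document and return its commit id."""
        commit_id = self._next_commit
        self._next_commit += 1
        self._data[doc["_id"]] = (doc, commit_id)
        return commit_id

    def publish(self, commit_id):
        """Step 2: make the commit visible to subsequent reads."""
        self._published.add(commit_id)

    def find(self, _id):
        """Return the document only if its commit has been published."""
        entry = self._data.get(_id)
        if entry is None or entry[1] not in self._published:
            return None
        return entry[0]


store = ToyStore()
cid = store.commit({"_id": 1, "x": "a"})
# A reader scheduled in the window between commit() and publish()
# gets "document not found" even though the write has committed.
missed = store.find(1)
store.publish(cid)
seen = store.find(1)
print(missed, seen)  # None {'_id': 1, 'x': 'a'}
```

      In the real bug the window came from a race, not from an explicit second call. The model only shows why a reader inside such a window reports the document as missing.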

      Deployments where a WiredTiger node is or was used as a source of data may be affected. This includes:

      • replica sets where the primary node is or was running WiredTiger
      • replica sets using chained replication where any node may sync from a WiredTiger node

      MMAPv1-only deployments are not affected by this issue. Mixed storage engine deployments are not affected as long as WiredTiger nodes never become primary and WiredTiger secondaries are never used as a sync source for chained replication.
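      The exposure rules above can be written as a small predicate. This is a hypothetical sketch. The Node fields are assumptions made for illustration, and the check covers only the replication scenarios listed here. It does not cover the general problem of user reads missing recent writes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    storage_engine: str           # e.g. "wiredTiger" or "mmapv1"
    is_or_was_primary: bool
    is_or_was_sync_source: bool   # served other members via chained replication


def may_be_affected(nodes):
    """True if any WiredTiger node is or was a source of replicated data."""
    return any(
        n.storage_engine == "wiredTiger"
        and (n.is_or_was_primary or n.is_or_was_sync_source)
        for n in nodes
    )


mixed = [
    Node("a", "mmapv1", True, False),
    Node("b", "wiredTiger", False, False),  # WiredTiger secondary, never a source
    Node("c", "mmapv1", False, False),
]
chained = mixed[:2] + [Node("c", "wiredTiger", False, True)]
print(may_be_affected(mixed), may_be_affected(chained))  # False True
```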

      There are no workarounds for this issue. All MongoDB 3.0 users running the WiredTiger storage engine should upgrade to MongoDB 3.0.8. A 3.0.8-rc0 release candidate containing the fix for this issue is available for download.

      Users experiencing the "Fatal Assertion 16360" error may restart the affected node to clear the error, but the condition may recur, so upgrading to 3.0.8 is strongly recommended.

      Affected versions: MongoDB 3.0.0 through 3.0.7 using the WiredTiger storage engine. MongoDB 3.2.0 is not affected by this issue.
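      As a quick illustration, the affected release range can be expressed as a version check. This is a hypothetical helper, not a MongoDB tool. It assumes a plain "major.minor.patch" string and deliberately does not handle release candidates such as the 3.2.0-rc2 build listed under Affects Version/s.

```python
def is_affected_release(version, storage_engine):
    """True for MongoDB 3.0.0 through 3.0.7 running WiredTiger.

    Assumes a plain "major.minor.patch" release string. Release
    candidates such as "3.2.0-rc2" are out of scope and raise ValueError.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return storage_engine == "wiredTiger" and (major, minor) == (3, 0) and patch <= 7


for v in ("3.0.0", "3.0.7", "3.0.8", "3.2.0"):
    print(v, is_affected_release(v, "wiredTiger"))
```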

      The fix is included in the 3.0.8 production release.

      Original description

      A new test is being introduced into the FSM tests to check the dbHash of the DB (and collections) on all replica set nodes, during these phases of the workload (SERVER-21115):

      • Workload completed, before invoking teardown
      • Workload completed, after invoking teardown

      Before the dbHash is computed, cluster.awaitReplication() is invoked to ensure that all nodes in the replica set have caught up.
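      The consistency check described above can be sketched in Python. This is a hypothetical illustration. It is not the FSM test, which is written in JavaScript, and it does not reproduce MongoDB's dbHash algorithm. Each node's data is an in-memory dict. The per-collection hash (MD5 over JSON-serialized documents in _id order) is an assumption, chosen so that identical contents produce identical hashes regardless of insertion order.

```python
import hashlib
import json


def collection_hash(docs):
    """Hash a collection's documents in _id order (hypothetical dbHash stand-in)."""
    h = hashlib.md5()
    for doc in sorted(docs, key=lambda d: d["_id"]):
        h.update(json.dumps(doc, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


def db_hash(db):
    """Map each collection name to its hash; db is {coll_name: [docs]}."""
    return {name: collection_hash(docs) for name, docs in db.items()}


def mismatched_collections(nodes):
    """Return collection names whose hash differs across nodes.

    nodes: {node_name: {coll_name: [docs]}}, captured only after replication
    has caught up (the test calls cluster.awaitReplication() first).
    A collection missing on some node counts as a mismatch.
    """
    hashes = {node: db_hash(db) for node, db in nodes.items()}
    all_colls = set().union(*(h.keys() for h in hashes.values()))
    return sorted(
        c for c in all_colls
        if len({h.get(c) for h in hashes.values()}) > 1
    )


primary = {"coll": [{"_id": 1, "x": 1}, {"_id": 2, "x": 2}]}
secondary = {"coll": [{"_id": 2, "x": 2}, {"_id": 1, "x": 1}]}
diverged = {"coll": [{"_id": 1, "x": 1}]}
print(mismatched_collections({"primary": primary, "secondary": secondary}))  # []
print(mismatched_collections({"primary": primary, "secondary": diverged}))   # ['coll']
```

      Waiting for replication first matters. Without it, a mismatch could come from ordinary replication lag rather than from the nodes actually diverging.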

      During development of this test, infrequent failures were observed in the remove_and_bulk_insert workload when running with the WiredTiger storage engine.

      Attachments:
        1. data_db_dbhash_0212513.tar.gz (71.11 MB)
        2. data_db_dbhash_20151118161433.tar.gz (49.40 MB)
        3. dbhash-remove_and_bulk_insert.js (5 kB)
        4. dbhash-remove_and_bulk_insert.js (4 kB)
        5. dbhash-remove_and_bulk_insert.js (4 kB)
        6. dbhash-remove_and_bulk_insert.js (4 kB)
        7. dbhash-remove_and_bulk_insert-wiredTiger-20151118101103.log.gz (12 kB)
        8. dbhash-remove_and_bulk_insert-wiredTiger-20151118161433.log.gz (466 kB)
        9. rbi-96.log (55 kB)
        10. run_dbhash.sh (3 kB)
        11. run_dbhash.sh (3 kB)
        12. run_dbhash.sh (1 kB)

            People: Mathias Stearn (mathias@mongodb.com), Jonathan Abrahams (jonathan.abrahams)