Core Server / SERVER-21275

Document not found due to WT commit visibility issue

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 3.0.8, 3.2.0-rc4
    • Affects Version/s: 3.2.0-rc2
    • Component/s: Querying
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Repl C (11/20/15), QuInt D (12/14/15)

      Issue Status as of Dec 10, 2015

      ISSUE SUMMARY
      When using the WiredTiger storage engine, a race condition may prevent locally committed documents from being immediately visible to subsequent read operations. This bug may affect both server-internal and application operations. Unless it is exposed by a replication problem, it is not possible to determine whether a system has been affected by this bug without significant downtime.

      USER IMPACT
      Normally, after a write is committed by the storage engine, it is immediately visible to subsequent operations. A race condition in WiredTiger may prevent a write from becoming immediately visible to subsequent operations, which may result in various problems, primarily impacting replication:

      • User writes may not be immediately visible to subsequent read operations
      • Replica set members may diverge and contain different data
      • Replication threads may shut down the server with the error message “Fatal Assertion 16360” because of duplicate _id values (a unique index violation)
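
      The first symptom above, a read that misses a write the server has already acknowledged, can be detected after the fact by scanning a recorded operation history. The following is a minimal sketch under stated assumptions, not a MongoDB tool. It assumes a single client's operations are logged in completion order and that the history contains only inserts and reads (no deletes). The `Event` type and `stale_reads` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Event:
    kind: str           # hypothetical: "write_ack" (insert acknowledged) or "read"
    doc_id: int
    found: bool = True  # for reads: whether the document was returned


def stale_reads(history: List[Event]) -> List[Tuple[int, int]]:
    """Return (event index, doc_id) for each read that missed an already-acknowledged insert.

    Assumes one client's history in completion order, with no deletes.
    """
    acked: Set[int] = set()
    violations: List[Tuple[int, int]] = []
    for i, ev in enumerate(history):
        if ev.kind == "write_ack":
            acked.add(ev.doc_id)
        elif ev.kind == "read" and ev.doc_id in acked and not ev.found:
            # The write was acknowledged earlier, yet this read did not see it.
            violations.append((i, ev.doc_id))
    return violations
```

      For example, the history `[Event("write_ack", 1), Event("read", 1, found=False)]` yields `[(1, 1)]`. Without this bug, such a history should never occur on a single node.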

      Deployments where a WiredTiger node is or was used as a source of data may be affected. This includes:

      • replica sets where the primary node is or was running WiredTiger
      • replica sets using chained replication where any node may sync from a WiredTiger node

      MMAPv1-only deployments are not affected by this issue. Mixed storage engine deployments are not affected when WiredTiger nodes never become primary, or when WiredTiger secondaries are not used as a source for chained replication.
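
      The exposure rule above reduces to one condition: a deployment may be affected if any WiredTiger member is or was a source of data, either as primary or as a sync source for chained replication. The following is a minimal sketch of that rule. The `Member` fields are hypothetical descriptions of a member's history that an operator would fill in by hand. They are not fields read from any MongoDB command.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Member:
    name: str
    storage_engine: str        # hypothetical labels: "wiredTiger" or "mmapv1"
    ever_primary: bool         # the member is or was primary
    used_as_sync_source: bool  # another member syncs (or synced) from it via chaining


def may_be_affected(members: Iterable[Member]) -> bool:
    """True if any WiredTiger member is or was used as a source of data."""
    return any(
        m.storage_engine == "wiredTiger" and (m.ever_primary or m.used_as_sync_source)
        for m in members
    )
```

      An MMAPv1-only set, or a mixed set whose WiredTiger members were never primary and never served as a chained sync source, evaluates to `False`. All other cases evaluate to `True`.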

      WORKAROUNDS
      There are no workarounds for this issue. All MongoDB 3.0 users running the WiredTiger storage engine should upgrade to MongoDB 3.0.8. A 3.0.8-rc0 release candidate containing the fix for this issue is available for download.

      Users who encounter the "Fatal Assertion 16360" error can restart the affected node to recover, but the condition may recur, so upgrading to 3.0.8 is strongly recommended.

      AFFECTED VERSIONS
      MongoDB 3.0.0 through 3.0.7 using the WiredTiger storage engine. MongoDB 3.2.0 is not affected by this issue.
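
      The version rule above can be sketched as a small check. This assumes version strings in `major.minor.patch` form, with an optional `-rcN` suffix that is ignored. Only the 3.0-series range stated in this summary is modeled. The 3.2.0-rc2 pre-release listed under Affects Version/s is deliberately not covered. `version_affected` is a hypothetical helper name.

```python
def version_affected(version: str, storage_engine: str) -> bool:
    """True for MongoDB 3.0.0 through 3.0.7 running WiredTiger (per the summary above)."""
    if storage_engine != "wiredTiger":
        return False
    # Drop any pre-release suffix such as "-rc0", then parse major.minor.patch.
    major, minor, patch = (int(p) for p in version.split("-")[0].split(".")[:3])
    return (major, minor) == (3, 0) and patch <= 7
```

      Under this rule, `3.0.8-rc0` is reported as not affected, which matches the statement that this release candidate contains the fix.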

      FIX VERSION
      The fix is included in the 3.0.8 production release.

      Original description

      A new test is being added to the FSM tests to compare the dbHash of each database (and its collections) across all replica set nodes during these phases of the workload (SERVER-21115):

      • Workload completed, before invoking teardown
      • Workload completed, after invoking teardown

      Before the dbHash is computed, cluster.awaitReplication() is invoked to ensure that all nodes in the replica set have caught up.
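
      Once replication has caught up, the comparison step reduces to checking per-collection hashes against one reference node. The actual FSM test runs as JavaScript in the mongo shell. The following Python sketch covers only the comparison logic. It assumes each node's dbHash result has already been fetched into a dict that exposes per-collection hashes under a `collections` field. The node names and hash values in the usage example are made up. `divergent_collections` is a hypothetical helper name.

```python
from typing import Dict, List


def divergent_collections(results: Dict[str, dict], reference: str) -> Dict[str, List[str]]:
    """Map each node to the collections whose hash differs from the reference node's.

    `results` maps node name -> dbHash result with a "collections" dict of
    {collection name: hash}. A collection missing on either side counts as divergent.
    """
    ref = results[reference]["collections"]
    diffs: Dict[str, List[str]] = {}
    for node, res in results.items():
        if node == reference:
            continue
        colls = res["collections"]
        names = sorted(set(ref) | set(colls))
        mismatched = [n for n in names if ref.get(n) != colls.get(n)]
        if mismatched:
            diffs[node] = mismatched
    return diffs
```

      For example, with `node2` reporting a different hash for `coll`, the call `divergent_collections(results, "node0")` returns `{"node2": ["coll"]}`. An empty dict means all nodes agree with the reference.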

      While developing this test, we observed infrequent failures in the remove_and_bulk_insert workload when running with the WiredTiger storage engine.


    • Assignee: Mathias Stearn (mathias@mongodb.com)
    • Reporter: Jonathan Abrahams