[SERVER-21275] Document not found due to WT commit visibility issue Created: 03/Nov/15  Updated: 20/Sep/17  Resolved: 01/Dec/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.2.0-rc2
Fix Version/s: 3.0.8, 3.2.0-rc4

Type: Bug Priority: Critical - P2
Reporter: Jonathan Abrahams Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File data_db_dbhash_0212513.tar.gz     File data_db_dbhash_20151118161433.tar.gz     File dbhash-remove_and_bulk_insert-wiredTiger-20151118101103.log.gz     File dbhash-remove_and_bulk_insert-wiredTiger-20151118161433.log.gz     File dbhash-remove_and_bulk_insert.js     File dbhash-remove_and_bulk_insert.js     File dbhash-remove_and_bulk_insert.js     File dbhash-remove_and_bulk_insert.js     Text File rbi-96.log     File run_dbhash.sh     File run_dbhash.sh     File run_dbhash.sh    
Issue Links:
Depends
is depended on by SERVER-21115 Add dbHash checking to concurrency suite Closed
Duplicate
duplicates WT-2237 Make committed changes visible immedi... Closed
is duplicated by SERVER-21778 slave node crash: writer worker caugh... Closed
Related
related to SERVER-21847 log range of operations read from syn... Closed
is related to SERVER-21237 ReplSetTest.prototype.awaitReplicatio... Closed
is related to SERVER-21645 WiredTigerRecordStore::temp_cappedTru... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Repl C (11/20/15), QuInt D (12/14/15)
Participants:
Case:

 Description   
Issue Status as of Dec 10, 2015

ISSUE SUMMARY
When using the WiredTiger storage engine, a race condition may prevent locally committed documents from being immediately visible to subsequent read operations. This bug may affect both server operations (most notably replication) and application reads. Unless exposed by a replication problem, it is not possible to determine whether a system has been impacted by this bug without significant downtime.

USER IMPACT
Normally, after a write is committed by the storage engine, it is immediately visible to subsequent operations. A race condition in WiredTiger may prevent a write from becoming immediately visible to subsequent operations, which may result in various problems, primarily impacting replication:

  • User writes may not be immediately visible to subsequent read operations
  • Replica set members may diverge and contain different data
  • Replication thread(s) shut down the server with the error message “Fatal Assertion 16360”, due to duplicate _id values (a unique index violation)
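
To make the expectation concrete, here is a minimal mongo shell sketch (database and collection names are made up for illustration) of the read-your-own-writes behavior that this race can transiently violate on an affected WiredTiger node:

// Hypothetical illustration only: normally a committed write is visible
// to the very next read on the same node. On an affected node, the
// findOne() below could transiently return null.
var coll = db.getSiblingDB("test").visibility_check;

assert.writeOK(coll.insert({_id: 1, x: "committed"}));   // write acknowledged as committed

var doc = coll.findOne({_id: 1});
assert.neq(null, doc, "committed document should be visible to a subsequent read");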

Deployments where a WiredTiger node is or was used as a source of data may be affected. This includes:

  • replica sets where the primary node is or was running WiredTiger
  • replica sets using chained replication where any node may sync from a WiredTiger node

MMAPv1-only deployments are not affected by this issue. Mixed storage engine deployments are not affected when WiredTiger nodes never become primary, or when WiredTiger secondaries are not used as a source for chained replication.

WORKAROUNDS
There are no workarounds for this issue. All MongoDB 3.0 users running the WiredTiger storage engine should upgrade to MongoDB 3.0.8. A 3.0.8-rc0 release candidate containing the fix for this issue is available for download.

Users experiencing the "Fatal Assertion 16360" error may restart the affected node to fix the issue, but this condition may recur so upgrading to 3.0.8 is strongly recommended.

AFFECTED VERSIONS
MongoDB 3.0.0 through 3.0.7 using the WiredTiger storage engine. MongoDB 3.2.0 is not affected by this issue.

FIX VERSION
The fix is included in the 3.0.8 production release.

Original description

A new test is being introduced into the FSM tests to check the dbHash of the DB (and collections) on all replica set nodes, during these phases of the workload (SERVER-21115):

  • Workload completed, before invoking teardown
  • Workload completed, after invoking teardown

Before the dbHash is computed, cluster.awaitReplication() is invoked to ensure that all nodes in the replica set have caught up.

During the development of this test it was noticed that infrequent failures would occur for the remove_and_bulk_insert workload when running with the wiredTiger storage engine.
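
For context, a rough mongo shell sketch of the kind of dbHash comparison the new hook performs (hostnames and the database name are placeholders; the actual check is implemented as part of SERVER-21115):

// Hypothetical sketch: after replication has caught up, the dbHash of a
// database should be identical on every replica set member.
var dbName = "fsm_db";                      // placeholder database name
var primary = new Mongo("host1:27017");     // placeholder hosts
var secondary = new Mongo("host2:27017");
secondary.setSlaveOk();

// The real test calls cluster.awaitReplication() before this point.
var primaryHash = primary.getDB(dbName).runCommand({dbHash: 1});
var secondaryHash = secondary.getDB(dbName).runCommand({dbHash: 1});

assert.eq(primaryHash.md5, secondaryHash.md5,
          "dbHash mismatch between replica set members");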



 Comments   
Comment by ITWEBTF SAXOBANK [ 15/Dec/15 ]

Hi Dan,

The tests we run will:

  • Ensure (Create) indexes on all collections once.
  • Run back-end code from the TradingFloor.com web site + two back-end consoles that do long-running tasks and process queue items.
  • Each test will delete all documents in all collections, then run code that writes to and reads from the database, and then read and assert on the outcome.
  • Drop empty collections after running all tests.

There are around 500 tests that run on local developer machines (Intel i7 CPU, 16 GB RAM, and at least a 250 GB SSD) against a non-clustered installation.

The particular test I saw failing:

  • Inserted a document (around 800 lines of JSON) with one sub-document in an array field.
  • Added another sub-document to the array field.

The failure I saw was that the added sub-document was not there.
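
For context, a minimal mongo shell reduction of that pattern (collection and field names are invented; the real documents are far larger):

// Hypothetical reduction of the failing test: insert a document with one
// sub-document in an array field, push a second sub-document, then read
// the document back. The observed failure was that the pushed
// sub-document was missing from the subsequent read.
var coll = db.getSiblingDB("test").orders;

assert.writeOK(coll.insert({_id: 1, items: [{sku: "A", qty: 1}]}));
assert.writeOK(coll.update({_id: 1}, {$push: {items: {sku: "B", qty: 2}}}));

var doc = coll.findOne({_id: 1});
assert.eq(2, doc.items.length, "expected both sub-documents to be visible");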

I hope you can use this info. Feel free to copy it to another issue if that provides better tracking. If you do so, I can add the actual documents.

Best regards,
Brian

Comment by Daniel Pasette (Inactive) [ 14/Dec/15 ]

Hi itwebtf@saxobank.com, we do run the same tests on MMAPv1 that we do on WT. This particular issue is limited to the WiredTiger storage engine. I would be very interested in understanding your test case, but it would be a separate issue. If you can describe your cluster details and workload setup, please do so. It would be best to get the details in a new SERVER issue though.

Comment by ITWEBTF SAXOBANK [ 14/Dec/15 ]

Today I saw a very similar issue on mmapv1 on MongoDB 3.2, a local non-clustered installation. I have tests that all write to the database, then assert on the result. I have only once seen that the value read was the value before the last write operation.

I cannot repro, and I cannot prove that this is not a problem in our code, but the code and the tests have run unchanged for more than a year on a 2.4.9 database.

I suppose that whatever tests you run on WiredTiger will be run on mmapv1 as well, and then you will eventually see this issue.

Comment by Daniel Pasette (Inactive) [ 24/Nov/15 ]

Resolved as duplicate of WT-2237

Comment by Benety Goh [ 18/Nov/15 ]

logs associated with data_db_dbhash_0212513.tar.gz

Comment by Benety Goh [ 18/Nov/15 ]

Attached /data/db contents from a failed run_dbhash.sh run under Linux (git hash 0212513).

Comment by Jonathan Abrahams [ 05/Nov/15 ]

Attached is a script which can reproduce the issue. Caution: it happens infrequently, so the script has to be run multiple times. Also attached is the log from the failed run.

It was run in this manner:

#!/bin/bash
# Run the reproduction script repeatedly from the root of the mongo repo,
# keeping the log only for a failing iteration.
pushd ~/mongo
f=/tmp/dbhash-remove_and_bulk_insert
for i in $(seq 1 100)
do
   mongo --nodb ~/dbhash-remove_and_bulk_insert.js | tee "$f-$i.log"
   status=${PIPESTATUS[0]}
   if [ "$status" -ne 0 ]; then
      echo "Failure detected!"
      break
   else
      # Iteration passed; discard its log.
      rm "$f-$i.log"
   fi
done
popd
