Type: Bug
Resolution: Done
Priority: Critical - P2
Fix Version/s: WT2.9.0, 3.2.8, 3.3.9
Affects Version/s: WT2.8.0
Component/s: None
Labels: None

Total Hours with Assigned Team: 88,817.035
Sprint: None
Story Points: None

Issue Status as of Jul 06, 2016

ISSUE SUMMARY
Under extremely rare circumstances, a race condition in the code that updates large records may cause some of those updates to be lost during an unclean shutdown.

On a production system, the code path containing the race condition is taken only when a WiredTiger (WT) log record is 128k or larger. From MongoDB's perspective the triggering record size is smaller, perhaps around 40k, because a single WT log record contains the inserts into the collection, its indexes, the oplog, etc.

Attempts to trigger this race condition with MongoDB using a synthetic workload with compression disabled have produced mixed results. However, attempts to reproduce this issue in MongoDB with default compression (snappy) have been unsuccessful.

This issue only affects users running with journaling enabled. Users who run with journaling disabled cannot be affected by this bug.

USER IMPACT
If the race condition is triggered and the node suffers an unclean shutdown, some updates to large records made since the last checkpoint may be lost. Unfortunately, it is not possible to detect whether the race condition has been triggered.

AFFECTED VERSIONS
MongoDB 3.2 versions up to and including MongoDB 3.2.7.

REMEDIATION
A fix for this issue is included in the MongoDB 3.2.8 production release. Users with workloads that include updates to large records whose nodes may be subject to unclean shutdowns should upgrade to MongoDB 3.2.8 to avoid exposure to this issue.

WORKAROUNDS
Unfortunately there are no known workarounds for this issue.

Original description

Hi!

After rebuilding WiredTiger with diagnostic enabled, one of our tests started to fail.
The test checks the ability of the DB to recover after an application crash.
Please see the attached minimized test:

$ ./recovery-test-mp
5 writer threads spawned
killing child
checking DB...
no record with key 28363
no record with key 3689348814741930043
no record with key 3689348814741983775
no record with key 7378697629483839817
no record with key 7378697629483894622
no record with key 11068046444225735421
no record with key 14757395258967669044
no record with key 14757395258967726182
8 record(s) absent from total of 544769

I was unable to reproduce the problem without diagnostic enabled.

Thanks!
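
The checking phase of a crash-recovery test like the one above can be sketched as follows. This is a minimal illustration, not the attached recovery-test-mp.c: the function name `report_missing` and the idea that the test tracks a set of acknowledged keys are assumptions, and the reported total is assumed to be the number of acknowledged keys. Only the output format is taken from the report.

```python
from typing import Iterable, List


def report_missing(acknowledged: Iterable[int], recovered: Iterable[int]) -> List[str]:
    """Return report lines for keys acknowledged before the crash but absent after recovery.

    Hypothetical helper: the real test's bookkeeping is not shown in the report.
    """
    acknowledged_keys = set(acknowledged)
    recovered_keys = set(recovered)
    # A durable store must still hold every key it acknowledged before the crash.
    missing = sorted(acknowledged_keys - recovered_keys)
    lines = [f"no record with key {key}" for key in missing]
    if missing:
        lines.append(
            f"{len(missing)} record(s) absent from total of {len(acknowledged_keys)}"
        )
    return lines


if __name__ == "__main__":
    # Made-up data for illustration: ten acknowledged keys, two lost in the "crash".
    acked = range(1, 11)
    found = [key for key in acked if key not in (3, 7)]
    print("\n".join(report_missing(acked, found)))
```

An empty result means every acknowledged key survived recovery; any non-empty result is a durability failure of the kind reported here.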


Attachments:
check.2696.js        0.6 kB   Jun 14 2016 07:58:51 PM UTC
insert.2696.js       0.3 kB   Jun 14 2016 07:58:51 PM UTC
recovery-test-mp.c   5 kB     Jun 08 2016 06:16:39 PM UTC
run2696.sh           3 kB     Jun 14 2016 07:58:51 PM UTC
runloop.sh           0.3 kB   Jun 14 2016 07:58:51 PM UTC

Related to: WT-2184 lost records after crash (Closed)

Assignee: Susan LoVerso (Inactive)
Reporter: Dmitri Shubin
Votes: 0
Watchers: 6

Created: Jun 08 2016 06:16:39 PM UTC
Updated: Jul 28 2016 04:11:43 PM UTC
Resolved: Jun 14 2016 04:39:44 AM UTC
