[SERVER-20322] Wiredtiger develop can lose records following stop even with log enabled Created: 04/Sep/15 Updated: 28/Feb/16 Resolved: 10/Sep/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.7, 3.1.8 |
| Type: | Bug | Priority: | Major - P3 | |
| Reporter: | David Hows | Assignee: | Alexander Gorrod | |
| Resolution: | Done | Votes: | 0 | |
| Labels: | None | |||
| Remaining Estimate: | Not Specified | |||
| Time Spent: | Not Specified | |||
| Original Estimate: | Not Specified | |||
| Environment: |
Command to Reproduce
|
|||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Participants: |
| Description |
|
I was testing a merge of WT develop into MongoDB master. This caused the jstest auth/auth_helpers.js to fail. The failure here was due to a user being inserted into admin.system.users not being found following a restart of MongoDB. The user could be found initially after insertion but cannot be found following the restart. The underlying issue presented on with LSM in this test case, since creating an LSM table involves creating the LSM tree and a separate LSM chunk (the first chunk). The table create needs to come before the chunk create in the log. When creating an LSM table as part of a larger transaction, the chunk can be created before the table in the log. Which leads to recovery not seeing the first chunk in the LSM tree. This issue could also affect non-LSM WiredTiger tables, but is more difficult to trigger. It would require a system crash during an index create call. Could be the cause of |
| Comments |
| Comment by Alexander Gorrod [ 10/Sep/15 ] | |
|
Diagnostic information from david.hows when originally diagnosing the issue: I've dived into this and found that adding a one second sleep before the restart of MongoDB on line 13 of the test (auth/auth_helpers.js) will stop the test from failing. Diving into extra diagnostics with with MongoDB found that the following line is seen before the stop of MongoDB in cases where the issue does not occur.
The stop in question is from a SIGTERM, which is a graceful stop of MongoDB | |
| Comment by Githook User [ 10/Sep/15 ] | |
|
Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}Message: | |
| Comment by Githook User [ 10/Sep/15 ] | |
|
Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}Message: | |
| Comment by Githook User [ 10/Sep/15 ] | |
|
Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}Message: | |
| Comment by Alexander Gorrod [ 10/Sep/15 ] | |
|
Code review: https://mongodbcr.appspot.com/15990001/ | |
| Comment by Githook User [ 09/Sep/15 ] | |
|
Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}Message: | |
| Comment by Alexander Gorrod [ 09/Sep/15 ] | |
|
The root cause is that MongoDB is creating indexes using transactions that are also being used for other operations. In WiredTiger table create calls should be auto-commit. There is a semantic internally where we checkpoint the metadata when committing a transaction used for a create - and that checkpoint makes the create durable. What was happening in this case was that the metadata from the create ends up being written after the first LSM file has been created, which meant that the LSM metadata gets written as having 0 chunks, when there is actually 1 chunk. The solution is to use a separate WiredTiger session when creating indexes in MongoDB. I'll create a code review. | |
| Comment by Alexander Gorrod [ 09/Sep/15 ] | |
|
I've moved this to a server ticket, since the behavior is due to how MongoDB is creating indexes in WiredTiger. | |
| Comment by Alexander Gorrod [ 08/Sep/15 ] | |
|
I can reproduce this. It doesn't appear to be directly related to the commit mentioned above. I'll dig deeper tomorrow. | |
| Comment by David Hows [ 07/Sep/15 ] | |
|
Still broken with the inclusions in pull #2172. Git bisect says https://github.com/wiredtiger/wiredtiger/commit/05015655982f6a5312559420e44452032186ba09 is the cause. | |
| Comment by Alexander Gorrod [ 06/Sep/15 ] | |
|
david.hows Could you retest this with the latest develop please? This change may fix the issue: |