[SERVER-20322] Wiredtiger develop can lose records following stop even with log enabled Created: 04/Sep/15  Updated: 28/Feb/16  Resolved: 10/Sep/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 3.0.7, 3.1.8

Type: Bug Priority: Major - P3
Reporter: David Hows Assignee: Alexander Gorrod
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Command to Reproduce

./buildscripts/resmoke.py jstests/auth/auth_helpers.js --wiredTigerCollectionConfigString="type=lsm" --wiredTigerIndexConfigString="type=lsm" --wiredTigerEngineConfigString="verbose=[log]


Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   

I was testing a merge of WT develop into MongoDB master. This caused the jstest auth/auth_helpers.js to fail.

The failure here was due to a user being inserted into admin.system.users not being found following a restart of MongoDB. The user could be found initially after insertion but cannot be found following the restart.

The underlying issue presented on with LSM in this test case, since creating an LSM table involves creating the LSM tree and a separate LSM chunk (the first chunk). The table create needs to come before the chunk create in the log. When creating an LSM table as part of a larger transaction, the chunk can be created before the table in the log. Which leads to recovery not seeing the first chunk in the LSM tree.

This issue could also affect non-LSM WiredTiger tables, but is more difficult to trigger. It would require a system crash during an index create call. Could be the cause of SERVER-19274.



 Comments   
Comment by Alexander Gorrod [ 10/Sep/15 ]

Diagnostic information from david.hows when originally diagnosing the issue:

I've dived into this and found that adding a one second sleep before the restart of MongoDB on line 13 of the test (auth/auth_helpers.js) will stop the test from failing.

Diving into extra diagnostics with with MongoDB found that the following line is seen before the stop of MongoDB in cases where the issue does not occur.

[js_test:auth_helpers] 2015-09-04T12:59:36.258+1000  m27000| 2015-09-04T12:59:36.258+1000 I STORAGE  [thread1] WiredTiger [1441335576:258691][19039:0x7f632e5b8700], log-server: log_archive: archive to log number 1

The stop in question is from a SIGTERM, which is a graceful stop of MongoDB

Comment by Githook User [ 10/Sep/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

Message: SERVER-20322 Fixup lint from last push.
Branch: v3.0
https://github.com/mongodb/mongo/commit/292aafc2c05b7075353f8c49f205361958188ab0

Comment by Githook User [ 10/Sep/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

Message: SERVER-20322 Use a separate session when creating new indexes in WiredTiger.
Branch: v3.0
https://github.com/mongodb/mongo/commit/300c3dbc97781c452dfb1bf403c888d6def03359

Comment by Githook User [ 10/Sep/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

Message: SERVER-20322 Fixup lint from last push.
Branch: master
https://github.com/mongodb/mongo/commit/9b4250dc957f6cf75435606dccfd16f617b9cd33

Comment by Alexander Gorrod [ 10/Sep/15 ]

Code review: https://mongodbcr.appspot.com/15990001/

Comment by Githook User [ 09/Sep/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

Message: SERVER-20322 Use a separate session when creating new indexes in WiredTiger.
Branch: master
https://github.com/mongodb/mongo/commit/278338486d13816a0e2d5dbe3b9f2d8ab2df80cb

Comment by Alexander Gorrod [ 09/Sep/15 ]

The root cause is that MongoDB is creating indexes using transactions that are also being used for other operations.

In WiredTiger table create calls should be auto-commit. There is a semantic internally where we checkpoint the metadata when committing a transaction used for a create - and that checkpoint makes the create durable.

What was happening in this case was that the metadata from the create ends up being written after the first LSM file has been created, which meant that the LSM metadata gets written as having 0 chunks, when there is actually 1 chunk.

The solution is to use a separate WiredTiger session when creating indexes in MongoDB. I'll create a code review.

Comment by Alexander Gorrod [ 09/Sep/15 ]

I've moved this to a server ticket, since the behavior is due to how MongoDB is creating indexes in WiredTiger.

Comment by Alexander Gorrod [ 08/Sep/15 ]

I can reproduce this. It doesn't appear to be directly related to the commit mentioned above. I'll dig deeper tomorrow.

Comment by David Hows [ 07/Sep/15 ]

Still broken with the inclusions in pull #2172.

Git bisect says https://github.com/wiredtiger/wiredtiger/commit/05015655982f6a5312559420e44452032186ba09 is the cause.

Comment by Alexander Gorrod [ 06/Sep/15 ]

david.hows Could you retest this with the latest develop please? This change may fix the issue:
https://github.com/wiredtiger/wiredtiger/pull/2172

Generated at Thu Feb 08 03:53:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.