[SERVER-21619] sys-perf: WT crash during core_workloads_WT execution Created: 22/Nov/15 Updated: 03/Mar/16 Resolved: 30/Nov/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.0-rc5 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Rui Zhang (Inactive) | Assignee: | Alexander Gorrod |
| Resolution: | Done | Votes: | 0 |
| Labels: | sys-perf-reg | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Description |
|
mongod crashed intermittently during the core_workloads_WT test in system-perf. Observations:
Stack trace from the mongod log file:
decode
More details here, along with a few other crashes:
|
| Comments |
| Comment by Githook User [ 27/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||
|
Author: {u'name': u'Ramon Fernandez', u'email': u'ramon@mongodb.com'}Message: Import wiredtiger-wiredtiger-2.7.0-505-g7fea169.tar.gz from wiredtiger branch mongodb-3.4 ref: 44463c5..7fea169
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 11/Jan/16 ] | ||||||||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith@wiredtiger.com'}Message: Coverity analysis defect 77699: Unchecked return value, problem Instead of calling __wt_evict_page_clean_update() when discarding pages, This allows __wt_evict_page_clean_update() to be static in evict_page.c, | ||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 30/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}Message: Import wiredtiger-wiredtiger-mongodb-3.2-rc4-41-g8326df6.tar.gz from wiredtiger branch mongodb-3.2 ref: b65381f..8326df6 4c49948 | ||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 26/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'}Message: Merge pull request #2336 from wiredtiger/server-21619-dont-split-dead-tree
| ||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 26/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}Message: | ||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 26/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}Message: Switch a few boolean values from 0/1 to false/true. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 26/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}Message: It leads to problems where eviction attempts to write back to a | ||||||||||||||||||||||||||||||||||||||||
| Comment by David Hows [ 24/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
The following steering variables for the insert_ttl.js test seem to let me reproduce this fairly reliably. In short: more runs at 16 threads, with the runtime reduced from 3 minutes to 2.
| ||||||||||||||||||||||||||||||||||||||||
| Comment by David Hows [ 24/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Background on what we have found so far. The segfault occurs because we are trying to evict a page from a tree that has already been closed. We see the error following the drop at the start of the next test. It seems the test needs to run inserts and TTL deletes for long enough to make the table big enough, and then drop it, to cause the fault. I will shortly attach a modified version of the test that should reproduce it more reliably. | ||||||||||||||||||||||||||||||||||||||||
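To make the failure mode described above concrete, here is a toy model in Python. It is not WiredTiger code: `ToyTree`, `BlockManager`, `evict_unchecked` and `evict_checked` are hypothetical names invented for illustration, and the guard shown is only one way to avoid the dereference, not a claim about the fix that was merged. It assumes only what the comments in this ticket state: closing the tree leaves it without a block manager, and a later eviction of a dirty page still tries to write through it.

```python
class BlockManager:
    """Hypothetical stand-in for the structure that writes blocks to a data file."""

    def __init__(self):
        self.blocks = []

    def write(self, data):
        # Append the block and return its position, standing in for a file address.
        self.blocks.append(data)
        return len(self.blocks) - 1


class ToyTree:
    """Hypothetical stand-in for a btree handle that owns a block manager."""

    def __init__(self, name):
        self.name = name
        self.bm = BlockManager()

    def close(self):
        # Assumption: closing (for example after a drop) releases the block
        # manager, leaving the handle with no way to write blocks.
        self.bm = None


def evict_unchecked(tree, page):
    # Writes the dirty page without checking whether the tree is still open.
    # On a closed tree this raises AttributeError, the Python analogue of
    # dereferencing a NULL block manager.
    return tree.bm.write(page)


def evict_checked(tree, page):
    # Illustrative guard: if the tree is already closed, discard the page
    # instead of trying to write it back.
    if tree.bm is None:
        return None
    return tree.bm.write(page)
```

With an open tree both functions write the page; after `close()`, `evict_unchecked` fails while `evict_checked` returns `None` and drops the page.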
| Comment by David Hows [ 24/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Okay, just brute-forced it myself on the tip of master, inside GDB.
First quick-and-dirty finding: the btree's block manager is currently 0, so that's the cause of the segfault. The question now is, why is there no block manager? More to follow. | ||||||||||||||||||||||||||||||||||||||||
| Comment by David Hows [ 24/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Was there any particular trick to reproducing or were you just running the perf test over and over? Can you share your method that finally got a repro going as well as the githash of the binary in question? I'd like to do some diving this end and don't want to spend my time re-inventing the wheel. Thanks! | ||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 24/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
daveh86, can you please take a look at this one: let me know if you can reproduce it and if so, what information you can gather? | ||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Cahill (Inactive) [ 23/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
rui.zhang, what we're seeing here is WiredTiger attempting to evict a dirty page and write a block into a data file, but WT's "block manager" structure is NULL. I haven't seen this elsewhere. The code at the crash site hasn't changed recently, but two things occur to me as possibilities:
Can you please post details about how to run the test? Even if it doesn't repro every time, we can leave it running in a loop and see what happens when it fails. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Rui Zhang (Inactive) [ 22/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
Feel free to take this if the details are enough. Will work on a simple/reliable repro and try to get the core file. | ||||||||||||||||||||||||||||||||||||||||
| Comment by Rui Zhang (Inactive) [ 22/Nov/15 ] | ||||||||||||||||||||||||||||||||||||||||
|
The range of SHAs in which this may have been introduced; 3f598f1edc is the first to show this issue here
|