[SERVER-16997] wiredtiger's "bytes currently in the cache" exceeds "maximum bytes configured" and eviction threads are deadlocked Created: 22/Jan/15 Updated: 18/Sep/15 Resolved: 29/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.0-rc8 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Michael O'Brien | Assignee: | Alexander Gorrod |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
Not sure how to reproduce; it happened on Dan's test server after running a heavy workload.
mongostat shows that the cache is totally full of dirty bytes:
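For context, a minimal mongo shell sketch (not from the original report) for checking the condition described here, comparing the cache's current size against its configured maximum; the stat names are those exposed under serverStatus().wiredTiger.cache in 3.0-era builds, so treat them as assumptions for other versions:

```
// Minimal sketch: compare WiredTiger cache usage against its configured maximum
// via serverStatus(). Stat names follow the "wiredTiger.cache" section of
// serverStatus in MongoDB 3.0-era builds.
var cache = db.serverStatus().wiredTiger.cache;
var used  = cache["bytes currently in the cache"];
var max   = cache["maximum bytes configured"];
var dirty = cache["tracked dirty bytes in the cache"];
print("cache utilization: " + (100 * used / max).toFixed(1) + "%");
print("dirty bytes:       " + dirty + " of " + used + " in cache");
// In this ticket the utilization climbs above 100% and stays there.
```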
|
| Comments |
| Comment by Daniel Pasette (Inactive) [ 29/Jan/15 ] |
|
Resolved with https://github.com/wiredtiger/wiredtiger/pull/1616
| Comment by Githook User [ 27/Jan/15 ] |
|
Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>
Message: (cherry picked from commit bd4956d96e195e7a0072fd87e3793e4f442f92af)
| Comment by Githook User [ 26/Jan/15 ] |
|
Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>
Message:
| Comment by Daniel Pasette (Inactive) [ 23/Jan/15 ] |
|
Added a test which reproduces the issue quickly (a sketch of these steps follows below):
1. Run mongod with a very small configured wiredTiger cache.
2. Run capped_test.js.
3. Watch insert throughput drop to zero using mongostat.
This does not reproduce without the change to capped collections' use of truncate.
|
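A hedged sketch of the steps above, reconstructed from the comments in this ticket rather than copied from it; the dbpath, the collection name, and the document size are illustrative assumptions:

```
// Reproduction sketch -- reconstructed, not verbatim from the report.
// 1. Start mongod with a deliberately tiny WiredTiger cache, e.g.:
//      mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 1 --dbpath /data/db
// 2. In a second terminal, watch insert throughput with mongostat.
// 3. capped_test.js exercises capped-collection truncation; the loop below is
//    only a stand-in that hammers the same code path from the mongo shell.
db.createCollection("repro_capped", {capped: true, size: 16 * 1024 * 1024});
var doc = {payload: new Array(4096).join("x")};   // roughly 4 KB per document
for (var i = 0; i < 1000000; i++) {
    db.repro_capped.insert(doc);   // inserts stall once the cache overfills
}
```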
| Comment by Alexander Gorrod [ 23/Jan/15 ] |
|
> I wonder whether this hang is due to the new code to handle WT_ROLLBACK rather than the memory accounting issue.

This has been reproduced with code that doesn't contain those changes, so they aren't the cause.
| Comment by Daniel Pasette (Inactive) [ 23/Jan/15 ] |
|
Attaching another view of stats gathered earlier this afternoon. I'm able to trigger this with just 101.5% cache utilization, so the cache doesn't need to be very far over the limit.
| Comment by Alexander Gorrod [ 23/Jan/15 ] |
|
Stats graph
| Comment by Daniel Pasette (Inactive) [ 22/Jan/15 ] |
|
To set up the workload I'm running that triggers this:
|
| Comment by Daniel Pasette (Inactive) [ 22/Jan/15 ] |
|
Able to make this happen much faster by dropping wiredTigerCacheSizeGB to 1GB.
| Comment by Keith Bostic (Inactive) [ 22/Jan/15 ] |
|
My guess is this isn't a deadlock. The bytes-in-the-cache accounting has gone wrong:
there aren't any pages to evict because they've all been evicted, and the system is just thrashing.
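If it helps to tell the thrashing described here apart from a true stall, a hedged mongo shell sketch that samples the cache stats twice; the eviction stat names are taken from WiredTiger's cache statistics and should be treated as assumptions for a given build:

```
// Sketch: sample the cache section of serverStatus() twice and compare.
// Rising eviction counters while bytes stay pinned above the maximum suggests
// thrashing; flat counters across the window point at a genuine stall.
function cacheSnapshot() {
    var c = db.serverStatus().wiredTiger.cache;
    return {
        bytes: c["bytes currently in the cache"],
        evicted: c["unmodified pages evicted"] + c["modified pages evicted"]
    };
}
var before = cacheSnapshot();
sleep(5000);                     // five-second sampling window
var after = cacheSnapshot();
print("pages evicted in window:  " + (after.evicted - before.evicted));
print("bytes currently in cache: " + after.bytes);
```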
| Comment by Daniel Pasette (Inactive) [ 22/Jan/15 ] |
|
Second occurrence attached as gdb2.txt
| Comment by Geert Bosch [ 22/Jan/15 ] |
|
We have hundreds of threads waiting on the evict_waiter_cond while an evict_server is
| Comment by Daniel Pasette (Inactive) [ 22/Jan/15 ] |
|
Attaching backtrace as gdb.txt
| Comment by Daniel Pasette (Inactive) [ 22/Jan/15 ] |
|