[SERVER-26001] Insert workload stalled at 96% cache utilization Created: 07/Sep/16 Updated: 26/Oct/16 Resolved: 17/Oct/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Bruce Lucas (Inactive) | Assignee: | David Hows |
| Resolution: | Duplicate | Votes: | 2 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Sprint: | Storage 2016-10-31 | ||||||||
| Participants: | |||||||||
| Description |
|
Note: this may be the same underlying issue as The insert repro workload from
I've also attached stack traces captured during the stuck period, although I don't think they give much insight. |
| Comments |
| Comment by David Hows [ 09/Sep/16 ] |
Correct
Correct
Correct |
| Comment by Alexander Gorrod [ 08/Sep/16 ] |
|
Thanks david.hows. My interpretation of the graph you uploaded is that the behavior is better with the more recent WiredTiger code. It appears to me as though loading the first table finishes at around 3:35, where there is a small blip in insert operations. Loading the second table finishes at around 3:53. There is no evidence of dropouts during the run. Wheras when run without the most recent version of WiredTiger the test was reliably failing to finish the load phase. Does that match your reading of the timeseries? |
| Comment by David Hows [ 08/Sep/16 ] |
|
I've attached a screenshot of a timeseries of a run combining MongoDB 3.2.9 and WT Develop.
|
| Comment by Michael Cahill (Inactive) [ 08/Sep/16 ] |
|
A little more detail here, I think what we're seeing here with 3.2.9 is a special case of what alexander.gorrod saw in In particular, this is caused by a bug in the selection of eviction candidates where the eviction server will normally not retry eviction of a page if the transaction state hasn't changed since last time it tried. This led to a busy wait where all active transactions were blocked waiting for eviction, and eviction was blocked waiting for something to change in the transaction state. This issue is resolved in WiredTiger's develop branch, and we plan to include those changes into 3.2.10. |
| Comment by David Hows [ 08/Sep/16 ] |
|
I've run this and it reproduced readily for me on 3.2.9 pretty readily; during the first load stage (ping) I saw bulk inserts taking over 60 seconds to run rather than the normal few hundred milliseconds. This occurred at the 9.1GB inserted mark, which is also pretty indicative. I've also made 3 full runs of this on v3.2 with the tip of WT develop, these did not reproduce the delays at all so I believe the issue is resolved within WT already. |
| Comment by Bruce Lucas (Inactive) [ 07/Sep/16 ] |
|
That repro doesn't seem to be completely consistent. I've run it a few times with a few variations on size and number of collections, and it only got completely stuck like that twice. The other times there would be significant stalls of maybe a minute or so, but it recovered and finished the run. |