[SERVER-22964] IX GlobalLock being held while waiting for wt cache eviction Created: 04/Mar/16 Updated: 19/Nov/16 Resolved: 30/Mar/16
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.3 |
| Fix Version/s: | 3.0.12, 3.2.5, 3.3.4 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | CenZheng | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | WTplaybook, code-and-test |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | Linux |
| Backport Completed: | |
| Steps To Reproduce: | 1. Set up a two-node replSet using the attached mongod.conf (only storage-related options are specified). A hedged sketch of the create/drop workload described in the comments appears after this table. |
| Participants: | |
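The original repro script is not attached to this export. As a minimal sketch of the create/drop-heavy load discussed in the comments, something along these lines can be run from the mongo shell; the `repro` database name, collection names, document padding, and loop counts are all illustrative assumptions, not values from the ticket.

```javascript
// Hypothetical create/drop-heavy workload (mongo shell, MongoDB 3.2 era).
// Names, sizes, and counts are illustrative assumptions; the original
// repro script from this ticket is not preserved here.
var load = db.getSiblingDB("repro");
for (var round = 0; round < 10000; round++) {
    for (var i = 0; i < 100; i++) {
        var coll = load.getCollection("churn_" + i);
        // Each create/drop cycle opens and closes WiredTiger data handles,
        // the kind of churn this ticket's comments describe.
        coll.insert({round: round, pad: new Array(1024).join("x")});
        coll.drop();
    }
    print("completed round " + round);
}
```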
| Description |
hi,
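Where this issue manifests, operations queue behind the global lock while the WiredTiger cache is full of data awaiting eviction. One way to check for both symptoms from the mongo shell is sketched below; the stat names are as reported by `serverStatus` on MongoDB 3.2 with WiredTiger.

```javascript
// Snapshot WiredTiger cache pressure and lock-blocked operations.
var cache = db.serverStatus().wiredTiger.cache;
print("bytes in cache     : " + cache["bytes currently in the cache"]);
print("configured maximum : " + cache["maximum bytes configured"]);
print("tracked dirty bytes: " + cache["tracked dirty bytes in the cache"]);

// Operations stuck behind a lock report waitingForLock in currentOp().
db.currentOp().inprog.forEach(function (op) {
    if (op.waitingForLock) {
        printjson({opid: op.opid, op: op.op, ns: op.ns,
                   secs_running: op.secs_running});
    }
});
```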
| Comments |
| Comment by CenZheng [ 11/Mar/16 ] |
Got it, thank you Ramon~
| Comment by Ramon Fernandez Marina [ 10/Mar/16 ] |
Thanks for the additional information zhcn381, we believe we've identified the source of the problem and have produced a fix in
Such a large volume of create and drop operations is not very common in production cases, but not unheard of either. If this will be your production use case, note that until this fix makes it into a stable release, using a larger WiredTiger cache will easily help you work around the issue. Regards,
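For reference, the cache size can be set at startup via `storage.wiredTiger.engineConfig.cacheSizeGB` in mongod.conf, and it can also be raised on a running node without a restart, as sketched below; the 4G value is an arbitrary example, not a recommendation from this ticket.

```javascript
// Raise the WiredTiger cache on a live mongod (mongo shell).
// "4G" is an arbitrary example size, not a tuned recommendation.
db.adminCommand({
    setParameter: 1,
    wiredTigerEngineRuntimeConfig: "cache_size=4G"
});
```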
| Comment by CenZheng [ 10/Mar/16 ] |
Hi Ramon, the situation occurred again. I have uploaded the latest diagnostic data (metrics.2016-03-09T13-01-56Z-00000.tar.gz); you can have a look. Thanks~
| Comment by Ramon Fernandez Marina [ 08/Mar/16 ] |
Thanks for uploading, zhcn381. I can see the diagnostic data and I'm taking a look.
| Comment by CenZheng [ 08/Mar/16 ] |
Hi Ramon, I have uploaded the diagnostic.data file from the last occurrence, please check! Thanks
| Comment by Ramon Fernandez Marina [ 07/Mar/16 ] |
zhcn381, I have been running your repro script for almost 48h but my setup hasn't hung. We do have some ideas about what the issue could be, but if it is possible for you to upload the diagnostic.data folder, this information would still be useful in investigating this ticket. Thanks,
| Comment by CenZheng [ 05/Mar/16 ] |
hi Ramon,
| Comment by Ramon Fernandez Marina [ 04/Mar/16 ] |
zhcn381, can you please upload here the contents of the diagnostic.data directory from your dbpath and the full log file of the affected node? I'll try to reproduce this behavior on my end in the meantime, but since you say it takes two days, it will be faster to look at the diagnostic data already captured by your system. Thanks,