[SERVER-17362] ClientCursor::staticYield can't unlock b/c of recursive lock ns Created: 24/Feb/15 Updated: 18/Jun/15 Resolved: 18/Jun/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 2.6.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ian Rice | Assignee: | Ramon Fernandez Marina |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
SSD drives |
||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | Linux | ||||||||
| Steps To Reproduce: | it's intermittent and I can't replicate it. It's not even specific to times of heavy load. |
||||||||
| Participants: | |||||||||
| Description |
|
Intermittently my logs fill with "ClientCursor::staticYield can't unlock b/c of recursive lock ns ..."[millions of entries] . It happens on a lot of different collections but always on a findAndModify. When it's at it's worse - the lock lasts 6-7 seconds but the entire database is unresponsive during this time. |
| Comments |
| Comment by Ramon Fernandez Marina [ 29/May/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
ricei, I believe you may be running into the same issue as described in To be sure that your issue has the same root cause as that of
You'll note that these commands increase the log level, so I'd also like to see the logs generated while this commands are run to better diagnose this issue. Thanks, | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 21/May/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the additional information ricei. The query plan cache is our main suspect indeed, the harder part is to trigger the exact behavior with a suitable dataset. Will try with unindexed queries as you suggest and report back with my findings. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Rice [ 21/May/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
As an update to this - I discovered that the issue was triggered by a previous unindexed query. That is - a query is run on the collection that does not use an index but rather a scan and subsequently the findAndModify which should use an index instead uses a scan. It seems the findAndModify has a race to select which method to use and the results of the previous query influence the outcome of the race resulting in the incorrect method being selected | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 21/May/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
ricei, apologies for the long delay in getting back to you. While I have not been able to reproduce this behavior locally, I wanted to let you know that this behavior has been removed in the v3.0 version and now findAndModify() queries can yield. We'll continue to investigate why findAndModify() seems to ignore available indexes to satisfy the query, but if upgrading to 3.0.3 was an option for you it would be helpful for us if you could let us know how does 3.0.3 behave compared to 2.6.9. Thanks, | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Gui Forget [ 29/Mar/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We are on 2.6.9 and seeing a very similar issue. It's causing write lock contention and affecting the performance of our app.
I have attached the outputs of the commands requested earlier | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Rice [ 25/Feb/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
some more logs
It seems to be an issue with the findandmodify choosing to ignore the available index. After each of the long runs of "recursive lock" there is a statement that the planSummary is IXSCAN and nscanned:2167279 nscannedObjects:2167279
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Rice [ 24/Feb/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
As some additional background. Some of the collections have exists for ~ 4+ years. Mongodb has been upgraded through each of the major releases. Additionally in January the storage was swapped over to SSD. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 24/Feb/15 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
ricei, we're going to need more information to understand what's going on. My hunch is that, even if your findAndModify() queries are using an index, they may be using a sub-optimal index in some situations. In addition from server logs around the time when this happens (spanning at least a few seconds before and after), can you please send us the output of the following commands?
|