[SERVER-23283] RangeDeleter does not log cursor ids correctly in deleteNow() Created: 22/Mar/16 Updated: 25/Jan/17 Resolved: 28/Mar/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics, Sharding |
| Affects Version/s: | 3.0.9 |
| Fix Version/s: | 3.0.12, 3.2.5, 3.3.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Steffen | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Completed: | |||||
| Sprint: | Sharding 12 (04/01/16) | ||||
| Participants: | |||||
| Description |
|
RangeDeleter currently uses the logCursorsWaiting helper to log the list of cursors it is waiting for. The helper method extracts this information from the cursorsToWait member of the RangeDeleteEntry. The problem is that the blocking method (deleteNow) keeps track of the cursors independently and does not update the cursors in RangeDeleteEntry, so they are not logged properly: https://github.com/mongodb/mongo/blob/r3.3.3/src/mongo/db/range_deleter.cpp#L320 Original title: Chunk range deleter gets stuck with empty id list. Original description:
|
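The mismatch described above can be illustrated with a small, self-contained sketch. It is not the actual MongoDB code: `CursorId`, the stripped-down `RangeDeleteEntry`, `describeCursorsWaiting`, `deleteNowBuggy`, and `deleteNowFixed` are hypothetical stand-ins, and only the member name `cursorsToWait` is taken from the description. The sketch assumes the logging helper reads nothing but the entry, so a blocking path that keeps its cursor ids in a local variable produces an empty list in the log.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical, simplified stand-ins; not the real MongoDB types.
using CursorId = long long;

struct RangeDeleteEntry {
    // The only place the logging helper looks for cursor ids.
    std::set<CursorId> cursorsToWait;
};

// Formats the cursor ids recorded on the entry, the way a logging
// helper such as logCursorsWaiting is described as doing.
std::string describeCursorsWaiting(const RangeDeleteEntry& entry) {
    std::ostringstream out;
    out << "waiting for open cursors: [";
    bool first = true;
    for (CursorId id : entry.cursorsToWait) {
        if (!first) {
            out << ", ";
        }
        out << id;
        first = false;
    }
    out << "]";
    return out.str();
}

// Buggy variant: the blocking path tracks the open cursors in a local
// set and never copies them into the entry, so the log shows no ids.
std::string deleteNowBuggy(const std::set<CursorId>& openCursors) {
    RangeDeleteEntry entry;
    std::set<CursorId> localCursorsToWait = openCursors;  // never reaches entry
    (void)localCursorsToWait;
    return describeCursorsWaiting(entry);
}

// Fixed variant: the ids are stored on the entry before logging.
std::string deleteNowFixed(const std::set<CursorId>& openCursors) {
    RangeDeleteEntry entry;
    entry.cursorsToWait = openCursors;
    return describeCursorsWaiting(entry);
}
```

With two open cursors, the buggy variant reports an empty id list while the fixed variant lists both ids, which matches the "empty id list" symptom in the original title.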
| Comments |
| Comment by Ramon Fernandez Marina [ 28/Mar/16 ] | |
|
steffen, the bug in the logging of cursor ids has been fixed and backported to 3.0.11 and 3.2.5. While there's no tentative release date for 3.0.11 at the time of this writing, a 3.2.5-rc0 release candidate is scheduled for early April. If this is a critical issue for you, you may want to consider upgrading to 3.2.5. Regards, EDIT: this fix was moved from 3.0.11 to 3.0.12. | |
| Comment by Githook User [ 28/Mar/16 ] | |
|
Author: Randolph Tan (renctan) <randolph@10gen.com> Message: (cherry picked from commit 26603490725d969247044de4f36f487972264023) | |
| Comment by Githook User [ 28/Mar/16 ] | |
|
Author: Randolph Tan (renctan) <randolph@10gen.com> Message: (cherry picked from commit 26603490725d969247044de4f36f487972264023) | |
| Comment by Githook User [ 28/Mar/16 ] | |
|
Author: Randolph Tan (renctan) <randolph@10gen.com> Message: | |
| Comment by Randolph Tan [ 22/Mar/16 ] | |
|
Hi, I have found the problem. The chunk cleanup is waiting for some open cursors to go away, but is not logging them correctly. I have updated the description to reflect the issue. There is currently no good way to get out of this situation other than restarting the server, unless you have an idea which cursor is still open (most likely one with no timeout, since mongod cleans up cursors that have been idle for 10 minutes). There are also plans to include a new command that will list all cursors in the server (https://jira.mongodb.org/browse/SERVER-3090), which can help in situations like this. Thanks! | |
| Comment by Steffen [ 22/Mar/16 ] | |
|
Result of this command:
| |
| Comment by Randolph Tan [ 22/Mar/16 ] | |
|
Hi, Alternatively, you can use this gdb command without the need to explicitly attach to the process:
Note that the command might need higher privileges, so you might need to run it with sudo. | |
| Comment by Randolph Tan [ 22/Mar/16 ] | |
|
Hi, Is it possible to attach gdb to the process and capture the backtrace for all threads? It would also be helpful if you captured it a few times, around 3 to 5, at intervals of about a second. This will give us more clues about where the chunk deleter is stuck. Thanks! | |
| Comment by Steffen [ 22/Mar/16 ] | |
|
Forgot to say that we sent a db.killOp("repset5:-1973023014") after 5 days, which didn't kill it. |