[SERVER-26159] mongos crashes: Invariant failure !op->timedOut() src/mongo/executor/network_interface_asio.cpp Created: 19/Sep/16 Updated: 08/Jan/24 Resolved: 21/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.10, 3.4.0-rc0 |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Xiaoguang Wang | Assignee: | Samantha Ritter (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-only |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Sprint: | Platforms 2016-09-19, Platforms 2016-10-10 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
We encountered this bug.
|
| Comments |
| Comment by Xiaoguang Wang [ 21/Sep/16 ] |
|
Thanks for your quick response and fix. We decided to disable "move chunk" and "auto split" in our production environment because the mongo cluster has stopped responding at least 3 times recently, and we are sure these problems are caused by "move chunk" and "auto split" under high workload. And, this crash( It is suspected that the design of "chunk" has defects. Can you look into the "move chunk" and "auto split" procedures to prevent the mongo cluster from becoming unresponsive under high workload? |
| Comment by Samantha Ritter (Inactive) [ 21/Sep/16 ] |
|
Thank you very much for the information and log files. I have found a race condition in our connection code. The race is triggered when communication between cluster members takes about as long as our timeout value, so it makes sense that you encountered it when mongos was stuck trying to refresh. I'm very sorry that this bug has been interfering with your service, especially after
I've merged in the fix for this race condition to both our v3.2 and master branches. This fix will be available in 3.2.10-rc1, which we will release shortly. If you are willing to upgrade to rc1 when it comes out, we are hopeful that it will fix this issue for you. If rc1 does not fix the issue this will continue to be a top priority for us.
I'm going to resolve this ticket. If you continue to see this issue in 3.2.10-rc1, please don't hesitate to open a new ticket so we can address the problem right away.
Thanks, |
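For readers unfamiliar with this class of bug, the sketch below illustrates the general shape of a race between a timeout handler and a response handler, and one common way to make it safe. It is not the MongoDB code or the actual fix: the `Operation` class, its state names, and `race_once` are hypothetical names invented for this illustration, and Python threads stand in for the asynchronous callbacks. The idea is that when a response arrives at about the same moment the timer fires, exactly one handler should be allowed to finish the operation, so a completed operation is never also treated as timed out.

```python
import threading


class Operation:
    """Hypothetical stand-in for an in-flight network operation."""

    PENDING = "pending"
    COMPLETED = "completed"
    TIMED_OUT = "timed_out"

    def __init__(self):
        self._lock = threading.Lock()
        self.state = self.PENDING

    def _claim(self, new_state):
        # Only the first caller may move the operation out of PENDING.
        # A later caller sees a non-PENDING state and does nothing.
        with self._lock:
            if self.state != self.PENDING:
                return False
            self.state = new_state
            return True

    def on_response(self):
        """Called when the remote side answers."""
        return self._claim(self.COMPLETED)

    def on_timeout(self):
        """Called when the timer for this operation fires."""
        return self._claim(self.TIMED_OUT)


def race_once():
    """Fire both handlers at (nearly) the same time and report who won."""
    op = Operation()
    barrier = threading.Barrier(2)
    results = {}

    def run(name, handler):
        barrier.wait()  # release both threads together to provoke the race
        results[name] = handler()

    threads = [
        threading.Thread(target=run, args=("response", op.on_response)),
        threading.Thread(target=run, args=("timeout", op.on_timeout)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return op.state, results


if __name__ == "__main__":
    state, results = race_once()
    print(state, results)
```

Which handler wins varies from run to run, but in this sketch exactly one of them succeeds and the final state always matches the winner. Without the check inside `_claim`, both handlers could act on the same operation, which is the kind of inconsistency an invariant such as `!op->timedOut()` is meant to catch.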
| Comment by Githook User [ 21/Sep/16 ] |
|
Author: samantharitter (samantha.ritter@10gen.com) Message: |
| Comment by Kelsey Schubert [ 21/Sep/16 ] |
|
Thank you for answering Sam's questions. Please be aware that I've moved the uploaded files to the secure upload portal that she provided. Kind regards, |
| Comment by Xiaoguang Wang [ 21/Sep/16 ] |
|
> Are you running a replica set or a sharded cluster? How many nodes? |
| Comment by Xiaoguang Wang [ 21/Sep/16 ] |
|
> How frequently have you seen this crash?
> Is it something you are able to reproduce consistently?
> Would you please upload an archive of the diagnostic.data directory from the dbpath of a primary node? |
| Comment by Githook User [ 20/Sep/16 ] |
|
Author: samantharitter (samantha.ritter@10gen.com) Message: |
| Comment by Samantha Ritter (Inactive) [ 20/Sep/16 ] |
|
wxiaoguang@gmail.com, as we investigate this, it would be very helpful if you could give me some information about your environment and workload.
I've created a secure upload portal for you to use. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Thank you, |
| Comment by Samantha Ritter (Inactive) [ 20/Sep/16 ] |
|
Hi there, just an update on this ticket. I believe I've identified the root cause of the issue and we're currently reviewing and testing a fix for it. |