[SERVER-24283] Invariant failure grantedCounts[mode] >= 1 Created: 25/May/16 Updated: 10/May/23 Resolved: 15/Jun/16

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Concurrency |
| Affects Version/s: | 3.2.6 |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Major - P3 |
| Reporter: | Eric Le Lay |
| Assignee: | Kaloian Manassiev |
| Resolution: | Cannot Reproduce |
| Votes: | 0 |
| Labels: | mmapv1 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Sprint: | Sharding 15 (06/03/16), Sharding 16 (06/24/16) |
| Linked BF Score: | 5 |
Description

MongoDB running in a three-machine cluster crashed with an invariant failure on the primary.

Log excerpt on the primary:

Comments

Comment by Kaloian Manassiev [ 11/Oct/16 ]

Hi ericeliga,

Another customer recently discovered a similar lock manager problem, which turned out to be due to bugs in Intel's TSX microcode (…). Are you still experiencing this invariant failure, and if so, could you report the chipset the machine is running? Thank you in advance.

Best regards,

Comment by Ramon Fernandez Marina [ 15/Jun/16 ]

ericeliga, the additional checks mentioned above are described in …

Comment by Kaloian Manassiev [ 15/Jun/16 ]

Hi,

Apologies for the late reply. We have been unable to reproduce this issue on our side, and the information in the call stacks is insufficient to determine the root cause. As part of the investigation we inspected the code and found no obvious bugs. We have also checked in some extra assertions and tests to help us catch this at an earlier stage should it happen again, and we will back-port them to 3.2. For now, though, there is nothing more we can do, so I am going to close this ticket as 'Cannot Reproduce' until we get more information, should it happen again with the additional checks in place.

Apologies again for the inconvenience, and thank you very much for providing us with debugging information.

Best regards,

Comment by Eric Le Lay [ 15/Jun/16 ]

Hi, are you waiting for more info from my side?

Comment by Kaloian Manassiev [ 31/May/16 ]

This node is running MMAPv1, and the getmores that happened before the invariant was hit show that the oplog collection lock is being acquired in MODE_S (the last entry in the log line below):
Also, judging from the format of the log line, the caller appears to be using the legacy wire protocol opcodes rather than the new getMore command. This is an example of how to exercise this code path:
The oplog collection lock is acquired in MODE_S because this is an MMAPv1 node.

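To illustrate why the collection lock shows up as MODE_S here, the following is a minimal Python sketch of collection-level mode selection. It assumes that a storage engine without document-level locking (MMAPv1 in this ticket) upgrades an intent request to a full shared or exclusive collection lock, while an engine with document-level locking keeps the intent mode. `LockMode`, `is_shared_mode`, and `collection_lock_mode` are hypothetical names for illustration, not MongoDB's actual lock manager code.

```python
from enum import Enum


class LockMode(Enum):
    """Hypothetical enumeration of the four lock modes named in the ticket."""
    IS = "MODE_IS"
    IX = "MODE_IX"
    S = "MODE_S"
    X = "MODE_X"


def is_shared_mode(mode):
    # Intent-shared and shared are both read-style modes.
    return mode in (LockMode.IS, LockMode.S)


def collection_lock_mode(requested, supports_doc_locking):
    """Hypothetical helper: return the mode actually taken on the collection.

    Assumption: engines with document-level locking keep the requested
    (intent) mode; engines without it fall back to a full collection-level
    S or X lock, which is why an MMAPv1 oplog read ends up in MODE_S.
    """
    if supports_doc_locking:
        return requested
    return LockMode.S if is_shared_mode(requested) else LockMode.X
```

Under these assumptions, a read that requests MODE_IS on the oplog collection becomes MODE_S on an engine without document-level locking, matching what the log line shows for this MMAPv1 node.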
Comment by Eric Le Lay [ 26/May/16 ]

I uploaded mongod_clean_1.log.bz2.

Comment by Ramon Fernandez Marina [ 26/May/16 ]

Hi ericeliga, you can use this upload portal to send us files for this ticket securely and privately. Thanks,

Comment by Ramon Fernandez Marina [ 25/May/16 ]

ericeliga, can you please upload the full logs from the affected node, from the last restart until the time of the invariant failure? Thanks,

Comment by Andy Schwerin [ 25/May/16 ]

My read of this is that we're either unlocking a different mode than the one we locked, or that somebody else double-unlocked.

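A toy model of the failing check can make both hypotheses concrete. The Python sketch below assumes that a lock head keeps a per-mode count of granted requests and checks that the count for a mode is at least 1 before releasing it, in the spirit of `grantedCounts[mode] >= 1`. `ToyLockHead`, `LockMode`, and `InvariantFailure` are hypothetical names; this is not MongoDB's actual `LockHead` implementation.

```python
from collections import Counter
from enum import Enum


class LockMode(Enum):
    """Hypothetical enumeration of the lock modes."""
    IS = "MODE_IS"
    IX = "MODE_IX"
    S = "MODE_S"
    X = "MODE_X"


class InvariantFailure(AssertionError):
    """Raised when the toy granted-count invariant is violated."""


class ToyLockHead:
    """Hypothetical, simplified lock head tracking grants per mode."""

    def __init__(self):
        # Missing keys read as 0, so every mode starts with no grants.
        self.granted_counts = Counter()

    def grant(self, mode):
        self.granted_counts[mode] += 1

    def unlock(self, mode):
        # Releasing a mode requires at least one outstanding grant in that
        # same mode; otherwise the bookkeeping would go negative.
        if self.granted_counts[mode] < 1:
            raise InvariantFailure(
                f"Invariant failure grantedCounts[{mode.value}] >= 1")
        self.granted_counts[mode] -= 1
```

For example, granting MODE_S once and then unlocking MODE_S twice fails on the second call (a double unlock), and unlocking MODE_X when only MODE_S was granted fails immediately (a mode mismatch). Either pattern trips the same check, which is why the stack trace alone cannot distinguish between them.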
Comment by Ramon Fernandez Marina [ 25/May/16 ]

Thanks for your report ericeliga, we're taking a look at the stack trace and we'll post updates in this ticket when we have them. Regards,

Comment by Ramon Fernandez Marina [ 25/May/16 ]