[SERVER-60371] Fatal assertion - msgid 34437 - DuplicateKey Created: 30/Sep/21 Updated: 18/Oct/21 Resolved: 18/Oct/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Rob Colella | Assignee: | Eric Sedor |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | Bug |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | AWS EC2 Linux2 Graviton |
| Issue Links: | |
| Sprint: | Storage - Ra 2021-10-04 |
| Participants: | |
| Story Points: | 8 |
| Description |
|
We were previously running version 4.4.8 in a three-node replica set and hit a DuplicateKey bug that crashed MongoDB and required a resync of the secondary node. That issue is tracked here: https://jira.mongodb.org/browse/WT-7984
We upgraded all nodes to version 4.4.9, which was supposed to resolve this; however, we just hit the same bug again.
|
| Comments |
| Comment by Eric Sedor [ 04/Oct/21 ] |
|
rob.colella@kibocommerce.com, thanks for the update. A complete initial sync is the remediation we recommend. It does not technically have to be performed off of a primary node, but it does have to be performed off of an unimpacted node. You may be able to imagine other options that will work for you, but we consider this the most straightforward, complete, and least disruptive option. |
| Comment by Rob Colella [ 04/Oct/21 ] |
|
Hi Eric, just wanted to give you an update: we are currently working through running validate against all nodes in all clusters. |
| Comment by Eric Sedor [ 01/Oct/21 ] |
|
Hi rob.colella@kibocommerce.com,
There are two issues in 4.4.8 (and earlier) which could persist data inconsistencies in ways that could later cause duplicate key failures after upgrading to 4.4.9.
Upgrading to 4.4.9 will remove the risk of persisting further inconsistencies, but does not by itself correct the impact of these bugs. Some inconsistencies already created could be lying in wait to trigger duplicate key issues when circumstances allow.
What we'd like to confirm is that all nodes have been remediated of all possible impact of the bugs from earlier versions. That's why it's important to perform validate on all collections of all nodes after upgrading to 4.4.9, and to confirm that all collections on all nodes have passed validate on 4.4.9.
It sounds like you're starting that process, so I'll look forward to hearing back from you. Please let me know if I can clarify further.
Sincerely, |
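The "validate on all collections of all nodes" step described above could be scripted roughly as follows. This is a minimal sketch, not an official remediation script: the function name `validate_all_collections`, the `full` parameter default, and the shape of the returned records are assumptions made for illustration. It expects a PyMongo-style client and relies only on `list_database_names()`, `list_collection_names()`, and `db.command("validate", ...)`.

```python
from typing import Any, Dict, List


def validate_all_collections(client: Any, full: bool = False) -> List[Dict[str, Any]]:
    """Run the `validate` command on every collection visible to `client`.

    `client` is assumed to behave like a pymongo.MongoClient connected
    directly to a single node. Returns one record per collection whose
    validate result did not report `valid: true`.
    """
    failures: List[Dict[str, Any]] = []
    for db_name in client.list_database_names():
        db = client[db_name]
        # Only real collections are checked; views are filtered out.
        for coll_name in db.list_collection_names(filter={"type": "collection"}):
            result = db.command("validate", coll_name, full=full)
            if not result.get("valid", False):
                failures.append(
                    {
                        "namespace": f"{db_name}.{coll_name}",
                        "errors": result.get("errors", []),
                        "warnings": result.get("warnings", []),
                    }
                )
    return failures
```

Because validate checks only the node it runs on, the function would need to be called once per replica set member, each time with a client connected directly to that member (for example, a `MongoClient` created with the `directConnection=true` URI option and a hypothetical host such as `node1.example.internal`). An empty return value means every collection on that node reported `valid: true`. Validation can be resource-intensive, so scheduling it per node during a quiet period is a reasonable precaution.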
| Comment by Chenhao Qu [ 01/Oct/21 ] |
|
rob.colella@kibocommerce.com Thanks for reporting this issue and providing us with the details. We have converted this issue into a MongoDB Server ticket. Since it is the end of our Friday here in Sydney, the triage team will take it over and provide further assistance if needed. |
| Comment by Rob Colella [ 01/Oct/21 ] |
|
The nodes that crashed this morning were running version 4.4.9. |
| Comment by Chenhao Qu [ 01/Oct/21 ] |
|
rob.colella@kibocommerce.com Thanks for the response. Can you run a validate and let us know the result? What version were the nodes that crashed this morning running? Did they crash because of a duplicate key error? We believe a node will not crash until the affected data is accessed, so it is not surprising that it runs for several hours and then crashes. |
| Comment by Rob Colella [ 01/Oct/21 ] |
|
Also, what determines whether the data is affected? |
| Comment by Rob Colella [ 01/Oct/21 ] |
|
Will running a validate command show whether there is an issue? |
| Comment by Chenhao Qu [ 01/Oct/21 ] |
|
rob.colella@kibocommerce.com Have you encountered any more duplicate key issues after the first occurrence? Our theory is that after you upgraded to 4.4.8, your data was affected by the reported bug, so it would still hit the duplicate key error after you upgraded to 4.4.9. If that's the case, it should have been running fine on 4.4.9 since then. |
| Comment by Chenhao Qu [ 30/Sep/21 ] |
|
rob.colella@kibocommerce.com Thanks for the information. It is very useful for helping us narrow down the problem. |
| Comment by Rob Colella [ 30/Sep/21 ] |
|
This was during normal operation. All nodes had been running fine on v4.4.9 since last night with no issues.
Installed packages are as follows:
|
| Comment by Chenhao Qu [ 30/Sep/21 ] |
|
rob.colella@kibocommerce.com Thanks for your quick response. We are not aware of any other customers hitting the same issue on 4.4.9. Can you describe what you did and the state of the database before you hit this issue? I know you upgraded all the nodes to 4.4.9. Did this happen at the startup of one node during the upgrade process, or during normal operation? |
| Comment by Rob Colella [ 30/Sep/21 ] |
|
Unfortunately, I was only able to grab the crash snippet of the logs before the resync, as this is a critical system that needed to be back up ASAP.
|
| Comment by Chenhao Qu [ 30/Sep/21 ] |
|
rob.colella@kibocommerce.com Sorry to hear that you encountered this issue again. Can you share more information with us to help diagnose the problem, e.g. any core dumps, more logs, and, if possible, the data files? |