[SERVER-33971] Nodes in MongoDB sharded cluster crashes with Invariant failure oplogEntry.getWallClockTime() Created: 19/Mar/18 Updated: 29/Oct/23 Resolved: 22/Mar/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.3 |
| Fix Version/s: | 3.6.4, 3.7.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Yuriy [X] | Assignee: | Jack Mulrow |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | SWNA | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: | v3.6 | ||||
| Participants: | |||||
| Description |
|
We got a strange error on our MongoDB cluster. It consists of 4 replica sets with 3 nodes each. We use Ubuntu 16.04 on the servers and Docker containers for MongoDB. |
| Comments |
| Comment by Githook User [ 22/Mar/18 ] |
|
Author: {'email': 'jack.mulrow@mongodb.com', 'name': 'Jack Mulrow', 'username': 'jsmulrow'} Message: (cherry picked from commit daa7dbf7e4564fc38b946416e3240caeb3c59b3a) |
| Comment by Githook User [ 21/Mar/18 ] |
|
Author: {'email': 'jack.mulrow@mongodb.com', 'name': 'Jack Mulrow', 'username': 'jsmulrow'} Message: |
| Comment by Kaloian Manassiev [ 20/Mar/18 ] |
|
jack.mulrow, the only place where it is possible to generate oplog entries without wallclock time is when we write the sentinel entry when a session has dropped off the end of the log. I believe this is what is happening in this situation - we are transferring entries for such a session and we are invariant-ing too early. |
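To illustrate the theory above, here is a minimal sketch in Python (not the server's C++ code and not the actual fix). It assumes a simplified model where each oplog entry is a dict, the wallclock time lives under a `wall` key, and a hypothetical `sentinel` flag marks the entry written when a session has dropped off the end of the log. The helper names are invented for illustration. The point is only the ordering: check for the sentinel first, and assert on the wallclock time afterwards.

```python
from datetime import datetime, timezone
from typing import List, Optional


def wall_clock_time(entry: dict) -> Optional[datetime]:
    """Return the entry's wallclock time, or None if it has none (assumed 'wall' key)."""
    return entry.get("wall")


def is_sentinel(entry: dict) -> bool:
    """Hypothetical marker for the sentinel entry of a truncated session history."""
    return entry.get("sentinel", False)


def entries_to_migrate(entries: List[dict]) -> List[dict]:
    """Collect session oplog entries for transfer during a chunk migration.

    Sentinel entries are passed through before the wallclock check, because
    they are the one kind of entry that may legitimately lack a wallclock time.
    """
    result = []
    for entry in entries:
        if is_sentinel(entry):
            result.append(entry)
            continue
        # Only non-sentinel entries are required to carry a wallclock time.
        assert wall_clock_time(entry) is not None, "missing wallclock time"
        result.append(entry)
    return result


normal = {"op": "i", "wall": datetime(2018, 3, 19, tzinfo=timezone.utc)}
sentinel = {"op": "n", "sentinel": True}
migrated = entries_to_migrate([normal, sentinel])
```

If the assertion ran before the sentinel check, the second entry in this example would trip it, which mirrors the "invariant-ing too early" behaviour described in the comment.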
| Comment by Kaloian Manassiev [ 19/Mar/18 ] |
|
Thank you for confirming, Ubus. We have a theory about how this can happen and are working on validating it. We will update the ticket once we have something more specific. Thank you again for the report! |
| Comment by Yuriy [X] [ 19/Mar/18 ] |
|
Nope. We haven't used 3.6.0 rc4 or rc5 at all. |
| Comment by Kaloian Manassiev [ 19/Mar/18 ] |
|
Hi Ubus, Thank you very much for your report. The crash that you experienced indicates that chunk migration encountered an oplog entry that contains retryable writes information but no wallclock time component. Starting in version 3.6.0, we unconditionally write the wallclock time to all oplog entries, so this situation should theoretically not be possible. However, I noticed that between 3.6.0 RC4 and RC5 we fixed Best regards, |