[SERVER-33971] Nodes in MongoDB sharded cluster crash with Invariant failure oplogEntry.getWallClockTime() Created: 19/Mar/18  Updated: 29/Oct/23  Resolved: 22/Mar/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.3
Fix Version/s: 3.6.4, 3.7.4

Type: Bug Priority: Major - P3
Reporter: Yuriy [X] Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: SWNA
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: rs0-0.1.log, rs0-1.1.log
Issue Links: Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.6

Description

We hit a strange error on our MongoDB cluster. It consists of 4 replica sets with 3 nodes each.
Today, over about half an hour, nodes from 3 of the replica sets began crashing in a chain: the primary crashes, one of the secondaries becomes primary, within about 2 minutes it crashes too, a new node becomes primary, crashes again a few minutes later, and so on. The nodes auto-restart, and after 10-15 minutes of these chained crashes everything settled down and worked normally again. Note that the balancer was disabled at the time.
I've attached two log excerpts containing the assertion, from 2 nodes of one replica set. If you need more logs or info, just ask.

We use Ubuntu 16.04 on the servers and run MongoDB in Docker containers.
The MongoDB version is 3.6.3.



Comments
Comment by Githook User [ 22/Mar/18 ]

Author: Jack Mulrow <jack.mulrow@mongodb.com> (username: jsmulrow)

Message: SERVER-33971 Generate session sentinel oplog entry with wall clock time

(cherry picked from commit daa7dbf7e4564fc38b946416e3240caeb3c59b3a)
Branch: v3.6
https://github.com/mongodb/mongo/commit/f0575e255effe2c2e9d909187c8ee73ab2431b29
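
For illustration only, a minimal C++ sketch of what the fix described in this commit message amounts to; the struct and function names are hypothetical stand-ins, not the actual server source:

    #include <chrono>
    #include <optional>

    using Date_t = std::chrono::system_clock::time_point;

    // Hypothetical, stripped-down oplog entry carrying only the field at issue.
    struct OplogEntry {
        std::optional<Date_t> wallClockTime;
    };

    // Sketch of the fix: when the migration source fabricates a sentinel oplog
    // entry for a session whose history has fallen off the oplog, stamp it with
    // the current wall clock time so code that asserts on the field cannot fail.
    OplogEntry makeSentinelOplogEntry() {
        OplogEntry sentinel;
        sentinel.wallClockTime = std::chrono::system_clock::now();
        return sentinel;
    }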

Comment by Githook User [ 21/Mar/18 ]

Author: Jack Mulrow <jack.mulrow@mongodb.com> (username: jsmulrow)

Message: SERVER-33971 Generate session sentinel oplog entry with wall clock time
Branch: master
https://github.com/mongodb/mongo/commit/daa7dbf7e4564fc38b946416e3240caeb3c59b3a

Comment by Kaloian Manassiev [ 20/Mar/18 ]

jack.mulrow, the only place where it is possible to generate oplog entries without a wallclock time is when we write the sentinel entry for a session that has dropped off the end of the oplog. I believe that is what is happening here: we are transferring entries for such a session and hitting the invariant too early.
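
As a rough, self-contained C++ illustration of the failure mode described above (hypothetical names and simplified types; the real logic lives in the server's sharding and replication code), a sentinel entry whose wall clock time was never populated trips the assertion as soon as migration tries to read it:

    #include <cassert>
    #include <chrono>
    #include <optional>

    using Date_t = std::chrono::system_clock::time_point;
    #define invariant(expr) assert(static_cast<bool>(expr))

    // Hypothetical, stripped-down oplog entry: pre-fix sentinel entries were
    // written without a wall clock time, leaving this field unset.
    struct OplogEntry {
        std::optional<Date_t> wallClockTime;
        std::optional<Date_t> getWallClockTime() const { return wallClockTime; }
    };

    // Sketch of the migrating side's assumption: every retryable-writes oplog
    // entry it transfers carries a wall clock time.
    Date_t wallTimeForMigration(const OplogEntry& oplogEntry) {
        invariant(oplogEntry.getWallClockTime());   // fires for the sentinel entry
        return *oplogEntry.getWallClockTime();
    }

    int main() {
        OplogEntry sentinel;                 // wall clock time never populated
        wallTimeForMigration(sentinel);      // aborts, mirroring the crash reported here
    }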

Comment by Kaloian Manassiev [ 19/Mar/18 ]

Thank you for confirming, Ubus.

We have a theory about how this can happen and we are working on validating it. We will update the ticket once we have something more specific.

Thank you again for the report!

Comment by Yuriy [X] [ 19/Mar/18 ]

Nope. We haven't used 3.6.0 RC4 or RC5 at all.
The last upgrade was from 3.6.2 to 3.6.3 at the end of February, 3 weeks ago. Since then the oplog should have been rolled over many times; the longest oplog window is 40+ hours and the oplog size is 100 GB.
Sometimes the DB is under heavy load, mostly because of poor disk I/O.

Comment by Kaloian Manassiev [ 19/Mar/18 ]

Hi Ubus,

Thank you very much for your report.

The crash that you experienced indicates that chunk migration encountered an oplog entry that contains retryable-writes information but no wallclock time component. Starting in version 3.6.0 we unconditionally write the wallclock time to all oplog entries, so this situation should theoretically not be possible.

However, I noticed that between 3.6.0 RC4 and RC5 we fixed SERVER-32027, which was causing the wallclock time to not always be written. Is it possible that at some point you had RC4 on your system and then eventually upgraded it to 3.6.3? That would have left inconsistent oplog entries and could trigger the invariant in question.

Best regards,
-Kal.
