[SERVER-36569] Config Server unable to restart staging Opslog missing entries error. Created: 10/Aug/18  Updated: 04/Nov/18  Resolved: 01/Oct/18

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Prasad Surase Assignee: Nick Brewer
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 1.png     PNG File 2.png     PNG File 3.png     PNG File 4.png     PNG File 5.png     Text File config_primary.log     Text File config_secondary.log    
Operating System: ALL
Participants:

 Description   

I have a 13 server MongoDB(4.0.0) cluster consisting of 3 Config Servers(configured as a replication set). the Primary of the CS is crashes stating the error below. Please find the log attachments for more details.

Rollback common point is { ts: Timestamp(1533794677, 1), t: 5 }
2018-08-10T06:06:09.177+0000 I ROLLBACK [rsBackgroundSync] finding record store counts
2018-08-10T06:06:09.177+0000 I REPL [rsBackgroundSync] Incremented the rollback ID to 230
2018-08-10T06:06:09.177+0000 I ROLLBACK [rsBackgroundSync] Preparing to write deleted documents to a rollback file for collection config.lockpings with uuid ec8d2b3e-3d7e-43b2-acae-b5e68f407945 to /data_storages/data/rollback/config.lockpings/removed.2018-08-10T06-06-09.0.bson
2018-08-10T06:06:09.178+0000 I STORAGE [rsBackgroundSync] closeCatalog: closing all databases
2018-08-10T06:06:09.178+0000 I STORAGE [rsBackgroundSync] closeCatalog: closing storage engine catalog
2018-08-10T06:06:09.178+0000 I STORAGE [WTOplogJournalThread] oplog journal thread loop shutting down
2018-08-10T06:06:09.178+0000 F ROLLBACK [rsBackgroundSync] RecoverToStableTimestamp failed. :: caused by :: UnrecoverableRollbackError: No stable timestamp available to recover to. Initial data timestamp: Timestamp(1533794690, 1), Stable timestamp: Timestamp(0, 0)
2018-08-10T06:06:09.178+0000 I ROLLBACK [rsBackgroundSync] Rollback summary:
2018-08-10T06:06:09.178+0000 I ROLLBACK [rsBackgroundSync] start time: 2018-08-10T06:06:09.160+0000
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] end time: 2018-08-10T06:06:09.170+0000
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] sync source: ip-172-30-0-149:57042
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] rollback data file directory: /data_storages/data/rollback/config.lockpings
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] rollback id: 230
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] last optime on branch of history rolled back: { ts: Timestamp(1533794690, 1), t: 5 }
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] common point optime: { ts: Timestamp(1533794677, 1), t: 5 }
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] last wall clock time on the branch of history rolled back: 2018-08-09T06:04:50.652+0000
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] common point wall clock time: 2018-08-09T06:04:37.479+0000
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] difference in wall clock times: 13 second(s)
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] shard identity document rolled back: false
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] config server config version document rolled back: false
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] affected sessions: none
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] affected namespaces:
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] config.lockpings
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] counts of interesting commands rolled back:
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] update: 1
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] total number of entries rolled back (including no-ops): 2
2018-08-10T06:06:09.179+0000 I ROLLBACK [rsBackgroundSync] transition to SECONDARY
2018-08-10T06:06:09.179+0000 I REPL [rsBackgroundSync] transition to SECONDARY from ROLLBACK
2018-08-10T06:06:09.179+0000 F REPL [rsBackgroundSync] Rollback failed with unrecoverable error: UnrecoverableRollbackError: Recover to stable timestamp failed.
2018-08-10T06:06:09.179+0000 F - [rsBackgroundSync] Fatal assertion 50666 UnrecoverableRollbackError: Recover to stable timestamp failed. at src/mongo/db/repl/bgsync.cpp



 Comments   
Comment by Nick Brewer [ 01/Oct/18 ]

prasadsurase Since there's been no activity on this ticket in some time, I'm going to close it. Feel free to update this ticket with the requested information, and we'll reopen it for you.

-Nick

Comment by Nick Brewer [ 14/Sep/18 ]

prasadsurase Is this still an issue for you? Thanks in advance for any information you can provide.

-Nick

Comment by Nick Brewer [ 16/Aug/18 ]

prasadsurase Looking at the logs you've provided, it doesn't appear that these servers were ever able to connect to each other from the time mongod was started - have you tested connectivity between these machines, and have they been able to connect to each other in the past?

-Nick

Comment by Prasad Surase [ 13/Aug/18 ]

@Nick pls find the logs for config_primary and config_secondary servers.

Comment by Nick Brewer [ 10/Aug/18 ]

prasadsurase Thanks for your report - please upload plaintext log files instead of screenshots. 

Thanks,

-Nick

Generated at Thu Feb 08 04:43:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.