[SERVER-66432] unstable point TS from recoverToStableTimestamp Created: 12/May/22  Updated: 11/Jul/22  Resolved: 11/Jul/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nir Elkayam Assignee: Chris Kelly
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hi,

I'm running MongoDB 4.2.18 in a 3-node replica set.
After a system update, only 2 of the 3 replicas came back up.
The remaining replica cannot start; it fails with the error: "Common point must be at least stable timestamp".
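For reference, a minimal way to confirm how the surviving members see the set (a sketch only; the host and the admin credentials are assumptions taken from the log below):

# Sketch: list each member's name and replication state from a healthy node.
mongo --host 10.0.10.130:27017 -u admin -p --authenticationDatabase admin \
      --eval 'rs.status().members.forEach(function(m) { print(m.name, m.stateStr); })'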


Mongodb will be initialized as cluster replicaset
2022-05-12T08:43:58.818+0000 I  CONTROL  [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2022-05-12T08:43:58.819+0000 W  ASIO     [main] No TransportLayer configured during NetworkInterface startup
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=a18c0ccd6d96
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] db version v4.2.18
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] git version: f65ce5e25c0b26a00d091a4d24eec1a8b3a4c016
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.1.1  11 Sep 2018
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] allocator: tcmalloc
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] modules: none
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] build environment:
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten]     distmod: ubuntu1804
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten]     distarch: x86_64
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten]     target_arch: x86_64
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] 16384 MB of memory available to the process out of 253753 MB total system memory
2022-05-12T08:43:58.902+0000 I  CONTROL  [initandlisten] options: { net: { bindIp: "*" }, replication: { oplogSizeMB: 5120, replSet: "rs0" }, security: { authorization: "enabled", keyFile: "/run/secrets/replicakeyfile" }, setParameter: { diagnosticDataCollectionEnabled: "false", internalQueryExecMaxBlockingSortBytes: "104857600" }, storage: { journal: { enabled: true } } }
2022-05-12T08:43:58.903+0000 W  STORAGE  [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2022-05-12T08:43:58.903+0000 I  STORAGE  [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2022-05-12T08:43:58.903+0000 W  STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2022-05-12T08:43:58.903+0000 I  STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7680M,cache_overflow=(file_max=0M),session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress],
2022-05-12T08:43:59.509+0000 I  STORAGE  [initandlisten] WiredTiger message [1652345039:509891][1:0x7f0061e2bb00], txn-recover: Recovering log 2289 through 2290
2022-05-12T08:43:59.584+0000 I  STORAGE  [initandlisten] WiredTiger message [1652345039:584257][1:0x7f0061e2bb00], txn-recover: Recovering log 2290 through 2290
2022-05-12T08:43:59.709+0000 I  STORAGE  [initandlisten] WiredTiger message [1652345039:709408][1:0x7f0061e2bb00], txn-recover: Main recovery loop: starting at 2289/256 to 2290/256
2022-05-12T08:43:59.709+0000 I  STORAGE  [initandlisten] WiredTiger message [1652345039:709900][1:0x7f0061e2bb00], txn-recover: Recovering log 2289 through 2290
2022-05-12T08:43:59.805+0000 I  STORAGE  [initandlisten] WiredTiger message [1652345039:805211][1:0x7f0061e2bb00], file:collection-10-199254800462122439.wt, txn-recover: Recovering log 2290 through 2290
2022-05-12T08:43:59.879+0000 I  STORAGE  [initandlisten] WiredTiger message [1652345039:879915][1:0x7f0061e2bb00], file:collection-10-199254800462122439.wt, txn-recover: Set global recovery timestamp: (1652199868, 2)
2022-05-12T08:43:59.886+0000 I  RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1652199868, 2)
2022-05-12T08:43:59.888+0000 I  STORAGE  [initandlisten] No table logging settings modifications are required for existing WiredTiger tables. Logging enabled? 0
2022-05-12T08:43:59.890+0000 I  STORAGE  [initandlisten] Starting OplogTruncaterThread local.oplog.rs
2022-05-12T08:43:59.890+0000 I  STORAGE  [initandlisten] The size storer reports that the oplog contains 133091 records totaling to 17956205 bytes
2022-05-12T08:43:59.890+0000 I  STORAGE  [initandlisten] Sampling the oplog to determine where to place markers for truncation
2022-05-12T08:43:59.892+0000 I  STORAGE  [initandlisten] Sampling from the oplog between Apr 26 13:17:05:1 and May 10 16:24:28:2 to determine where to place markers for truncation
2022-05-12T08:43:59.892+0000 I  STORAGE  [initandlisten] Taking 3 samples and assuming that each section of oplog contains approximately 397928 records totaling to 53687151 bytes
2022-05-12T08:43:59.892+0000 I  STORAGE  [initandlisten] WiredTiger record store oplog processing took 2ms
2022-05-12T08:43:59.893+0000 I  STORAGE  [initandlisten] Timestamp monitor starting
2022-05-12T08:43:59.893+0000 I  CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2022-05-12T08:43:59.893+0000 I  CONTROL  [initandlisten]
2022-05-12T08:43:59.894+0000 I  CONTROL  [initandlisten]
2022-05-12T08:43:59.894+0000 I  CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2022-05-12T08:43:59.894+0000 I  CONTROL  [initandlisten] **        We suggest setting it to 'never'
2022-05-12T08:43:59.894+0000 I  CONTROL  [initandlisten]
2022-05-12T08:43:59.894+0000 I  CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2022-05-12T08:43:59.894+0000 I  CONTROL  [initandlisten] **        We suggest setting it to 'never'
2022-05-12T08:43:59.894+0000 I  CONTROL  [initandlisten]
2022-05-12T08:43:59.899+0000 I  STORAGE  [initandlisten] Flow Control is enabled on this deployment.
2022-05-12T08:43:59.901+0000 I  FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/db/diagnostic.data'
2022-05-12T08:43:59.905+0000 I  REPL     [initandlisten] Rollback ID is 1
2022-05-12T08:43:59.906+0000 I  REPL     [initandlisten] Recovering from stable timestamp: Timestamp(1652199868, 2) (top of oplog: { ts: Timestamp(1652199868, 2), t: 15 }, appliedThrough: { ts: Timestamp(1652199868, 2), t: 15 }, TruncateAfter: Timestamp(0, 0))
2022-05-12T08:43:59.906+0000 I  REPL     [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1652199868, 2)
2022-05-12T08:43:59.906+0000 I  REPL     [initandlisten] No oplog entries to apply for recovery. Start point is at the top of the oplog.
2022-05-12T08:43:59.908+0000 I  CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
2022-05-12T08:43:59.909+0000 I  CONTROL  [LogicalSessionCacheReap] Failed to reap transaction table: NotYetInitialized: Replication has not yet been configured
2022-05-12T08:43:59.909+0000 I  NETWORK  [listener] Listening on /tmp/mongodb-27017.sock
2022-05-12T08:43:59.909+0000 I  NETWORK  [listener] Listening on 0.0.0.0
2022-05-12T08:43:59.909+0000 I  NETWORK  [listener] waiting for connections on port 27017
2022-05-12T08:44:03.153+0000 W  REPL     [replexec-0] Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat; Got "NodeNotFound: No host described in new configuration 136369957 for replica set rs0 maps to this node" while validating { _id: "rs0", version: 136369957, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 1, host: "10.0.10.130:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 0, host: "10.0.11.89:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "10.0.11.156:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('62572b1492bafe31c889776d') } }
2022-05-12T08:44:03.153+0000 I  REPL     [replexec-0] New replica set config in use: { _id: "rs0", version: 136369957, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 1, host: "10.0.10.130:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 0, host: "10.0.11.89:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "10.0.11.156:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('62572b1492bafe31c889776d') } }
2022-05-12T08:44:03.154+0000 I  REPL     [replexec-0] This node is not a member of the config
2022-05-12T08:44:03.154+0000 I  REPL     [replexec-0] transition to REMOVED from STARTUP
2022-05-12T08:44:37.890+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:34882 #3 (1 connection now open)
2022-05-12T08:44:37.891+0000 I  NETWORK  [conn3] received client metadata from 127.0.0.1:34882 conn3: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.2.18" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-12T08:44:37.896+0000 I  NETWORK  [conn3] end connection 127.0.0.1:34882 (0 connections now open)
2022-05-12T08:44:42.257+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:35764 #4 (1 connection now open)
2022-05-12T08:44:42.258+0000 I  NETWORK  [conn4] received client metadata from 127.0.0.1:35764 conn4: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.2.18" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-12T08:44:42.283+0000 I  ACCESS   [conn4] Successfully authenticated as principal admin on admin from client 127.0.0.1:35764
2022-05-12T08:44:42.293+0000 I  NETWORK  [conn4] end connection 127.0.0.1:35764 (0 connections now open)
2022-05-12T08:44:43.242+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:36306 #5 (1 connection now open)
2022-05-12T08:44:43.243+0000 I  NETWORK  [conn5] received client metadata from 127.0.0.1:36306 conn5: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.2.18" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-12T08:44:43.267+0000 I  ACCESS   [conn5] Successfully authenticated as principal admin on admin from client 127.0.0.1:36306
2022-05-12T08:44:43.274+0000 I  NETWORK  [conn5] end connection 127.0.0.1:36306 (0 connections now open)
2022-05-12T08:44:43.517+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:36452 #6 (1 connection now open)
2022-05-12T08:44:43.518+0000 I  NETWORK  [conn6] received client metadata from 127.0.0.1:36452 conn6: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.2.18" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-12T08:44:43.541+0000 I  ACCESS   [conn6] Successfully authenticated as principal admin on admin from client 127.0.0.1:36452
2022-05-12T08:44:43.547+0000 I  NETWORK  [conn6] end connection 127.0.0.1:36452 (0 connections now open)
2022-05-12T08:44:44.375+0000 I  NETWORK  [listener] connection accepted from 10.0.10.130:58370 #7 (1 connection now open)
2022-05-12T08:44:44.442+0000 I  ACCESS   [conn7] Successfully authenticated as principal __system on local from client 10.0.10.130:58370
2022-05-12T08:44:44.442+0000 I  NETWORK  [conn7] end connection 10.0.10.130:58370 (0 connections now open)
2022-05-12T08:44:44.443+0000 I  NETWORK  [listener] connection accepted from 10.0.10.130:58376 #8 (1 connection now open)
2022-05-12T08:44:44.444+0000 I  NETWORK  [conn8] received client metadata from 10.0.10.130:58376 conn8: { driver: { name: "NetworkInterfaceTL", version: "4.2.18" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-12T08:44:44.445+0000 I  ACCESS   [conn8] Successfully authenticated as principal __system on local from client 10.0.10.130:58376
2022-05-12T08:44:44.446+0000 I  CONNPOOL [Replication] Connecting to 10.0.10.130:27017
2022-05-12T08:44:44.458+0000 I  NETWORK  [listener] connection accepted from 10.0.11.89:38938 #10 (2 connections now open)
2022-05-12T08:44:44.478+0000 I  REPL     [replexec-0] New replica set config in use: { _id: "rs0", version: 136466117, protocolVersion: 1, writeConcernMajorityJournalDefault: true, members: [ { _id: 1, host: "10.0.10.130:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 0, host: "10.0.11.89:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "10.0.11.157:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('62572b1492bafe31c889776d') } }
2022-05-12T08:44:44.478+0000 I  REPL     [replexec-0] This node is 10.0.11.157:27017 in the config
2022-05-12T08:44:44.478+0000 I  REPL     [replexec-0] Scheduling a task to begin or continue replication
2022-05-12T08:44:44.478+0000 I  REPL     [replexec-0] transition to STARTUP2 from REMOVED
2022-05-12T08:44:44.478+0000 I  REPL     [replexec-4] Starting replication storage threads
2022-05-12T08:44:44.479+0000 I  CONNPOOL [Replication] Connecting to 10.0.11.89:27017
2022-05-12T08:44:44.479+0000 I  REPL     [replexec-1] Member 10.0.10.130:27017 is now in state PRIMARY
2022-05-12T08:44:44.479+0000 I  REPL     [replexec-4] transition to RECOVERING from STARTUP2
2022-05-12T08:44:44.479+0000 I  REPL     [replexec-4] Starting replication fetcher thread
2022-05-12T08:44:44.479+0000 I  REPL     [replexec-4] Starting replication applier thread
2022-05-12T08:44:44.479+0000 I  REPL     [replexec-4] Starting replication reporter thread
2022-05-12T08:44:44.480+0000 I  REPL     [rsSync-0] Starting oplog application
2022-05-12T08:44:44.480+0000 I  REPL     [rsBackgroundSync] waiting for 3 pings from other members before syncing
2022-05-12T08:44:44.480+0000 I  REPL     [rsSync-0] transition to SECONDARY from RECOVERING
2022-05-12T08:44:44.480+0000 I  REPL     [rsSync-0] Resetting sync source to empty, which was :27017
2022-05-12T08:44:44.483+0000 I  REPL     [replexec-3] Member 10.0.11.89:27017 is now in state SECONDARY
2022-05-12T08:44:44.484+0000 I  NETWORK  [listener] connection accepted from 10.0.11.89:38940 #14 (3 connections now open)
2022-05-12T08:44:44.484+0000 I  NETWORK  [conn14] received client metadata from 10.0.11.89:38940 conn14: { driver: { name: "NetworkInterfaceTL", version: "4.2.18" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-12T08:44:44.536+0000 I  ACCESS   [conn10] Successfully authenticated as principal __system on local from client 10.0.11.89:38938
2022-05-12T08:44:44.536+0000 I  NETWORK  [conn10] end connection 10.0.11.89:38938 (2 connections now open)
2022-05-12T08:44:44.555+0000 I  NETWORK  [listener] connection accepted from 10.0.11.89:38944 #15 (3 connections now open)
2022-05-12T08:44:44.555+0000 I  ACCESS   [conn14] Successfully authenticated as principal __system on local from client 10.0.11.89:38940
2022-05-12T08:44:44.555+0000 I  NETWORK  [conn15] received client metadata from 10.0.11.89:38944 conn15: { driver: { name: "NetworkInterfaceTL", version: "4.2.18" }, os: { type: "Linux", name: "Ubuntu", architecture: "x86_64", version: "18.04" } }
2022-05-12T08:44:44.558+0000 I  ACCESS   [conn15] Successfully authenticated as principal __system on local from client 10.0.11.89:38944
2022-05-12T08:44:45.222+0000 I  NETWORK  [listener] connection accepted from 10.0.9.34:59800 #16 (4 connections now open)
2022-05-12T08:44:45.222+0000 I  NETWORK  [listener] connection accepted from 10.0.9.34:59801 #17 (5 connections now open)
2022-05-12T08:44:45.222+0000 I  NETWORK  [conn16] received client metadata from 10.0.9.34:59800 conn16: { driver: { name: "mongo-java-driver|sync|spring-data", version: "4.1.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "3.10.0-1160.49.1.el7.x86_64" }, platform: "Java/AdoptOpenJDK/15.0.2+7", application: { name: "analytics-core-ms" } }
2022-05-12T08:44:45.222+0000 I  NETWORK  [conn17] received client metadata from 10.0.9.34:59801 conn17: { driver: { name: "mongo-java-driver|sync|spring-data", version: "4.1.2" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "3.10.0-1160.49.1.el7.x86_64" }, platform: "Java/AdoptOpenJDK/15.0.2+7", application: { name: "analytics-core-ms" } }
2022-05-12T08:44:45.480+0000 I  REPL     [rsBackgroundSync] sync source candidate: 10.0.11.89:27017
2022-05-12T08:44:45.480+0000 I  CONNPOOL [RS] Connecting to 10.0.11.89:27017
2022-05-12T08:44:45.565+0000 I  REPL     [rsBackgroundSync] Changed sync source from empty to 10.0.11.89:27017
2022-05-12T08:44:45.612+0000 I  REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last optime fetched: { ts: Timestamp(1652199868, 2), t: 15 }. source's GTE: { ts: Timestamp(1652201127, 2), t: 15 }
2022-05-12T08:44:45.612+0000 I  REPL     [rsBackgroundSync] Replication commit point: { ts: Timestamp(0, 0), t: -1 }
2022-05-12T08:44:45.612+0000 I  REPL     [rsBackgroundSync] Rollback using 'recoverToStableTimestamp' method.
2022-05-12T08:44:45.612+0000 I  REPL     [rsBackgroundSync] Scheduling rollback (sync source: 10.0.11.89:27017)
2022-05-12T08:44:45.612+0000 I  ROLLBACK [rsBackgroundSync] transition to ROLLBACK
2022-05-12T08:44:45.612+0000 I  REPL     [rsBackgroundSync] State transition ops metrics: { lastStateTransition: "rollback", userOpsKilled: 0, userOpsRunning: 6 }
2022-05-12T08:44:45.612+0000 I  REPL     [rsBackgroundSync] transition to ROLLBACK from SECONDARY
2022-05-12T08:44:45.612+0000 I  NETWORK  [rsBackgroundSync] Skip closing connection for connection # 15
2022-05-12T08:44:45.612+0000 I  NETWORK  [rsBackgroundSync] Skip closing connection for connection # 14
2022-05-12T08:44:45.612+0000 I  NETWORK  [rsBackgroundSync] Skip closing connection for connection # 8
2022-05-12T08:44:45.612+0000 I  NETWORK  [conn16] end connection 10.0.9.34:59800 (4 connections now open)
2022-05-12T08:44:45.612+0000 I  NETWORK  [conn17] end connection 10.0.9.34:59801 (3 connections now open)
2022-05-12T08:44:45.612+0000 I  ROLLBACK [rsBackgroundSync] Waiting for all background operations to complete before starting rollback
2022-05-12T08:44:45.612+0000 I  ROLLBACK [rsBackgroundSync] Finished waiting for background operations to complete before rollback
2022-05-12T08:44:45.612+0000 I  ROLLBACK [rsBackgroundSync] finding common point
2022-05-12T08:44:45.677+0000 I  ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp(1652199863, 1), t: 14 }
2022-05-12T08:44:45.677+0000 F  ROLLBACK [rsBackgroundSync] Common point must be at least stable timestamp, common point: Timestamp(1652199863, 1), stable timestamp: Timestamp(1652199868, 2)
2022-05-12T08:44:45.677+0000 F  -        [rsBackgroundSync] Fatal Assertion 51121 at src/mongo/db/repl/rollback_impl.cpp 969
2022-05-12T08:44:45.677+0000 F  -        [rsBackgroundSync] \n\n***aborting after fassert() failure\n\n
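The fatal lines above show the rollback invariant that aborts startup: the common point found against the sync source, Timestamp(1652199863, 1), is earlier than this node's stable checkpoint timestamp, Timestamp(1652199868, 2), and the 'recoverToStableTimestamp' rollback method requires the common point to be at or after the stable timestamp (fassert 51121 in rollback_impl.cpp). A hedged way to inspect what the healthy sync source holds in its oplog between those two timestamps (host and credentials again assumed from the log):

# Sketch: dump the sync source's oplog entries between the common point and
# the failed node's stable timestamp, both copied from the fatal log line.
mongo --host 10.0.11.89:27017 -u admin -p --authenticationDatabase admin --eval '
  db.getSiblingDB("local").oplog.rs.find({
    ts: { $gte: Timestamp(1652199863, 1), $lte: Timestamp(1652199868, 2) }
  }).forEach(function(doc) { printjson(doc); })'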

 Comments   
Comment by Chris Kelly [ 11/Jul/22 ]

We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Chris Kelly [ 01/Jul/22 ]

We still need additional information to diagnose the problem. If this is still an issue for you, would you please submit the requested information? If you've fixed it, we can go ahead and close the ticket.

Comment by Chris Kelly [ 01/Jun/22 ]

Hi Nir,

In order to see what happened leading up to this event, please upload the following for each node in the replica set, covering a time period that includes the incident (a packaging sketch follows this list):

  • the mongod logs
  • the $dbpath/diagnostic.data directory (the contents are described here)
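One way to package both items for upload, as a sketch only (the log file path is an assumption; the dbpath of /data/db comes from the startup log above):

# Sketch: bundle the mongod log and FTDC diagnostic data from one node.
tar -czf mongod-diagnostics-$(hostname).tar.gz \
    /var/log/mongodb/mongod.log \
    /data/db/diagnostic.data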

You should be able to resolve this by performing a clean resync from an unaffected node.
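A minimal sketch of that resync, assuming the dbpath from the log above and that the two healthy members can sustain the set while this one re-syncs (service/container names are assumptions):

# Sketch of a clean resync: stop the member, set aside its data files, and
# restart; mongod then performs an initial sync from the healthy members.
systemctl stop mongod                  # or stop this member's container
mv /data/db /data/db.bak-$(date +%F)   # keep the old files until the sync succeeds
mkdir -p /data/db                      # recreate the dbpath with the same ownership
systemctl start mongod                 # initial sync starts automatically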

Regards,
Christopher
