Details
Bug
Status: Closed
Major - P3
Resolution: Done
ALL
Description
Greetings, we need help determining the cause of an issue with our MongoDB cluster.
Cluster settings:
- 3 nodes
- Runs on VMware vCenter VMs
- MongoDB shell version v4.2.2
Static hostname: m1-prod-vm-db-mongo02
Icon name: computer-vm
Chassis: vm
Machine ID: 8edfb1744b9947f4a12093a8259d93d0
Boot ID: aed3637a32ca449798827711859ac700
Virtualization: vmware
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server
Kernel: Linux 3.10.0-957.27.2.el7.x86_64
Architecture: x86-64
While running a script that updates data in the collections, one of the replica nodes failed. (update.txt)
To repair the cluster we performed the following:
- Changed /etc/mongod.conf on the failed node: switched the port to 27018 and commented out the replication settings so that it runs in standalone mode.
- Executed the same script on the failed node in standalone mode; it finished successfully.
- Synced the collections with data from the healthy replica node using the following script:
#!/bin/bash
BEGIN_DATE=$1
LAST_DATE=$2
LOCAL_DATABASE=audit_prod
REMOTE_DATABASE=audit_prod
REMOTE_HOST=m1-prod-vm-db-mongo02
CURRENT_DATE=$BEGIN_DATE
while [ "$CURRENT_DATE" != "$LAST_DATE" ];
do
    echo "Current date: $CURRENT_DATE"
    mongodump --db="${LOCAL_DATABASE}" --collection="${CURRENT_DATE}" --archive | ssh "${REMOTE_HOST}" -T "mongorestore --archive --port=27018"
    echo "Sync finished for date: $CURRENT_DATE"
    CURRENT_DATE=$(date -d "${CURRENT_DATE} +1 day" +%F)
done
- Synced the oplog collection from the healthy replica to the failed one:
mongodump -d local -c oplog.rs -o /data/dump/oplog
scp -rp /data/dump/ <username>@<hostname>:/data/dump/
mongorestore -vvvv -d local --port=27018 /data/dump/
- Restored the initial config on the failed node.
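The per-day loop in the sync script above can be checked without touching either node. The following is a minimal sketch, assuming GNU date (as shipped with RHEL 7) and collection names in YYYY-MM-DD form; the function name list_sync_dates is hypothetical and not part of the original script. It prints the collection names the loop would dump, and rejects inputs for which the `!=` test would never become false.

```shell
#!/bin/bash
# Hypothetical helper: print each date from BEGIN (inclusive) to LAST (exclusive),
# mirroring the loop bounds of the sync script.
list_sync_dates() {
    local begin last current
    # Both arguments are required; an empty string would be read as "today".
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: list_sync_dates BEGIN_DATE LAST_DATE" >&2
        return 1
    fi
    # Normalize both arguments to YYYY-MM-DD; fail on unparsable dates.
    begin=$(date -d "$1" +%F) || return 1
    last=$(date -d "$2" +%F) || return 1
    # ISO dates sort lexicographically, so a string comparison is enough.
    if [[ "$begin" > "$last" ]]; then
        echo "begin date $begin is after last date $last" >&2
        return 1
    fi
    current=$begin
    while [ "$current" != "$last" ]; do
        echo "$current"
        current=$(date -d "$current +1 day" +%F)
    done
}
```

For example, `list_sync_dates 2020-01-01 2020-01-04` prints 2020-01-01 through 2020-01-03, one per line. As in the sync script, the last date itself is not processed, so a collection named after LAST_DATE would not be dumped.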
The logs from the time of the failure are in logs.txt.