Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

Operating System: ALL
Confidence Status: None
Work Order: 3

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Greetings, we need assistance in determining the cause of an issue with our MongoDB cluster.

Cluster settings:

3 nodes

Run on VMware vCenter VMs

MongoDB shell version v4.2.2

Static hostname: m1-prod-vm-db-mongo02
Icon name: computer-vm
Chassis: vm
Machine ID: 8edfb1744b9947f4a12093a8259d93d0
Boot ID: aed3637a32ca449798827711859ac700
Virtualization: vmware
Operating System: Red Hat Enterprise Linux Server 7.6 (Maipo)
CPE OS Name: cpe:/o:redhat:enterprise_linux:7.6:GA:server
Kernel: Linux 3.10.0-957.27.2.el7.x86_64
Architecture: x86-64

While running a script that updates data in our collections (update.txt), one of the replica nodes failed.

To repair the cluster we performed the following:

Changed /etc/mongod.conf on the failed node: switched the port to 27018 and commented out the replication settings so that the node runs in standalone mode.

Executed the same update script on the failed node in standalone mode; it finished successfully.

Synced the collections with data from the healthy replica node using this script:

#!/bin/bash
# Copies one collection per day (collections are named by date, YYYY-MM-DD)
# to the remote host, starting at BEGIN_DATE and stopping before LAST_DATE.
BEGIN_DATE=$1
LAST_DATE=$2
LOCAL_DATABASE=audit_prod
REMOTE_DATABASE=audit_prod
REMOTE_HOST=m1-prod-vm-db-mongo02
CURRENT_DATE=$BEGIN_DATE
while [ "$CURRENT_DATE" != "$LAST_DATE" ]
do
  echo "Current date: $CURRENT_DATE"
  # Stream the dump as an archive over ssh and restore it on the remote node.
  mongodump --db="${LOCAL_DATABASE}" --collection="${CURRENT_DATE}" --archive | ssh "${REMOTE_HOST}" -T "mongorestore --archive --port=27018"
  echo "Sync finished for date: $CURRENT_DATE"
  CURRENT_DATE=$(date -d "${CURRENT_DATE} +1 day" +%F)
done
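
A loop that compares the current date to LAST_DATE with `!=` never processes the collection named by LAST_DATE, and it does not terminate if LAST_DATE is earlier than BEGIN_DATE or written in a different format. The following is only a sketch of an alternative date iteration, assuming GNU `date` (as on RHEL 7); the function name `date_range` is hypothetical and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print every date
# from $1 through $2 inclusive, one per line, as YYYY-MM-DD.
# Requires GNU date. The body runs in a subshell, so variables stay local.
date_range() (
  current=$(date -u -d "$1" +%F) || exit 1
  last=$(date -u -d "$2" +%s) || exit 1
  # Compare epoch seconds so the loop also ends when $2 is before $1.
  while [ "$(date -u -d "$current" +%s)" -le "$last" ]; do
    echo "$current"
    current=$(date -u -d "$current +1 day" +%F)
  done
)
```

The dump and restore pipeline could then be driven by `for d in $(date_range "$BEGIN_DATE" "$LAST_DATE")`, which normalizes both arguments before comparing them.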

Synced the oplog collection from the healthy replica to the failed one:

mongodump -d local -c oplog.rs -o /data/dump/oplog  

scp -rp /data/dump/ <username>@<hostname>:/data/dump/ 

mongorestore -vvvv -d local --port=27018 /data/dump/

Restored the initial config on the failed node.
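
Because this procedure edits /etc/mongod.conf twice (once to go standalone, once to go back), it may help to check what the file currently says before restarting mongod. The sketch below assumes the YAML configuration format with top-level `net:` and `replication:` keys; both function names are hypothetical, and the check does not see options passed on the command line such as `--replSet`.

```shell
#!/bin/bash
# Hypothetical check: succeeds (exit 0) when the given mongod.conf has no
# active (uncommented) top-level "replication:" key.
is_standalone_conf() {
  [ -r "$1" ] || return 2
  ! grep -Eq '^replication:' "$1"
}

# Hypothetical helper: print the port configured under the top-level
# "net:" section, if there is one.
conf_port() {
  awk '
    /^net:/                        { in_net = 1; next }
    /^[^ #]/                       { in_net = 0 }
    in_net && /^[[:space:]]+port:/ { print $2; exit }
  ' "$1"
}
```

For example, `is_standalone_conf /etc/mongod.conf && echo standalone` together with `conf_port /etc/mongod.conf` shows which mode and port the next restart would use.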

The logs from the time of the failure are attached (logs.txt).


Attachments:
logs.txt, Feb 28 2022 12:53:48 PM UTC, 31 kB, _anush.chinoian@gmail.com
mongod.conf, Feb 28 2022 12:54:06 PM UTC, 0.9 kB, _anush.chinoian@gmail.com
update.txt, Feb 28 2022 12:52:53 PM UTC, 0.6 kB, _anush.chinoian@gmail.com

Assignee: Edwin Zhou
Reporter: Anush Chinoian
Participants: Anush Chinoian, Dmitry Agranat, Edwin Zhou
Votes: 0
Watchers: 3

Created: Feb 28 2022 12:56:33 PM UTC
Updated: Jun 02 2022 03:53:59 PM UTC
Resolved: Mar 29 2022 05:08:03 PM UTC
