Type: Question
Resolution: Incomplete
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

ISSUE SUMMARY
The customer had a high load on the cluster and began experiencing replication lag.
Node 00-01 has been in a DOWN state for a day.

Initially, the node was stuck in an infinite restart loop. The snapshot process started on 2021-06-11 at 07:22 and ran until 23:13, and the node encountered errors when starting up. The CoE noticed a "duplicate key" issue prior to node replacement.

The CoE has since restarted Node 01. However, replication lag continues to increase on the secondary member 00-01. The customer has reduced the workload, but the secondary is not catching up. Although 15,360 IOPS are available and observed IOPS do not exceed 1,000, the disk shows high latency and queueing.
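
The IOPS figures above can be put in perspective with a small calculation. This is a minimal sketch, not part of the original investigation: the two constants come from this ticket, while the function names and the latency argument are hypothetical and used only for illustration. It shows that the disk is using only a small fraction of its provisioned IOPS, and how Little's law relates a measured latency to the average number of queued requests.

```python
# Figures taken from the ticket text.
PROVISIONED_IOPS = 15_360  # IOPS available on the volume
OBSERVED_IOPS = 1_000      # reported upper bound on actual IOPS


def iops_utilization(observed: float, provisioned: float) -> float:
    """Return the fraction of provisioned IOPS actually in use."""
    if provisioned <= 0:
        raise ValueError("provisioned must be positive")
    return observed / provisioned


def mean_queue_depth(iops: float, latency_seconds: float) -> float:
    """Little's law: mean outstanding requests = throughput * mean latency.

    latency_seconds is a hypothetical input; the ticket does not state
    a measured latency value.
    """
    return iops * latency_seconds


if __name__ == "__main__":
    utilization = iops_utilization(OBSERVED_IOPS, PROVISIONED_IOPS)
    print(f"IOPS utilization: {utilization:.1%}")
```

At roughly 6.5% utilization, the volume is well below its IOPS ceiling, so the latency and queueing are unlikely to be explained by IOPS exhaustion alone. Once a latency is measured, passing it to mean_queue_depth gives the implied average queue depth.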

USER IMPACT
The replication lag on the secondary is impacting production for the customer.

Assignee: Unassigned
Reporter: Subha Arunachalam
Participants: Prachi Shirodkar, Subha Arunachalam
Votes: 0
Watchers: 2

Created: Jun 12 2021 08:17:19 PM UTC
Updated: Jun 12 2021 09:25:55 PM UTC
Resolved: Jun 12 2021 09:24:22 PM UTC
