Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.2.12
Component/s: None
Labels:
- moved-from-last-mile-done

Assigned Teams:

DevProd Performance Infrastructure
Operating System:
ALL
Steps To Reproduce:
We couldn't find stable reproduction steps.
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

We have a PSA replica set in which each data-bearing node has 32 cores, 64 GB of memory and a 3 TB SSD. It has been running fine for over two years, but recently, as the data size keeps growing, we ran into a strange problem twice in one month:

When high traffic occurred, the primary's CPU usage (we use the primaryPreferred read preference) first rose to around 90%, then dropped below 50%, and all queries slowed down after the drop.

We have examined sysctl parameters, ulimit settings, filesystem configuration (XFS, no THP), WiredTiger cache usage (around 80%), disk limits (throughput and IOPS), the WiredTiger cache dirty percentage (around 5%), etc., but couldn't figure out the rationale behind the stall. Please help confirm whether this is a bug, or give us a clue about what we are doing wrong.
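For reference, here is a minimal sketch of how the two cache percentages above can be derived from the `wiredTiger.cache` section of `serverStatus` output. The function name and the sample numbers are hypothetical and only chosen to mirror the approximate figures quoted above; the three statistic names are the ones `serverStatus` reports for the WiredTiger cache.

```python
def wt_cache_ratios(server_status):
    """Return WiredTiger cache fill and dirty percentages.

    `server_status` is a dict shaped like the output of the
    serverStatus command (only wiredTiger.cache is read).
    """
    cache = server_status["wiredTiger"]["cache"]
    max_bytes = cache["maximum bytes configured"]
    used_bytes = cache["bytes currently in the cache"]
    dirty_bytes = cache["tracked dirty bytes in the cache"]
    return {
        "used_pct": 100.0 * used_bytes / max_bytes,
        "dirty_pct": 100.0 * dirty_bytes / max_bytes,
    }


# Made-up sample values (not taken from the attached FTDC data).
sample_status = {
    "wiredTiger": {
        "cache": {
            "maximum bytes configured": 1000,
            "bytes currently in the cache": 800,
            "tracked dirty bytes in the cache": 50,
        }
    }
}

print(wt_cache_ratios(sample_status))
# {'used_pct': 80.0, 'dirty_pct': 5.0}
```

With a live deployment, the same dict could be obtained by running the `serverStatus` command through a driver, and the two ratios sampled over time around the CPU drop.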

See attachments for related FTDC files.

We know version 4.2.12 has reached EOL; apologies in advance if this issue is not appropriate to file here.

Many Thanks!


image-2024-04-16-17-47-39-450.png
170 kB
Apr 16 2024 09:47:42 AM UTC
metrics.2024-04-08T05-37-31Z-00000
9.85 MB
Apr 16 2024 09:48:39 AM UTC
metrics.2024-04-07T20-46-39Z-00000
9.85 MB
Apr 16 2024 09:48:41 AM UTC
metrics.2024-04-08T01-36-39Z-00000
9.83 MB
Apr 16 2024 09:48:41 AM UTC

Assignee: Chris Kelly
Reporter: Aaron Wang
Participants: Aaron Wang, Chris Kelly
Votes: 0
Watchers: 5

Created: Apr 16 2024 10:00:55 AM UTC
Updated: Jan 30 2026 01:26:49 PM UTC
Resolved: Apr 23 2024 05:13:18 PM UTC
