Type: Bug
Resolution: Gone away
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.6.2
Component/s: None
Labels:
None
Environment:
CentOS 7

Operating System:
ALL
Steps To Reproduce:
It has happened intermittently, about three times in the last three weeks. The first two occurrences were a few hours apart and on different servers (I believe both were primaries at the time); the latest one was on a secondary.
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

We have a sharded cluster with 8 shards, each a replica set with two data-bearing members plus an arbiter. Today we had a problem on one of the secondaries: it suddenly started using 100% CPU and stopped responding to queries. It remained in that state until it was restarted.

I'm attaching stack traces captured with "pstack" in case they help. Most threads appear to be waiting for a lock; the exceptions are threads 70, 73 and 83, which might be holding the locks while consuming all the CPU (this server has 2 CPUs).
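For anyone triaging similar dumps, here is a rough sketch of how the "mostly waiting, a few busy" reading can be checked mechanically. It assumes the gdb-style layout that pstack prints on CentOS (a `Thread N (...)` header followed by `#k` frame lines). The list of wait symbols is my own guess at common blocking calls, and the sample text is made up for illustration; it is not taken from the attached files. A single snapshot only suggests candidates, so comparing several dumps taken a few seconds apart would be more convincing.

```python
import re
from collections import defaultdict

# Header that gdb-based pstack prints per thread, e.g.
# "Thread 70 (Thread 0x7f12a4bfe700 (LWP 12345)):"
THREAD_RE = re.compile(r"^Thread (\d+) \(")
# Frame line, e.g.
# "#0  0x00007f12b0a1c4ed in __lll_lock_wait () from /lib64/libpthread.so.0"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.+?) \(")

# Assumption: innermost frames starting with these names mean the thread is
# blocked rather than burning CPU. Extend the tuple for your own workload.
WAIT_SYMBOLS = (
    "__lll_lock_wait", "pthread_cond_wait", "pthread_cond_timedwait",
    "epoll_wait", "poll", "select", "recvmsg", "nanosleep",
)


def parse_pstack(text):
    """Return {thread_number: [function names, innermost frame first]}."""
    threads = defaultdict(list)
    current = None
    for line in text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current]  # register the thread even if no frames follow
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
    return dict(threads)


def busy_candidates(threads):
    """Thread numbers whose innermost frame is not a known wait symbol."""
    return sorted(
        number
        for number, frames in threads.items()
        if frames and not frames[0].startswith(WAIT_SYMBOLS)
    )


# Fabricated two-thread dump, for illustration only.
SAMPLE = """\
Thread 2 (Thread 0x7f0000000700 (LWP 101)):
#0  0x00007f00aaaa0001 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f00aaaa0002 in pthread_mutex_lock () from /lib64/libpthread.so.0
Thread 1 (Thread 0x7f0000000800 (LWP 100)):
#0  0x0000000000400001 in hypothetical_hot_loop ()
#1  0x0000000000400002 in main ()
"""

print(busy_candidates(parse_pstack(SAMPLE)))  # → [1]
```

To run it against a real dump, read the file into a string and pass it to `parse_pstack`; the threads it reports are only candidates and still need a look at their full stacks.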


incident.png (467 kB), Jun 13 2018 06:40:53 PM UTC
pstack-03a.txt (519 kB), Jun 13 2018 05:47:02 AM UTC
pstack-03b.txt (255 kB), Jun 12 2018 09:10:35 AM UTC

Assignee: Bruce Lucas (Inactive)
Reporter: Isaac Cruz
Participants: Bruce Lucas, Isaac Cruz, Kelsey Schubert, Laxman P
Votes: 0
Watchers: 12

Created: Jun 12 2018 09:13:06 AM UTC
Updated: Oct 27 2023 08:43:23 PM UTC
Resolved: Oct 26 2018 02:25:58 PM UTC
