[SERVER-10274] repl writer worker clients in op list Created: 21/Jul/13  Updated: 10/Dec/14  Resolved: 25/Jul/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Nic Cottrell (Personal) Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

rhel6


Attachments: Text File currentOp-primary.txt     Text File currentOp-secondary.txt     PDF File mongod master.pdf     PDF File mongod slave.pdf    
Participants:

 Description   

I have a 3-server replica set with 1 arbiter.

On the slave I see "repl writer worker" 1-4 all with an "X" in the locking column. Right now RWW 1 and 2 have 20 and 30 seconds respectively. 3-4 have 1271310 as the secs running.

On the master I see repl writer worker 1-6 and all are marked as locking. Also on the master are rsSync and rsSyncNotifier and both list "index: (3/3) btree-middle" in the "msg" column and have done for days now.

Is this some sort of problem, or is this just some new debug data that has been added in 2.4.5? (I don't remember ever seeing this in previous 2.4.x versions).



 Comments   
Comment by Daniel Pasette (Inactive) [ 25/Jul/13 ]

The repl writer workers stick around even when the node is primary – they just aren't doing anything. It's normal. To see ALL ops that are listed in the web console, pass "true" to currentOp: db.currentOp(true).
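As a rough illustration of the point above, the sketch below picks the idle "repl writer worker" threads out of a currentOp-style result. The helper name idleReplWriters and the sample data are hypothetical, and the field names (inprog, desc, active) are an assumption about the shape of the currentOp output on this server version; check them against real output before relying on this.

```javascript
// Hypothetical helper: return the entries of a currentOp-style result that
// belong to replication writer threads and are not currently active.
// Assumes each entry has a string "desc" and a boolean "active" field.
function idleReplWriters(currentOpResult) {
  var ops = currentOpResult.inprog || [];
  return ops.filter(function (op) {
    return typeof op.desc === "string" &&
           op.desc.indexOf("repl writer worker") === 0 &&
           !op.active;
  });
}

// Made-up sample data, shaped like the assumed currentOp output.
var sample = {
  inprog: [
    { opid: 1, active: false, desc: "repl writer worker 1" },
    { opid: 2, active: false, desc: "repl writer worker 2" },
    { opid: 3, active: true,  desc: "conn42", op: "query" },
    { opid: 4, active: true,  desc: "repl writer worker 3" }
  ]
};

// Only the two inactive writer threads are returned.
var idle = idleReplWriters(sample);
```

In the mongo shell, the same helper could be called as idleReplWriters(db.currentOp(true)); passing true is what makes idle threads appear in the result in the first place.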

Comment by Nic Cottrell (Personal) [ 25/Jul/13 ]

Well, if the repl workers are supposed to be showing up in the op list like that, then I guess this is just a "working as designed" ticket...

Comment by Daniel Pasette (Inactive) [ 25/Jul/13 ]

Yes, background flush times and iowait look a bit high on the secondary, but they appear to have been coming down since July 21st. Your disks are probably a bit overworked; moving the MySQL instance will probably help more. Do you have a specific question or bug to report?

Comment by Nic Cottrell (Personal) [ 23/Jul/13 ]

MMS looks ok? I'm panicking about the huge iowait on the secondary (averaging 30%). I think the machine just has too little RAM (40GB) and it is running a MySQL instance too. I will move that away soon and hopefully it will recover. Since setting up the replica my unit tests take twice as long, so I'm a bit confused.

I've attached the currentOp outputs, so hopefully that sheds some light...

Comment by Daniel Pasette (Inactive) [ 23/Jul/13 ]

The running time can be reported incorrectly because of SERVER-4740 (see SERVER-2886 for a better explanation).

From looking at your cluster in MMS, it looks healthy. I think the old ops in the web view are just not being cleared out. Can you attach the output of db.currentOp() from the primary and the secondary to the ticket?
