[SERVER-2578] Replica Set Sync (rs_sync) operation not closing correctly Created: 17/Feb/11  Updated: 12/Jul/16  Resolved: 28/Feb/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.6.5, 1.7.5, 1.7.6
Fix Version/s: 1.9.0

Type: Bug Priority: Minor - P4
Reporter: Gaetan Voyer-Perrault Assignee: Kristina Chodorow (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu


Attachments: File test_repl_set_bad_op.sh    
Issue Links:
Depends
Related
Operating System: Linux
Participants:

 Description   

The basic premise:

  • We're finding useless operations in the db.currentOp() when using replica sets.
  • The useless ops are in the secondary nodes.
  • The useless ops appear to be tied to "Replica Set Syncing".

Problems:
0. The useless ops exist.
1. The useless ops are unkillable. However, restarting (bouncing) mongod seems to clear them.
2. When a replica does a complete re-sync, the useless op reports a non-zero secs_running value.

Problem #2 causes issues when monitoring for long-running queries.
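To make problem #2 concrete, here is a minimal Python sketch of the kind of long-running-op check a monitor such as Nagios might run. It assumes the db.currentOp() result has already been fetched as a dict with an "inprog" list whose entries carry "opid" and "secs_running". The threshold and the sample ops are hypothetical. A leaked sync op that is never finished stays in that list, so a check like this keeps firing on it.

```python
from typing import Any

# Hypothetical alert threshold, in seconds.
LONG_RUNNING_SECS = 300


def long_running_ops(current_op: dict[str, Any], threshold: int = LONG_RUNNING_SECS) -> list[int]:
    """Return the opids of ops whose secs_running exceeds the threshold.

    Assumes current_op mirrors the shape of db.currentOp() output:
    {"inprog": [{"opid": ..., "secs_running": ...}, ...]}.
    Ops without a secs_running field are treated as 0 seconds.
    """
    return [
        op["opid"]
        for op in current_op.get("inprog", [])
        if op.get("secs_running", 0) > threshold
    ]


# Hypothetical sample: one normal query and one leaked replica-set sync op.
sample = {
    "inprog": [
        {"opid": 101, "op": "query", "secs_running": 2},
        {"opid": 7, "op": "none", "desc": "rs_sync", "secs_running": 25200},
    ]
}

print(long_running_ops(sample))  # [7]
```

Because the leaked op can't be killed, the only ways to silence such a check are to restart mongod or to special-case the op in the monitor.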

Reproduction:

  • See attachment
  • Reproducing it is as simple as forcing a re-sync on a dead node (stop mongod, delete its data, restart).


 Comments   
Comment by Jason McCay [ 30/Mar/12 ]

Just noticed this with one of our replica sets: m1.large instances on AWS running MongoDB 2.0.3rc1. There is very little activity on the primary and none on the secondary, yet the secondary shows as roughly seven hours behind.

Here is the (lone) current op we are seeing:
http://cl.ly/100O2O1A0K340x292x3S

Similar to the experiences above, the op cannot be killed, so we are going to freeze the secondary and bounce the primary.

I know this is an old ticket, but I found this information via Google and it seemed to match what was described in this ticket. Since these databases are running a recent version of MongoDB, I felt it was relevant to include the info.

Comment by Kristina Chodorow (Inactive) [ 25/Aug/11 ]

Yes, it was not backported to the 1.8 branch. It will be in 2.0.

Comment by kosta giatras [ 25/Aug/11 ]

Does this also affect version 1.8.3? We have the same symptoms as http://groups.google.com/group/mongodb-user/browse_thread/thread/cbd8ddd130b347e2 . This is causing problems with our Nagios checks for long-running ops, which become useless with this bug.

thank you

kosta

Comment by auto [ 25/Feb/11 ]

Author:

Kristina (kchodorow) <kristina@10gen.com>

Message: finish curop in helpers SERVER-2578
https://github.com/mongodb/mongo/commit/97d4c5c7effb0c5fdf3c87affbba4ad6d6d2d077

Comment by Eliot Horowitz (Inactive) [ 25/Feb/11 ]

Kristina, put the fix in master and add a comment here when it's done so we can look at the patch.
That should be the general procedure.

Comment by Kristina Chodorow (Inactive) [ 25/Feb/11 ]

The Helpers::putSingleton method is not calling done() on the current op. Should the fix go into 1.8.0 or wait?
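The failure mode described here can be modeled in a few lines. The Python sketch below is not MongoDB's code. CurOpRegistry, put_singleton_buggy and put_singleton_fixed are hypothetical stand-ins for the server's current-op bookkeeping and for Helpers::putSingleton. In this model, an op registered at the start of the helper stays listed until done() is called. A try/finally (the Python analogue of a C++ RAII guard) makes sure done() runs even if the write raises.

```python
import itertools
from typing import Callable


class CurOpRegistry:
    """Toy stand-in for the server's table of in-progress operations."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.inprog: dict[int, str] = {}

    def start(self, desc: str) -> int:
        # Register a new op and hand back its id.
        opid = next(self._ids)
        self.inprog[opid] = desc
        return opid

    def done(self, opid: int) -> None:
        # Removing the entry is what makes the op disappear from the listing.
        self.inprog.pop(opid, None)


def put_singleton_buggy(reg: CurOpRegistry, write: Callable[[], None]) -> None:
    opid = reg.start("putSingleton")
    write()
    # Bug being modeled: done(opid) is never called, so the op lingers.


def put_singleton_fixed(reg: CurOpRegistry, write: Callable[[], None]) -> None:
    opid = reg.start("putSingleton")
    try:
        write()
    finally:
        # Always finish the op, even if write() raises.
        reg.done(opid)


reg = CurOpRegistry()
put_singleton_buggy(reg, lambda: None)
print(len(reg.inprog))  # 1: the leaked op is still listed

reg = CurOpRegistry()
put_singleton_fixed(reg, lambda: None)
print(len(reg.inprog))  # 0
```

Tying done() to scope exit, rather than to a call at the end of the happy path, is what keeps an early return or an exception from leaving an op behind.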

Generated at Thu Feb 08 03:00:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.