[SERVER-4758] OplogReader has no socket timeout Created: 24/Jan/12  Updated: 06/Feb/17  Resolved: 06/Jun/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.0.6, 2.1.2

Type: Bug Priority: Major - P3
Reporter: Kristina Chodorow (Inactive) Assignee: Eric Milkie
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-4918 Lower replica set reader timeout (or ... Closed
Related
is related to SERVER-6733 Make oplog timeout shorter Closed
Operating System: ALL

 Description   

It really should have one, to prevent the sync thread from hanging if a server goes down.
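The failure mode in the description can be illustrated with a minimal Python sketch. This is not MongoDB's C++ `OplogReader` code. A blocking read with no timeout waits forever on a peer that has gone silent, while a read with a socket timeout gives up and hands control back to the caller. The helper name `read_with_timeout` is hypothetical, and `socket.socketpair()` only stands in for the connection to the sync source.

```python
import socket
from typing import Optional


def read_with_timeout(sock: socket.socket, timeout_s: float, nbytes: int = 4096) -> Optional[bytes]:
    """Hypothetical helper: read up to nbytes, or give up after timeout_s seconds.

    Returns the bytes read, or None if the peer sent nothing in time.
    Without settimeout(), recv() would block indefinitely on a silent peer,
    which is the "sync thread hangs" problem this ticket describes.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


if __name__ == "__main__":
    # "upstream" plays the sync source, "reader" the secondary's connection.
    reader, upstream = socket.socketpair()
    try:
        # Upstream stays silent: the read returns None once the timeout expires.
        print(read_with_timeout(reader, timeout_s=0.2))
        # Upstream sends data: the read returns it normally.
        upstream.sendall(b"oplog entry")
        print(read_with_timeout(reader, timeout_s=0.2))
    finally:
        reader.close()
        upstream.close()
```

The caller can treat `None` as "sync source unresponsive" and pick a new source, rather than blocking forever.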



 Comments   
Comment by Eric Milkie [ 06/Feb/17 ]

Hi michaelbrenden
I'm sorry you're experiencing difficulties with your replica set. Unfortunately, there is no 6-second oplog timeout in the code (and there certainly wasn't one added as part of the work for this ticket back in 2012).

Comment by Michael Brenden [ 06/Feb/17 ]

Now that it has become a 6-second oplog timeout, a new problem is an unwanted oplog timeout when scanning a full, huge oplog. https://jira.mongodb.org/browse/SERVER-19605

Comment by Eric Milkie [ 05/Jun/12 ]

Lukas: The situation described in SERVER-4918 does actually happen with that version.

Comment by Lukas Krecan [ 05/Jun/12 ]

Sorry for the late reaction, but does this really mean that, with the setup described in SERVER-4918, the replica set will not accept writes for 10 minutes?

Comment by auto [ 16/May/12 ]

Author:

{u'login': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: backport - fix handling of socket timeouts on Windows SERVER-4758
Branch: v2.0
https://github.com/mongodb/mongo/commit/dfbf91e16a8686cac7b58756e4fc63c88d874ec6

Comment by auto [ 15/May/12 ]

Author:

{u'login': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-4758 raising timeout to 10 minutes for safety
Branch: v2.0
https://github.com/mongodb/mongo/commit/dbd87d09fd95c10d37bcc8b875e7770043fe219d

Comment by auto [ 15/May/12 ]

Author:

{u'login': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-4758 raising timeout to 10 minutes for safety
Branch: master
https://github.com/mongodb/mongo/commit/1b4fc46b1af93fe1b664c174dfe9462e53a4948e

Comment by auto [ 24/Feb/12 ]

Author:

{u'login': u'milkie', u'email': u'milkie@10gen.com', u'name': u'Eric Milkie'}

Message: SERVER-4758 tcp timeout should be at least 60 seconds
Branch: master
https://github.com/mongodb/mongo/commit/0b88db9f59f580e16bd8325f4b6d69b9db6199fb
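This commit message states only a rule: the TCP timeout should be at least 60 seconds. Below is a minimal sketch of one way to enforce such a floor. It assumes a too-small value is raised to the floor rather than rejected. The constant and function names are hypothetical, and this is not the actual change made in commit 0b88db9.

```python
# Hypothetical names; the 60-second floor comes from the commit message
# "tcp timeout should be at least 60 seconds".
MIN_TCP_TIMEOUT_S = 60.0


def effective_timeout(requested_s: float) -> float:
    """Return the timeout to apply, never less than the 60-second floor.

    Assumption: values below the floor are raised to it, not rejected.
    """
    return max(float(requested_s), MIN_TCP_TIMEOUT_S)


if __name__ == "__main__":
    print(effective_timeout(6))    # 60.0: a short request is raised to the floor
    print(effective_timeout(600))  # 600.0: 10 minutes is already above the floor
```

The floor trades responsiveness for safety. A very short timeout risks cutting off slow but healthy reads. A long value, such as the 10 minutes adopted later in this thread, means a silently dead sync source may take that long to detect. That delay is the situation Lukas Krecan asks about in the 05/Jun/12 comments.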

Comment by Eric Milkie [ 23/Feb/12 ]

If you run individual js files (or subsets of the js files) with "smoke --small-oplog", you can't reproduce the failure. Also, if you reorder the rename#.js files by changing their filenames, the failure case seems to go away or change in odd ways.
But I can reliably get a failure by running the entire suite.
Also, I tried changing smoke.py to validate the master/slave hashes after every test instead of after all the tests, and it fails pretty quickly, somewhere in the c's. I'm not sure whether that is relevant here, though.

http://buildbot.mongodb.org:8081/builders/Linux%2064-bit%20DEBUG/builds/793/steps/test_1/logs/stdio

Comment by Eric Milkie [ 23/Feb/12 ]

I believe this commit broke the smalloplog test suite.

Comment by auto [ 21/Feb/12 ]

Author:

{u'login': u'kchodorow', u'name': u'Kristina', u'email': u'kristina@10gen.com'}

Message: Add oplog reader timeout SERVER-4758
Branch: master
https://github.com/mongodb/mongo/commit/f240a2a9eb376ec9789a562760864c652cc52593

Generated at Thu Feb 08 03:06:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.