[SERVER-9853] Ghostsyncing does not retry after a network failure Created: 03/Jun/13  Updated: 11/Jul/16  Resolved: 10/Jun/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.3, 2.5.0
Fix Version/s: 2.4.5, 2.5.1

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL

 Description   

When ghostsync (the path that forwards oplog position updates for chained secondaries) fails to send an update because of a network error, it needs to reconnect and retry immediately.
Non-ghostsync updates do not have this problem because the updater runs in a loop and keeps retrying until it succeeds in transmitting the oplog position update.
The current ghostsync code gives up when a network failure occurs, and the lost update is not recovered until the next write is applied. This is acceptable when there are many writes, but bad when there are no writes for a while, because write concern on the primary will not see the secondary's progress.
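A minimal sketch of the difference described above, assuming a simplified model of the upstream connection. `UpstreamLink`, `percolateOnce`, and `percolateWithRetry` are hypothetical names for illustration, not MongoDB's actual code. The first function mirrors the pre-fix behavior (one attempt, then give up); the second reconnects and retries.

```cpp
#include <cassert>

// Hypothetical model of the upstream socket used to forward ("percolate")
// a chained secondary's oplog position. Not MongoDB's actual API.
struct UpstreamLink {
    explicit UpstreamLink(int failures) : failuresRemaining(failures) {}

    int failuresRemaining;        // sends that will fail before the link recovers
    long long lastReported = -1;  // highest position the upstream has received
    int reconnects = 0;           // how many times we re-established the socket

    // Returns false to simulate a socket failure.
    bool send(long long position) {
        if (failuresRemaining > 0) {
            --failuresRemaining;
            return false;
        }
        lastReported = position;
        return true;
    }

    void reconnect() { ++reconnects; }
};

// Pre-fix behavior: a single attempt. On failure the update is simply
// dropped until the next applied write triggers another percolate.
bool percolateOnce(UpstreamLink& link, long long position) {
    return link.send(position);
}

// Post-fix behavior (sketch): reconnect and retry until the update is
// delivered or the hypothetical attempt budget is exhausted.
bool percolateWithRetry(UpstreamLink& link, long long position, int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (link.send(position)) {
            return true;
        }
        link.reconnect();  // re-establish the upstream socket before retrying
    }
    return false;
}
```

The attempt budget is an assumption made so the sketch cannot spin forever; the description says the non-ghostsync updater simply keeps looping until it succeeds.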

This logic will be better solved by SERVER-6071.



 Comments   
Comment by auto [ 20/Jun/13 ]

Author: Eric Milkie (username: milkie, email: milkie@10gen.com)

Message: SERVER-9853 avoid Windows test shutdown issues by stopping the test explicitly
Branch: v2.4
https://github.com/mongodb/mongo/commit/073bfa9db6461c27c4d19135ef256c5627b4cdbc

Comment by auto [ 20/Jun/13 ]

Author: Eric Milkie (username: milkie, email: milkie@10gen.com)

Message: SERVER-9853 avoid Windows test shutdown issues by stopping the test explicitly
Branch: master
https://github.com/mongodb/mongo/commit/021b9bb6946bc7f07f88c176a081e53b2db36e49

Comment by auto [ 20/Jun/13 ]

Author: Eric Milkie (username: milkie, email: milkie@10gen.com)

Message: SERVER-9853 retry ghostsync percolate if socket failure

Without this fix, depending on exact timing of upstream socket failure, you could lose
a chaining upstream oplog position update from a secondary. This might then cause
a write with a certain level of write concern to time out even though enough secondaries
had applied the write. This problem would be masked by a sufficient number of subsequent
writes, which would trigger another upstream oplog position update.

Note that this only affects chaining; nonchaining updates are already retrying when a socket
failure occurs.

Conflicts:

src/mongo/db/repl/rs_sync.cpp
Branch: v2.4
https://github.com/mongodb/mongo/commit/b82d99d5b8b2de63f9bfa5928793285526a81986
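The commit message explains why a lost chained update matters: the primary decides whether a write has reached enough members from the positions members report, so a dropped report can make a write-concern wait time out even though the secondary has applied the write. A minimal sketch under stated assumptions: `ProgressTracker` is a hypothetical class, positions are plain integers standing in for optimes, and the primary counts itself toward `w`. This is not MongoDB's actual data structure.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified model of the primary's view of member progress.
class ProgressTracker {
public:
    // Record the latest oplog position a member has reported, either
    // directly or forwarded by a chaining secondary. Never moves backwards.
    void report(const std::string& member, long long position) {
        long long& current = positions_[member];
        if (position > current) {
            current = position;
        }
    }

    // True when at least `w` members have reported a position
    // greater than or equal to the write's position.
    bool satisfied(long long writePosition, int w) const {
        int count = 0;
        for (const auto& entry : positions_) {
            if (entry.second >= writePosition) {
                ++count;
            }
        }
        return count >= w;
    }

private:
    std::map<std::string, long long> positions_;
};
```

For example, if secondary B syncs through secondary A and B's report for position 100 is lost, a `w: 3` write at position 100 stays unsatisfied until either a retried percolate or a later write delivers B's position.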

Comment by auto [ 10/Jun/13 ]

Author: Eric Milkie (username: milkie, email: milkie@10gen.com)

Message: SERVER-9853 retry ghostsync percolate if socket failure

Without this fix, depending on exact timing of upstream socket failure, you could lose
a chaining upstream oplog position update from a secondary. This might then cause
a write with a certain level of write concern to time out even though enough secondaries
had applied the write. This problem would be masked by a sufficient number of subsequent
writes, which would trigger another upstream oplog position update.

Note that this only affects chaining; nonchaining updates are already retrying when a socket
failure occurs.
Branch: master
https://github.com/mongodb/mongo/commit/5e02976117c3ea90318326670cba51975b957ca9

Generated at Thu Feb 08 03:21:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.