[SERVER-22563] Establishing remote cursor during rollback needs retries Created: 10/Feb/16  Updated: 31/May/18  Resolved: 30/May/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.0.0, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Vesselina Ratcheva (Inactive)
Resolution: Done Votes: 0
Labels: rbfz, rollback-optional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File out_fassert_in_sync    
Issue Links:
Backports
Depends
is depended on by SERVER-21051 Add a suite that runs existing shardi... Closed
Related
related to SERVER-20711 Fatal assertion in secondary when syn... Closed
is related to SERVER-35004 Add functionality to only fail specif... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2018-05-07, Repl 2018-05-21, Repl 2018-06-04
Participants:
Linked BF Score: 27

 Description   

Currently, a transient network error during rollback can cause the secondary to hit a fatal assertion (fassert) and shut down. We should retry all network operations during rollback at least once or twice, so that the server shuts down only when it is absolutely necessary.
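The retry policy proposed above can be sketched in C++, the server's language. This is a minimal, hypothetical helper, not the code that landed in the fix commits: `retryTransient` and `TransientNetworkError` are illustrative names, and the attempt limit and linear backoff are assumptions. Only errors classified as transient are retried, so a genuine failure (for example, no common point with the sync source) still propagates immediately.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical error type standing in for a transient network failure, such as
// the RECV_TIMEOUT socket exception in the attached log. Not a MongoDB class.
struct TransientNetworkError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs `op` up to `maxAttempts` times. Only TransientNetworkError triggers a
// retry; any other exception escapes on the first failure. If the last attempt
// also fails transiently, that error is rethrown so the caller can still abort.
template <typename Op>
auto retryTransient(Op op, int maxAttempts, std::chrono::milliseconds backoff)
    -> decltype(op()) {
    assert(maxAttempts >= 1);
    for (int attempt = 1;; ++attempt) {
        try {
            return op();
        } catch (const TransientNetworkError&) {
            if (attempt >= maxAttempts) {
                throw;  // out of attempts: surface the last transient error
            }
            // Linear backoff (assumed policy): wait longer after each failure.
            std::this_thread::sleep_for(backoff * attempt);
        }
    }
}
```

A caller would wrap the cursor-establishment step, e.g. `retryTransient([&] { return openRemoteOplogCursor(); }, 3, std::chrono::milliseconds(100))`, where `openRemoteOplogCursor` is likewise a hypothetical name. Keeping non-transient errors out of the retry path preserves the existing fatal behavior for real rollback failures.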



 Comments   
Comment by Githook User [ 31/May/18 ]

Author:

{'username': 'vessy-mongodb', 'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com'}

Message: SERVER-22563 amend uassert code
Branch: v4.0
https://github.com/mongodb/mongo/commit/630487d850427ee41994128d41aaa19abd84cb12

Comment by Githook User [ 31/May/18 ]

Author:

{'username': 'vessy-mongodb', 'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com'}

Message: SERVER-22563 amend uassert code
Branch: master
https://github.com/mongodb/mongo/commit/d425fb810663b0f898c32a851807f3e021327bce

Comment by Vesselina Ratcheva (Inactive) [ 30/May/18 ]

Author:

{'username': 'vessy-mongodb', 'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com'}

Message: SERVER-22653 Retry establishing remote cursor during rollback

(cherry picked from commit 2b80b2eaa262a83d5a177bede9631c3a62682760)
Branch: v4.0
https://github.com/mongodb/mongo/commit/5a54158baef7d3609526f438d9dab56b3a662344

Comment by Vesselina Ratcheva (Inactive) [ 30/May/18 ]

Author:

{'username': 'vessy-mongodb', 'name': 'Vesselina Ratcheva', 'email': 'vesselina.ratcheva@10gen.com'}

Message: SERVER-22653 Retry establishing remote cursor during rollback
Branch: master
https://github.com/mongodb/mongo/commit/b65f5147a512cd57a323eb5853a503d8cf1bb557

Comment by Spencer Brody (Inactive) [ 10/Feb/16 ]

esha.maharishi hit this while working on SERVER-21051, which runs the sharding suite while the config server replica set is probabilistically dropping messages. Attaching the logs of the test that hit it.

Relevant section:

[js_test:read_pref_cmd] 2016-02-10T14:07:05.112-0500 c15515| 2016-02-10T14:07:05.112-0500 I NETWORK  [rsBackgroundSync] Socket recv() timeout  127.0.1.1:15514
[js_test:read_pref_cmd] 2016-02-10T14:07:05.112-0500 c15515| 2016-02-10T14:07:05.112-0500 I NETWORK  [rsBackgroundSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [127.0.1.1:15514]
[js_test:read_pref_cmd] 2016-02-10T14:07:05.112-0500 c15515| 2016-02-10T14:07:05.112-0500 I NETWORK  [rsBackgroundSync] DBClientCursor::init call() failed
[js_test:read_pref_cmd] 2016-02-10T14:07:05.112-0500 c15515| 2016-02-10T14:07:05.112-0500 E REPL     [rsBackgroundSync] InvalidSyncSource: remote oplog empty or unreadable
[js_test:read_pref_cmd] 2016-02-10T14:07:05.112-0500 c15515| 2016-02-10T14:07:05.112-0500 I REPL     [rsBackgroundSync] rollback finished
[js_test:read_pref_cmd] 2016-02-10T14:07:05.113-0500 c15515| 2016-02-10T14:07:05.112-0500 I -        [rsBackgroundSync] Fatal assertion 28723 UnrecoverableRollbackError: need to rollback, but unable to determine common point between local and remote oplog: InvalidSyncSource: remote oplog empty or unreadable @ 18752
[js_test:read_pref_cmd] 2016-02-10T14:07:05.113-0500 c15515| 2016-02-10T14:07:05.112-0500 I -        [rsBackgroundSync]
[js_test:read_pref_cmd] 2016-02-10T14:07:05.113-0500 c15515|
[js_test:read_pref_cmd] 2016-02-10T14:07:05.113-0500 c15515| ***aborting after fassert() failure
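Under the simplifying assumption that each attempt fails independently with the same probability p (messages dropped by the SERVER-21051 test harness may not actually be independent), a single RECV_TIMEOUT like the one above becomes far less likely to reach the fassert once k retries are added:

$$
P(\text{all } k+1 \text{ attempts fail}) = p^{k+1}, \qquad \text{e.g. } p = 0.1,\; k = 2:\quad 0.1^{3} = 0.001 .
$$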

Generated at Thu Feb 08 04:00:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.