[SERVER-67009] _configsvrCommitChunkMigration keeps retrying to do a local write after stepdown Created: 03/Jun/22  Updated: 11/Jul/22  Resolved: 13/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.1.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jordi Serra Torrens Assignee: Jack Mulrow
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File 0001-Repro-SERVER-67009.patch    
Issue Links:
Depends
Duplicate
duplicates SERVER-67016 Transaction API transactions should b... Closed
Problem/Incident
is caused by SERVER-65836 Change applyOps for internal transact... Closed
Related
is related to SERVER-67016 Transaction API transactions should b... Closed
Operating System: ALL
Steps To Reproduce:

./buildscripts/resmoke.py run --storageEngine=wiredTiger --storageEngineCacheSizeGB=.50 --suite=sharding --log=file jstests/sharding/repro-server-67009.js

Participants:
Linked BF Score: 35

 Description   

SERVER-65836 changed how the chunk migration commit modifies the config.chunks documents: it now does so through the internal transactions API. That API uses SyncTransactionWithRetries, which keeps retrying to commit the transaction until it succeeds, and it sends its commands directly to the service_entry_point. After a stepdown, it therefore keeps retrying locally and hits the same NotPrimary errors over and over, without ever returning the error to the caller node. Returning the error to the caller node is desirable because it gives the caller the chance to retarget the write to the proper configsvr primary.

_configsvrCommitChunkMigration should immediately fail and return the error to the client instead.
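The difference between "retry until success" and "fail fast on NotPrimary" can be sketched as below. This is a simplified, hypothetical C++ illustration, not the server's transaction API or the actual fix (which landed via SERVER-67016). `commitWithRetries`, `Status`, `ErrorCode` and `isNotPrimaryError` are invented names, and the error codes are only loosely modeled on the server's. The loop retries errors assumed to be transient, but returns NotPrimary-class errors immediately so the caller can retarget.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, simplified error codes (loosely modeled on server error names).
enum class ErrorCode { OK, WriteConflict, NotWritablePrimary, PrimarySteppedDown };

struct Status {
    ErrorCode code;
    bool isOK() const { return code == ErrorCode::OK; }
};

// Errors meaning this node can no longer accept writes: retrying locally cannot help.
bool isNotPrimaryError(ErrorCode code) {
    return code == ErrorCode::NotWritablePrimary || code == ErrorCode::PrimarySteppedDown;
}

// Hypothetical retry loop. Transient errors (here: WriteConflict) are retried,
// while NotPrimary-class errors are returned to the caller right away.
// maxAttempts is only a safety bound for this sketch.
Status commitWithRetries(const std::function<Status()>& attempt, int maxAttempts, int* attemptsMade) {
    assert(maxAttempts > 0);
    Status last{ErrorCode::OK};
    for (int i = 1; i <= maxAttempts; ++i) {
        *attemptsMade = i;
        last = attempt();
        if (last.isOK() || isNotPrimaryError(last.code)) {
            return last;  // success, or an error only the caller can resolve
        }
        // Any other error is assumed transient in this sketch: try again.
    }
    return last;
}
```

With an attempt that always fails with `NotWritablePrimary` (simulating a stepdown), `commitWithRetries` returns after a single attempt instead of spinning locally. The design choice is to classify errors: only those the local node can plausibly resolve are retried in place.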

 Comments   
Comment by Jack Mulrow [ 13/Jun/22 ]

This should have been fixed by SERVER-67016, so I'm closing this as a duplicate of that ticket.

Comment by Marcos José Grillo Ramirez [ 03/Jun/22 ]

Assigning it to jack.mulrow@mongodb.com: he'll be making some changes to how the transaction API handles interruptions, which should fix this issue.

Comment by Jordi Serra Torrens [ 03/Jun/22 ]

I think split, merge and removeShard might also be affected by the same problem.

Generated at Thu Feb 08 06:07:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.