ReplSetTest.stepUp() retries if the replSetStepUp command fails. Each retry calls awaitReplication() again, which requires the replica set to have a primary. We can't guarantee that a primary is present on every retry when a test runs with a high election timeout (24 hours).
Consider a scenario where the primary steps down during the first, failed attempt of the replSetStepUp command. Because the test runs with a high election timeout, the replica set is left without a primary from then on, so the awaitReplication() call in the retry gets stuck waiting for a primary.
Previously, when reconstruct_prepared_transactions_initial_sync.js had a similar issue, I fixed it by making the jstest use stepUpNoAwaitReplication instead of ReplSetTest.stepUp() (see SERVER-48778).
Now retryable_commit_transaction_after_failover.js has also failed due to the issue described above. Revisiting the SERVER-48778 fix made me realize that these extra steps (such as the awaitReplication() call) do not need to be retried when the replSetStepUp command fails. Since the replSetStepUp command is already wrapped in assert.soon(), we don't need to repeat those steps on every retry.
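
To illustrate the difference between the two retry shapes, here is a toy model in plain JavaScript. It is only a sketch and not the real ReplSetTest code: makeToyReplSet, retryUntilTrue, stepUpWaitingInsideRetry and stepUpWaitingOnce are hypothetical names, retryUntilTrue is a simplified stand-in for assert.soon(), and the toy awaitReplication() throws where the real call would block, so that the hang is observable.

```javascript
// Toy model only. All names below are hypothetical and do not come from
// the real ReplSetTest implementation.

// Simulates a replica set with a very high election timeout: once the
// primary steps down, no new primary appears unless a step-up succeeds.
function makeToyReplSet(failuresBeforeSuccess) {
    return {
        hasPrimary: true,
        remainingFailures: failuresBeforeSuccess,
        // Stand-in for a wait that needs a primary. The real call would
        // block; the toy version throws instead.
        awaitReplication() {
            if (!this.hasPrimary) {
                throw new Error("stuck: no primary to wait for");
            }
        },
        // Stand-in for the replSetStepUp command. A failed attempt leaves
        // the set without a primary (the old primary already stepped down).
        tryStepUp() {
            if (this.remainingFailures > 0) {
                this.remainingFailures--;
                this.hasPrimary = false;
                return false;
            }
            this.hasPrimary = true;
            return true;
        },
    };
}

// Simplified stand-in for assert.soon(): retry until the callback is true.
function retryUntilTrue(fn, maxAttempts) {
    for (let i = 0; i < maxAttempts; i++) {
        if (fn()) {
            return true;
        }
    }
    return false;
}

// Problematic shape: the wait is repeated on every retry.
function stepUpWaitingInsideRetry(rs) {
    return retryUntilTrue(() => {
        rs.awaitReplication();
        return rs.tryStepUp();
    }, 5);
}

// Proposed shape: wait once, then retry only the step-up command.
function stepUpWaitingOnce(rs) {
    rs.awaitReplication();
    return retryUntilTrue(() => rs.tryStepUp(), 5);
}
```

With one simulated failure, stepUpWaitingInsideRetry(makeToyReplSet(1)) hits the "stuck" error on its second attempt, while stepUpWaitingOnce(makeToyReplSet(1)) succeeds on the second attempt because only the command itself is retried.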