In stepdown suites with catchUpTimeoutMillis: 0, a node may enter ROLLBACK state after a ContinuousStepdown cycle. _await_primaries() (stepdown.py:325) only confirms that a primary exists before handing control to subsequent hooks; it does not wait for all nodes to exit ROLLBACK. validate_node() (validate.py:187) then connects to each node with directConnection=true and calls list_database_names(), which fails with NotPrimaryOrSecondary (13436) on a mid-rollback node. The bare except: catches that error and treats it as a validation failure.
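One way to close this gap is to poll every node until none reports ROLLBACK before validation runs. The sketch below is only an illustration of that idea, not the code in stepdown.py or validate.py: wait_for_rollback_exit and its get_state callable are hypothetical names. It assumes get_state returns the node's replica set member state code, for example the myState field of replSetGetStatus read over a direct connection, where 9 is ROLLBACK.

```python
import time

ROLLBACK = 9  # replica set member state code for ROLLBACK


def wait_for_rollback_exit(get_state, nodes, timeout_secs=60.0,
                           interval_secs=0.5, sleep=time.sleep):
    """Block until no node in `nodes` reports ROLLBACK.

    `get_state` is a hypothetical callable (node -> int state code).
    With pymongo it could wrap something like
    client.admin.command("replSetGetStatus")["myState"] on a
    directConnection=true client; that wiring is an assumption.
    """
    deadline = time.monotonic() + timeout_secs
    pending = list(nodes)
    while pending:
        # Re-check only the nodes that were still rolling back.
        pending = [n for n in pending if get_state(n) == ROLLBACK]
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nodes still in ROLLBACK: {pending}")
        sleep(interval_secs)
```

Calling something like this after the primary check and before validation would make the validate hook see only nodes that have left ROLLBACK. The timeout keeps a stuck rollback from hanging the suite.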
Rollback is uncommon in steady state, so I believe this failure is otherwise rare, but I think this is the mechanism behind the linked BF (which is closed because it