Details
Improvement
Resolution: Unresolved
Major - P3
Catalog and Routing
3
Description
I realized that our test coverage is not sufficient to catch bugs caused by the introduction of backward-incompatible metadata.
One example is adding a phase to a sharding DDL coordinator that previous versions do not recognize.
The ideal solution would be to add a suite, for both core passthrough and FSM tests, that continuously performs the full upgrade/downgrade procedure, changing both the FCV and the binaries. This proposal is tracked by PM-3219.
On the other hand, there is an easier, intermediate solution that would allow us to catch most of those backward-incompatibility bugs.
We could simply run a sharding continuous stepdown suite (e.g. concurrency_sharded_with_stepdowns) in an implicit multiversion variant (e.g. Enterprise RHEL 8.0 (implicit multiversion & all feature flags)).
By causing elections in these variants, we implicitly make the coordinator node of DDL operations change binaries while the operation is ongoing, which allows us to spot possible backward-incompatibility bugs.
We already have sharded_retryable_writes_downgrade, which should cover the core tests, but we are missing the counterpart for concurrency tests.
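To illustrate the failure mode that a stepdown suite in a multiversion variant is expected to surface, here is a minimal toy model in Python. It is only a sketch and does not use the server code or the test framework: the `Binary` class, the phase names (including `kHypotheticalNewPhase`), and `step_down_during_ddl` are all hypothetical names introduced for this example. It assumes that a coordinator persists its current phase in a document and that the node elected after a stepdown must parse that document to resume the operation.

```python
from dataclasses import dataclass


class UnknownPhaseError(Exception):
    """Raised when a binary reads a coordinator phase it does not know."""


@dataclass(frozen=True)
class Binary:
    """Toy stand-in for a node running a given binary version (hypothetical)."""

    version: str
    known_phases: frozenset

    def resume(self, coordinator_doc: dict) -> str:
        # A newly elected primary resumes the coordinator from its persisted doc.
        phase = coordinator_doc["phase"]
        if phase not in self.known_phases:
            raise UnknownPhaseError(f"{self.version} cannot parse phase {phase!r}")
        return f"{self.version} resumed at {phase}"


# Hypothetical phase sets: the new binary adds one phase the old one lacks.
OLD = Binary("old", frozenset({"kUnset", "kCommit"}))
NEW = Binary("new", OLD.known_phases | {"kHypotheticalNewPhase"})


def step_down_during_ddl(first_primary: Binary, next_primary: Binary, phase: str) -> str:
    """Persist a phase on one primary, then resume on the node elected next."""
    doc = {"_id": "hypotheticalDDL", "phase": phase}
    first_primary.resume(doc)  # the writer understands its own phase
    return next_primary.resume(doc)  # the election moves the coordinator


if __name__ == "__main__":
    # Old -> new: the new binary still understands every old phase.
    print(step_down_during_ddl(OLD, NEW, "kCommit"))
    # New -> old: the old binary fails on the phase it has never seen.
    try:
        step_down_during_ddl(NEW, OLD, "kHypotheticalNewPhase")
    except UnknownPhaseError as err:
        print("caught:", err)
```

In this model the bug only appears when an election moves the coordinator from the newer binary to the older one while the new phase is persisted, which is the situation that repeated stepdowns in a mixed-binary cluster would be expected to produce.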