Add generic testing for stale mongoses


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • 🟩 Routing and Topology

      Having a stale mongos can lead to misrouted writes, which result in a reroute after a sharding metadata refresh. The code path is quite complex and depends on the kinds of data movements that have occurred.

      We have some explicit testing for stale mongoses in jstests/sharding, e.g. tests that use one mongos to perform one or more data movement operations and then use another mongos to perform CRUD operations. For example:

      jstests/sharding/batched_writes_with_id_without_shard_key_stale_config.js
      jstests/sharding/configsvr_retries_createindex_on_stale_config.js
      jstests/sharding/deleteOne_with_id_without_shard_key_stale_config.js
      jstests/sharding/drop_indexes_with_stale_config_error.js
      jstests/sharding/mongos_exhausts_stale_config_retries.js
      jstests/sharding/move_stale_mongos.js
      jstests/sharding/read_pref_multi_mongos_stale_config.js
      jstests/sharding/resharding_collection_cloner_stale_config_retry.js
      jstests/sharding/shard_router_handle_staleconfig.js
      jstests/sharding/sharded_index_consistency_metrics_stale_version_retries.js
      jstests/sharding/split_stale_mongos.js
      jstests/sharding/stale_config_error_for_direct_shard_operation.js
      jstests/sharding/stale_config_for_router_role.js
      jstests/sharding/stale_mongos_and_restarted_shards_agree_on_shard_version.js
      jstests/sharding/stale_version_write.js
      jstests/sharding/transactions_stale_database_version_errors.js
      jstests/sharding/transactions_stale_shard_version_errors.js
      jstests/sharding/updateOne_with_id_without_shard_key_stale_config.js
      

      However, those tests are unlikely to cover all the possible cases. We specify "num_mongos: 2" in 31 of the 32 sharding concurrency and jscore passthrough suites, but the shell just connects to the mongoses provided in the connection string in a round-robin fashion, so no mongos is likely to be very stale at any given point. This ticket is to investigate whether it is possible to make the shell connect to some mongoses significantly less often than the others in order to induce staleness, and then to modify some of the suites to do that.
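
      As a rough illustration of the idea, the sketch below replaces even round-robin selection with a weighted one, so that one mongos is picked far less often than the other. It is plain JavaScript and does not use any shell or resmoke API; the helper name makeSkewedPicker, the host strings, and the 9:1 weights are all hypothetical, and how such a picker would actually be wired into the shell's connection logic is exactly what this ticket would need to investigate.

```javascript
// Hypothetical helper (not part of the mongo shell or resmoke): returns a
// function that picks hosts in proportion to the given positive weights,
// using a deterministic "smooth weighted round-robin" schedule.
function makeSkewedPicker(hosts, weights) {
    if (hosts.length === 0 || hosts.length !== weights.length) {
        throw new Error("hosts and weights must be non-empty and the same length");
    }
    if (weights.some((w) => !(w > 0))) {
        throw new Error("weights must be positive numbers");
    }
    const total = weights.reduce((a, b) => a + b, 0);
    const current = weights.map(() => 0);

    return function pick() {
        let best = 0;
        for (let i = 0; i < hosts.length; i++) {
            // Every host earns credit equal to its weight on each pick.
            current[i] += weights[i];
            if (current[i] > current[best]) {
                best = i;
            }
        }
        // The chosen host pays back the total, so low-weight hosts are
        // chosen rarely but never starved.
        current[best] -= total;
        return hosts[best];
    };
}

// Example: the second mongos receives only 1 of every 10 connections, so it
// has more time to fall behind on routing metadata between uses.
const pick = makeSkewedPicker(["mongos0:27017", "mongos1:27017"], [9, 1]);
const counts = {};
for (let i = 0; i < 10; i++) {
    const host = pick();
    counts[host] = (counts[host] || 0) + 1;
}
console.log(counts);
```

      A deterministic schedule is used here instead of random sampling so that a run is reproducible; a seeded random choice would be an equally reasonable alternative.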

            Assignee:
            Unassigned
            Reporter:
            Cheahuychou Mao
            Votes:
            0
            Watchers:
            4
