[SERVER-76991] Create a "kitchen sink" suite Created: 10/May/23  Updated: 04/Jan/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Engineering Test Priority: Major - P3
Reporter: Judah Schvimer Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-77018 Deadlock between dbStats and 2 index ... Closed
related to SERVER-77962 Investigate automated injection of fa... Investigating
is related to SERVER-76051 Exercise dynamic concurrency adjustme... Closed
is related to SERVER-75262 Add a passthrough test that exercises... Open
is related to SERVER-78028 Add random failures to initial syncs ... Open
Assigned Teams:
Replication
Participants:

 Description   

mongosync has caught a number of server bugs simply by generating aggressive load in ways that our server testing does not. It relaxes many test assertions so that it can run very aggressive load against the server with minimal to no denylisting, and then just checks for hangs, crashes, and data inconsistency. I'd love to bring some of this to the server.

Thoughts on the approach:

  • run multiple threads of jscore, concurrency, the mutational fuzzer, and maybe some of our other generational fuzzers in parallel
  • relax test assertions like mongosync does, other than shutdown consistency checking
  • consider turning collection drops into no-ops (maybe only a random portion of the time)
  • consider overriding namespace generation similar to mongosync to get more data collisions.
  • run it in sharded clusters with balancing enabled, ensuring that cross-shard transactions and balancing in fact occur
  • run stepdowns, terminates, and node killing in the background
  • ensure we get good coverage of resharding collections (namespace collisions should pick this up, given our resharding concurrency workloads)
  • run this with the config fuzzer so that we turn various different knobs along the way
  • consider periodically flipping the FCV and binary version if possible.
  • see if there are any failpoints to flip on and off in the background to pause at interesting points, but only ones that could exacerbate unfortunate timing.
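The namespace-collision and drop-as-no-op ideas above could look roughly like the following. This is a minimal Python sketch, not how mongosync or resmoke actually implement anything: the bucket counts, the `ks_db`/`ks_coll` naming, the drop probability, and both function names are hypothetical. It hashes each original namespace onto a small fixed pool so that unrelated parallel tests land on the same collections, and it turns a random fraction of drops into no-ops.

```python
import hashlib
import random
from typing import Tuple

# Hypothetical knobs: none of these names come from the ticket or from resmoke.
NUM_DB_BUCKETS = 4
NUM_COLL_BUCKETS = 8
DROP_NOOP_PROBABILITY = 0.5


def collide_namespace(db: str, coll: str) -> Tuple[str, str]:
    """Map an arbitrary test namespace onto a small fixed pool so that
    unrelated tests running in parallel end up sharing collections."""
    # Use a stable hash (not Python's per-process salted hash()) so every
    # test thread maps the same original namespace to the same bucket.
    digest = hashlib.sha256(f"{db}.{coll}".encode()).digest()
    db_bucket = digest[0] % NUM_DB_BUCKETS
    coll_bucket = digest[1] % NUM_COLL_BUCKETS
    return f"ks_db{db_bucket}", f"ks_coll{coll_bucket}"


def should_skip_drop(rng: random.Random) -> bool:
    """Hypothetical hook: return True when a collection drop should be
    turned into a no-op instead of being sent to the server."""
    return rng.random() < DROP_NOOP_PROBABILITY
```

Fewer buckets mean more collisions; a seeded `random.Random` keeps the no-op decisions reproducible for a given run.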

This would then be a good workload to run under Antithesis.



 Comments   
Comment by Judah Schvimer [ 12/Jun/23 ]

This would go well with SERVER-77962.

Comment by Judah Schvimer [ 10/May/23 ]

Other things to potentially include:

  • adding and removing nodes (with FCBIS and logical initial sync)
  • adding and removing shards
  • movePrimary
  • different authorization methods
  • multi-tenancy on and off with data going to multiple tenants
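The background disruptions from the description (stepdowns, terminates, node killing, FCV flips) and the additions above could be driven by a seeded scheduler so that a failing run can be replayed with the same sequence of disruptions. The sketch below only picks the order of actions; the action names, their weights, and the `schedule_actions` helper are hypothetical, and actually performing each action against a cluster is left out.

```python
import random
from typing import List

# Hypothetical action names and relative weights, loosely mirroring the
# ticket's wish list; nothing here is an existing resmoke hook.
ACTIONS = {
    "stepdown": 5,
    "terminate_node": 2,
    "kill_node": 2,
    "add_node_fcbis": 1,
    "add_node_logical_initial_sync": 1,
    "remove_node": 1,
    "add_shard": 1,
    "remove_shard": 1,
    "move_primary": 2,
    "flip_fcv": 1,
}


def schedule_actions(seed: int, count: int) -> List[str]:
    """Return a reproducible, weighted-random sequence of background actions.

    The same seed always yields the same sequence, so a run that finds a
    crash or hang can be replayed with identical disruptions."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    weights = [ACTIONS[name] for name in names]
    return rng.choices(names, weights=weights, k=count)
```

Weighting lets cheap, frequent disruptions (stepdowns) dominate while rarer, heavier topology changes still occur.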
Comment by Judah Schvimer [ 10/May/23 ]

The goal of this suite would really be: "if a crash, deadlock, or data inconsistency could occur, this suite should be able to catch it". Over time, we can then work on shortening the delay between committing a bug and this suite catching it.

Generated at Thu Feb 08 06:34:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.