BF-20601 shows that the `dropRoles` transaction occasionally fails because it cannot acquire an IX lock on the system.roles collection. This appears to happen only in concurrency workloads with lock-free reads disabled, in which many clients run user management commands (UMCs) simultaneously. With lock-free reads disabled, an S lock is taken on the system.roles collection every time a user is loaded into memory from disk and every time the `usersInfo` command runs. Because S and IX locks conflict, no write to the system.roles collection can proceed while any of these S locks is held; this is what preserves consistency. The `dropRoles` failures were therefore most likely caused by concurrently running operations repeatedly holding S locks on the system.roles collection, so that `dropRoles` could not obtain its IX lock before timing out.
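To make the conflict concrete, here is a minimal sketch of a collection-level lock with an acquisition timeout. It assumes the textbook compatibility rules for IS/IX/S/X modes (S conflicts with IX) and a deliberately simplified grant policy; `CollectionLock` and `compatible` are hypothetical names, and this is not a model of MongoDB's actual lock manager.

```python
import threading
from collections import Counter

# Multi-granularity lock modes: pairs that may be held at the same time
# by different clients (listed in both orders). Anything not listed conflicts.
_COMPATIBLE_PAIRS = {
    ("IS", "IS"), ("IS", "IX"), ("IS", "S"),
    ("IX", "IS"), ("IX", "IX"),
    ("S", "IS"), ("S", "S"),
}


def compatible(requested: str, held: str) -> bool:
    return (requested, held) in _COMPATIBLE_PAIRS


class CollectionLock:
    """Toy collection lock (hypothetical, NOT MongoDB's LockManager).

    A request is granted as soon as its mode is compatible with every mode
    currently held. There is no FIFO queue, so a steady stream of S holders
    can keep an IX request waiting until it times out.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._held = Counter()  # mode -> number of current holders

    def _grantable(self, mode: str) -> bool:
        return all(compatible(mode, other)
                   for other, count in self._held.items() if count > 0)

    def acquire(self, mode: str, timeout_s: float) -> bool:
        """Wait up to timeout_s seconds for `mode`; return False on timeout."""
        with self._cond:
            granted = self._cond.wait_for(lambda: self._grantable(mode),
                                          timeout=timeout_s)
            if granted:
                self._held[mode] += 1
            return granted

    def release(self, mode: str) -> None:
        with self._cond:
            self._held[mode] -= 1
            self._cond.notify_all()  # wake waiters whose mode may now fit


roles_lock = CollectionLock()
reader_ok = roles_lock.acquire("S", timeout_s=0.005)   # e.g. a usersInfo read
writer_ok = roles_lock.acquire("IX", timeout_s=0.005)  # e.g. dropRoles: S blocks it
roles_lock.release("S")
writer_retry_ok = roles_lock.acquire("IX", timeout_s=0.005)  # no S held now
print(reader_ok, writer_ok, writer_retry_ok)  # True False True
```

The single-threaded demo keeps the outcome deterministic: the IX request fails only because an S holder is still present, and succeeds once that holder releases.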
There are a couple of ways to reduce the likelihood of this contention in highly concurrent test workloads:
- The default lock timeout is 5 ms, which is likely sufficient for production workloads but too low for the overloaded servers in these tests. We could raise it to a higher default (e.g., 25 ms) when test commands are enabled.
- While the S lock on the system.roles collection is necessary, there appears to be a small opportunity to shorten the critical section during which it is held. The impact may be minor, but we should still keep the lock held for as short a time as possible.
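The proposal to raise the lock timeout when test commands are enabled can be illustrated with a toy, discrete-time model. Everything here is hypothetical: `lock_timeout_ms`, `first_free_ms`, and `writer_succeeds` are illustrative names, S-lock holds are made-up half-open `[start, end)` intervals in milliseconds, and lock queueing is ignored. The model only shows why overlapping readers can defeat a 5 ms timeout while a 25 ms timeout rides out the same burst.

```python
DEFAULT_LOCK_TIMEOUT_MS = 5  # default cited in the analysis
TEST_LOCK_TIMEOUT_MS = 25    # hypothetical higher default for test workloads


def lock_timeout_ms(test_commands_enabled: bool) -> int:
    """Hypothetical helper: pick the lock timeout based on testing mode."""
    return TEST_LOCK_TIMEOUT_MS if test_commands_enabled else DEFAULT_LOCK_TIMEOUT_MS


def first_free_ms(s_holds, arrival_ms):
    """Earliest time >= arrival_ms at which no S lock is held.

    s_holds is a list of half-open [start, end) intervals in milliseconds.
    """
    t = arrival_ms
    for start, end in sorted(s_holds):
        if start > t:
            break  # a gap opens before this hold begins
        t = max(t, end)  # t is covered: skip to the end of this hold
    return t


def writer_succeeds(s_holds, arrival_ms, timeout_ms):
    """An IX writer succeeds if the S holds clear before its timeout expires."""
    return first_free_ms(s_holds, arrival_ms) <= arrival_ms + timeout_ms


# Overlapping reader S holds keep the collection locked from 0 ms to 14 ms.
holds = [(0, 4), (3, 9), (8, 14), (20, 22)]
with_default = writer_succeeds(holds, arrival_ms=1, timeout_ms=lock_timeout_ms(False))
with_test = writer_succeeds(holds, arrival_ms=1, timeout_ms=lock_timeout_ms(True))
print(with_default, with_test)  # False True
```

A writer arriving at 1 ms first sees the collection free at 14 ms, which is past a 5 ms deadline (6 ms) but within a 25 ms one (26 ms).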
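The idea of shrinking the critical section can be sketched by comparing two orderings of the same work. All names here (`s_lock`, `read_role_docs`, `resolve_privileges`, and the document shape) are hypothetical stand-ins, not server code, and the sketch assumes every document the post-processing needs is read while the lock is held, so only in-memory work moves outside it. The `log` list records the order of events.

```python
from contextlib import contextmanager


@contextmanager
def s_lock(log):
    """Hypothetical S lock on system.roles; records acquire/release in `log`."""
    log.append("acquire S")
    try:
        yield
    finally:
        log.append("release S")


def read_role_docs(store, names, log):
    """Read that must happen under the lock; shallow-copies the docs out."""
    log.append("read")
    return [dict(store[name]) for name in names]


def resolve_privileges(docs, log):
    """In-memory post-processing that touches only the copied docs."""
    log.append("resolve")
    return sorted({priv for doc in docs for priv in doc["privileges"]})


def load_user_wide(store, names, log):
    with s_lock(log):
        docs = read_role_docs(store, names, log)
        return resolve_privileges(docs, log)  # lock still held here


def load_user_narrow(store, names, log):
    with s_lock(log):
        docs = read_role_docs(store, names, log)  # consistent snapshot
    return resolve_privileges(docs, log)  # lock already released


store = {
    "reader": {"privileges": ["find"]},
    "writer": {"privileges": ["find", "insert"]},
}
wide_log, narrow_log = [], []
wide = load_user_wide(store, ["reader", "writer"], wide_log)
narrow = load_user_narrow(store, ["reader", "writer"], narrow_log)
print(wide_log)    # ['acquire S', 'read', 'resolve', 'release S']
print(narrow_log)  # ['acquire S', 'read', 'release S', 'resolve']
```

Both variants return the same result; the narrow one simply releases the lock before the CPU-only step. Whether any real work can move outside the lock depends on what the user-loading path does after reading: any step that issues further reads of system.roles must stay inside it.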