Status: Needs Scheduling
Priority: Major - P3
Affects Version/s: None
Fix Version/s: None
For high volumes of short-lived reads and writes, SERVER-55030 showed measurable overhead from the custom "spin lock" used during snapshot creation: https://github.com/wiredtiger/wiredtiger/blob/322951cb18905cdea2ae3004906c8e8e4e27462a/src/txn/txn.c#L262-L285
It contends with the transaction ID allocation here: https://github.com/wiredtiger/wiredtiger/blob/ca27d1c1f1c616bf016d0e3854a59b91a5dec908/src/include/txn_inline.h#L1224-L1229
The degradation amounts to roughly a 4% throughput loss on the 50read50update YCSB workload with secondary reads, using 32 threads on a 16-CPU cluster, relative to serializing all snapshot creations with an explicit mutex.
My understanding is that if an allocating thread gets scheduled out, it can take a long time to resume execution, because every thread creating a snapshot keeps spinning on that loop and consuming all available CPUs.
Instead of relying on WT_PAUSE, add an explicit backoff strategy that schedules out the blocked threads so that the allocating threads can make progress. SERVER-55030 showed that a simple version of this strategy removes the regression for the affected workload.
Another alternative would be to have a single thread create the snapshot and share it with all concurrent snapshot creations.