I don't believe this is a new issue or that it is related to changes from the RSM. The 20-30 s latency is measured internally on a single config server and doesn't include any network communication. There's evidence in the BF that flow control could be the cause, which I was hoping to either confirm or rule out with this ticket. If that is the case, SERVER-45880, an existing flow control ticket, may explain what is happening.
The latency spikes come from the "setup" portion of the test, where we insert up to 1M new collections into the catalog from 32 threads so that we can then test reads and writes against this large amount of metadata. I checked much older builds from the beginning of March, when the build was green, and they show latency spikes on the order of 10 s rather than the 20-30 s we see now. So the workload was already showing early signs of eventually hitting the 30 s timeout, and this single node has clearly slowed down over time. I expect a single-node replica set test with similar batching and document-size parameters to show a similar performance impact. As it stands, however, this error in the test setup prevents us from measuring performance with large catalogs.