While loading data for py-tpcc, flow control is engaged, the insert rate drops, and a few inserts take 200 to 300 seconds.
This is from Percona. They previously gave us the repro for WT-6444 via py-tpcc. In their report, the first load into database tpcc1 takes ~20 minutes with a new mongod instance. After sleeping a few minutes and then repeating the load into database tpcc3, the second load takes ~500 minutes. They used a single-node replica set, and my repro attempts do the same.
Part of this is a duplicate of SERVER-46114, which was closed as "works as designed". As the updates below show, there is a chance that mongod gets stuck with flow control engaged, with an insert statement that never finishes and mongod unable to shut down. So I don't think "works as designed" is appropriate.
Summarizing what I see below in my repro attempts:
- this problem is new in 4.4.0. I tried but could not reproduce this with 4.2.9.
- many inserts take more than 5 seconds with 4.4.0 (up to 390 seconds, ignoring the hang). No inserts take more than 5 seconds with 4.2.9.
- in one test mongod got stuck. An insert statement was saturating a CPU core but making no progress for more than an hour. It did not stop after killOp(). Shutting down mongod via "killall mongod" did not stop mongod, and eventually I had to use kill -9.
- with flow control enabled and 4.4.0 there are stalls (inserts that take 10 to 390 seconds)
- with flow control disabled and 4.4.0 there are still stalls, but they are not as bad (10 to 60 seconds) as above
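For anyone who wants to count stalls like the ones summarized above from a mongod log, here is a minimal sketch. It assumes the structured JSON log format used by 4.4, in which slow operations are logged with msg "Slow query" and a duration in attr.durationMillis, so it does not apply to the plain-text 4.2.9 logs. The function name slow_inserts and the 5000 ms default are my own choices for illustration, not part of py-tpcc or mongod.

```python
import json


def slow_inserts(log_lines, threshold_ms=5000):
    """Return (namespace, duration_ms) for each logged insert slower than threshold_ms.

    Assumes 4.4-style JSON log lines where slow operations carry
    msg == "Slow query" and attr.durationMillis.
    """
    hits = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("msg") != "Slow query":
            continue
        attr = entry.get("attr", {})
        command = attr.get("command", {})
        duration = attr.get("durationMillis", 0)
        # Keep only insert commands that exceed the threshold.
        if isinstance(command, dict) and "insert" in command and duration > threshold_ms:
            hits.append((attr.get("ns"), duration))
    return hits
```

To use it on a real log, pass an open file object, for example `slow_inserts(open("mongod.log"))`, where the path is a placeholder for wherever the log lives.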
I have FTDC data and mongod error logs for most of the results listed below. There are many, so I prefer to provide them on request.