Core Server / SERVER-50749

Re-loading is slow with py-tpcc

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.4.0
    • Component/s: None
    • Storage Engines
    • ALL
    • v5.1
    • Storage - Ra 2021-09-20, Storage - Ra 2021-10-04

      While loading data for py-tpcc, flow control is engaged, the insert rate drops, and a few inserts take 200 to 300 seconds.

      This is from Percona. They previously gave us the repro for WT-6444 via py-tpcc. In their report, the first load into database tpcc1 takes ~20 minutes with a new mongod instance. After sleeping a few minutes and then repeating the load into database tpcc3, the second load takes ~500 minutes. They used a single-node replica set, and my repro attempts do the same.

      Part of this is a duplicate of SERVER-46114, which was closed as "Works as Designed". If you read all of the updates below, there is a chance that mongod gets stuck with flow control engaged, an insert statement that never finishes, and mongod unable to shut down. So I don't think "Works as Designed" is appropriate.
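
      For anyone checking whether flow control is engaged during a run, here is a minimal sketch that condenses the flowControl section of a serverStatus document. The helper name flow_control_summary and the sample values are hypothetical and not taken from this ticket. The field names (enabled, isLagged, targetRateLimit, timeAcquiringMicros) are the ones serverStatus reports.

```python
from typing import Any, Dict


def flow_control_summary(server_status: Dict[str, Any]) -> Dict[str, Any]:
    """Condense the flowControl section of a serverStatus document."""
    fc = server_status.get("flowControl", {})
    return {
        "enabled": bool(fc.get("enabled", False)),
        "lagged": bool(fc.get("isLagged", False)),
        "target_rate_limit": fc.get("targetRateLimit"),
        # timeAcquiringMicros is cumulative time spent waiting for tickets.
        "seconds_waiting_for_tickets": fc.get("timeAcquiringMicros", 0) / 1e6,
    }


# Synthetic serverStatus fragment, for illustration only (not real data).
sample_status = {
    "flowControl": {
        "enabled": True,
        "isLagged": True,
        "targetRateLimit": 100,
        "timeAcquiringMicros": 2_500_000,
    }
}
summary = flow_control_summary(sample_status)
print(summary)

# With a live mongod and pymongo, the input could come from:
#   status = MongoClient().admin.command("serverStatus")
# and flow control could be switched off at runtime with:
#   client.admin.command("setParameter", 1, enableFlowControl=False)
```

      Sampling this every few seconds while the load runs would show whether the stalls line up with periods where flow control reports itself as lagged.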

      Summarizing what I see below in my repro attempts:

      • this problem is new in 4.4.0. I tried but could not reproduce this with 4.2.9.
      • many inserts take more than 5 seconds with 4.4.0 (up to 390 seconds, ignoring the hang). No inserts take more than 5 seconds with 4.2.9.
      • in one test mongod got stuck. An insert statement was saturating a CPU core but making no progress for more than an hour. It did not stop after killOp(). Shutting down mongod via "killall mongod" did not stop mongod, and eventually I had to use kill -9.
      • with flow control enabled and 4.4.0 there are stalls (inserts that take 10 to 390 seconds)
      • with flow control disabled and 4.4.0 there are still stalls, but they are not as bad (10 to 60 seconds) as above
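
      One way to count inserts above the 5-second threshold used in the list above is to scan the mongod log. This is a minimal sketch, assuming the structured JSON log format of 4.4, where slow operations appear as "Slow query" entries with attr.durationMillis. It does not handle the plain-text logs of 4.2.9. The function name slow_inserts and the sample lines are hypothetical and are not taken from the attached logs.

```python
import json
from typing import Iterable, List, Tuple


def slow_inserts(lines: Iterable[str], threshold_ms: int = 5000) -> List[Tuple[str, int]]:
    """Return (namespace, durationMillis) for insert commands slower than threshold_ms."""
    hits = []
    for line in lines:
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # not a JSON log line
        if not isinstance(entry, dict) or entry.get("msg") != "Slow query":
            continue
        attr = entry.get("attr", {})
        command = attr.get("command", {})
        if not isinstance(command, dict) or "insert" not in command:
            continue  # only insert commands are of interest here
        duration = attr.get("durationMillis", 0)
        if duration > threshold_ms:
            hits.append((attr.get("ns", "?"), duration))
    return hits


# Synthetic log lines for illustration only (namespaces and timings are made up).
sample = [
    '{"c":"COMMAND","msg":"Slow query","attr":{"ns":"tpcc3.ORDER_LINE",'
    '"command":{"insert":"ORDER_LINE"},"durationMillis":212000}}',
    '{"c":"COMMAND","msg":"Slow query","attr":{"ns":"tpcc3.STOCK",'
    '"command":{"insert":"STOCK"},"durationMillis":800}}',
    '{"c":"COMMAND","msg":"Slow query","attr":{"ns":"tpcc3.STOCK",'
    '"command":{"find":"STOCK"},"durationMillis":9000}}',
    "not a json line",
]
result = slow_inserts(sample)
print(result)
```

      Raising threshold_ms to 10000 would isolate the longer stalls described for the flow-control-enabled and flow-control-disabled runs.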

      I have ftdc and mongod error logs for most of the results listed below. I can provide them if requested. There are many, so I prefer to do that on demand.

        1. Screen Shot 2020-09-10 at 9.35.27 AM.png
          83 kB
        2. Screen Shot 2020-09-09 at 6.16.19 PM.png
          157 kB
        3. mongod.log.440.hang.gz
          19 kB
        4. metrics.2020-09-09T20-02-23Z-00000
          1.56 MB
        5. ftdc.tpcc.440.hang.tar
          1.32 MB
        6. flow1.mo440.tar.gz
          8.12 MB
        7. flow1.mo429.tar.gz
          6.16 MB
        8. flow0.mo440.tar.gz
          8.01 MB
        9. flow0.mo429.tar.gz
          4.32 MB
        10. example.png
          320 kB

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            mark.callaghan@mongodb.com Mark Callaghan (Inactive)
            Votes:
            0
            Watchers:
            17
