Core Server / SERVER-58873

Compaction procedure for small collection takes a significant amount of time

    • Type: Question
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Storage Engines

      Hi there, I have some questions about compaction procedure timing.

      Our starting conditions were as follows:

      1. We started compaction first on our smallest collection, let's call it collectionX, with the following stats:

       

      "size" : 9624660994, 
      "count" : 22641357, 
      "avgObjSize" : 425, 
      "storageSize" : 13472161792, 
      "freeStorageSize" : 10151288832
      

       

      Also, we pre-checked the potential amount of disk space we could free:

      db.collectionX.stats().wiredTiger["block-manager"]["file bytes available for reuse"]
      9945591808
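
      (A rough shell sketch, not something from the original run: the same figure can be compared across all collections in a database to pick compaction candidates; it assumes plain WiredTiger collections, no views.)

      // Reusable bytes per collection, largest first.
      db.getCollectionNames()
        .map(function (name) {
          var stats = db.getCollection(name).stats();
          var bm = stats.wiredTiger ? stats.wiredTiger["block-manager"] : null;
          return {
            collection: name,
            storageSize: stats.storageSize,
            reusableBytes: bm ? bm["file bytes available for reuse"] : 0
          };
        })
        .sort(function (a, b) { return b.reusableBytes - a.reusableBytes; })
        .forEach(printjson);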

       

      So, with that in mind, we started compaction on a secondary by running this command:

      db.runCommand({compact: "collectionX"})

      At that point the node's replication status looked like this:

      source: host04:27017 syncedTo: Fri Jul 16 2021 00:23:20 GMT+0300 (MSK)
      45 secs (0.01 hrs) behind the primary
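
      (A hedged aside, not something we ran at the time: the member's state can be watched from a second shell session while compact runs, e.g. with the sketch below.)

      // Print the state of this member; while compact runs on a secondary it may
      // report RECOVERING, matching the log messages shown below.
      rs.status().members
        .filter(function (m) { return m.self; })
        .forEach(function (m) { print(m.name + " is " + m.stateStr); });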

       

      At almost exactly the same time we saw these messages in the mongod log:

      {"t":{"$date":"2021-07-16T00:23:20.875+03:00"},"s":"I", "c":"REPL", "id":21350, "ctx":"conn3866106","msg":"Going into maintenance mode","attr":{"otherMaintenanceModeTasksInProgress":0}} 
      
      {"t":{"$date":"2021-07-16T00:23:20.876+03:00"},"s":"I", "c":"REPL", "id":21358, "ctx":"conn3866106","msg":"Replica set state transition","attr":{"newState":"RECOVERING","oldState":"SECONDARY"}} 
      
      {"t":{"$date":"2021-07-16T00:23:20.911+03:00"},"s":"I", "c":"COMMAND", "id":20284, "ctx":"conn3866106","msg":"Compact begin","attr":{"namespace":"collectionX"}}
      

       

      That's exactly when compaction started, and then:
      ......

      {"t":{"$date":"2021-07-17T14:01:15.957+03:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn3866106","msg":"WiredTiger message","attr":{"message":"[1626519675:957490][104617:0x7fcaf60e0700], file:index-collectionX.wt, WT_SESSION.compact: [WT_VERB_COMPACT_PROGRESS] Compact running for 2253 seconds; reviewed 787078 pages, skipped 638436 pages, wrote 148642 pages"}}
      
      {"t":{"$date":"2021-07-17T14:03:33.200+03:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn3866106","msg":"WiredTiger message","attr":{"message":"[1626519813:200226][104617:0x7fcaf60e0700], WT_SESSION.compact: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 25385578866, snapshot max: 25385578866 snapshot count: 0, oldest timestamp: (1626384195, 1628) , meta checkpoint timestamp: (1626384200, 1628)"}}
      

      ......

      {"t":{"$date":"2021-07-17T14:05:53.371+03:00"},"s":"I", "c":"STORAGE", "id":20286, "ctx":"conn3866106","msg":"Compact end","attr":{"namespace":"collectionX","freedBytes":26708127744}}

       

      It took something like 38 hours to finish for this collection and its indexes, plus an additional 25 hours for the node to catch up with the primary.

      We didn't see any anomalies or spikes in CPU, memory, or disk usage during compaction, only some asserts in our monitoring, and nothing critical compared to average workday metrics.

      As for the result, it's pretty impressive: the final on-disk size of the collection is 3204157440 bytes, about 4 times smaller than it was before compaction.

      But the main question is: is that a normal duration for the compaction procedure on a collection of this size (~13 GB storage size)?

      How can we estimate in advance how long compaction may take?

      Maybe it can be estimated from some objective measure, such as average load metrics or the total size of the collections on disk?

      We ask because we were slightly frustrated that reclaiming such a small amount of disk space required a 38-hour compaction.
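
      (The only rough, unofficial approach we could think of ourselves is to extrapolate from the WT_VERB_COMPACT_PROGRESS counters; a sketch of that idea below, where totalPagesEstimate is a value you would have to guess, e.g. storage size divided by an assumed average on-disk page size.)

      // Very rough projection, assuming the review rate stays constant for the
      // file currently being compacted. Everything besides the observed
      // counters is a guess.
      function estimateCompactSeconds(pagesReviewedSoFar, secondsSoFar, totalPagesEstimate) {
        var pagesPerSecond = pagesReviewedSoFar / secondsSoFar;
        return totalPagesEstimate / pagesPerSecond;
      }

      // Example with the counters from the progress message above and a purely
      // illustrative guess of 2,000,000 total pages:
      print(estimateCompactSeconds(787078, 2253, 2000000).toFixed(0) + " seconds");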

            Assignee:
            Backlog - Storage Engines Team
            Reporter:
            Basil Markov (haltandcatchfire91@gmail.com)
            Votes:
            0
            Watchers:
            7

              Created:
              Updated: