[SERVER-44752] Big CPU performance regression between 4.0.13 and 4.2.1 (on deletion it seems), about 16 times Created: 20/Nov/19  Updated: 08/Feb/20  Resolved: 08/Feb/20

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 4.2.1
Fix Version/s: 4.2.3

Type: Bug Priority: Major - P3
Reporter: Adrien Jarthon Assignee: Bruce Lucas (Inactive)
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File cpu 4.2.png     PNG File cpu-4.0.png     PNG File mongo-4.2-regression.png     PNG File workload.png    
Issue Links:
Duplicate
duplicates SERVER-44991 Performance regression in indexes wit... Closed
Related
Operating System: ALL
Participants:
Case:

 Description   

Hi, it's me again with another big performance regression.
I recently tried upgrading to 4.2 to try the IO Watchdog and unfortunately found another big performance regression, as I usually do when upgrading. With the same cron that allowed me to detect SERVER-35958, which runs at 11am UTC and does a lot of inserts and deletes in one collection (roughly the same number every day), I noticed this mongo CPU load difference:

4.0.13: Oct 17th 12pm UTC+1
4.2.1: Oct 19th 12pm UTC+1

Mind the scale here: the average CPU usage went from ~12% to ~15% (not a big deal, as I also noticed a decrease in disk IO writes, so I don't consider this a regression). But the spike in the middle, during this cron, went from +10% to +40% and its duration went from 30 minutes to ~2h, which means about 16 times more CPU time generated by this script (+10% x 30min = 3min CPU time vs +40% x ~2h = 48min CPU time)!

Looking at the log of the mongo requests this cron makes to find the reason for the time increase, I noticed there were no more calls than usual, but the deletes were taking much longer: about 2s with 4.2.1 instead of 0.1s with 4.0.13, and they also seem to get slower and slower as the run progresses. The script does roughly 300k inserts one after the other, interleaved with 5k deletes, all in the same collection. These are big deletes though, containing up to 1000 IDs each.
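To illustrate the shape of these deletes, here is a minimal, hypothetical sketch (connection details taken from the logs below; `collected_ids` is an assumed placeholder, not the actual application code) of what one batch amounts to through the Ruby driver: a single delete whose filter lists up to 1000 _id values.

require 'mongo'

# Hypothetical sketch only -- not the application code. `collected_ids` stands
# in for the ~1000 document ids the cron accumulates before each flush.
client = Mongo::Client.new(['db2.updn.io:27017'], database: 'updown')
client[:apdexes].delete_many(_id: { '$in' => collected_ids.first(1000) })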

> zcat cron-4.2.log.gz | fgrep delete | wc -l
5522
> zcat cron-4.2.log.gz | fgrep insert | wc -l
282192
> zcat cron-4.0.log.gz | fgrep delete | wc -l
5530
> zcat cron-4.0.log.gz | fgrep insert | wc -l
282860

> zcat cron-4.0.log.gz | fgrep delete | fgrep SUCCEEDED | head
D, [2019-11-17T11:05:09.753261 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.073s
D, [2019-11-17T11:05:12.168360 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.074s
D, [2019-11-17T11:05:14.504066 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.083s
D, [2019-11-17T11:05:16.813694 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.065s
D, [2019-11-17T11:05:19.014325 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.072s
D, [2019-11-17T11:05:21.253102 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.077s
D, [2019-11-17T11:05:23.437944 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.074s
D, [2019-11-17T11:05:25.730171 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.088s
D, [2019-11-17T11:05:27.930548 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.065s
D, [2019-11-17T11:05:30.302693 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.069s
 
> zcat cron-4.0.log.gz | fgrep delete | fgrep SUCCEEDED | tail
D, [2019-11-17T11:30:31.186904 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.089s
D, [2019-11-17T11:30:31.592353 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.113s
D, [2019-11-17T11:30:31.999389 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.090s
D, [2019-11-17T11:30:32.364738 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.104s
D, [2019-11-17T11:30:32.703364 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.103s
D, [2019-11-17T11:30:33.106916 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.162s
D, [2019-11-17T11:30:33.475759 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.096s
D, [2019-11-17T11:30:33.783008 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.102s
D, [2019-11-17T11:30:34.084768 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.094s
D, [2019-11-17T11:35:49.991567 #2934] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.145s

> zcat cron-4.2.log.gz | fgrep delete | fgrep SUCCEEDED | head
D, [2019-11-19T11:05:10.667866 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.121s
D, [2019-11-19T11:05:13.187504 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.087s
D, [2019-11-19T11:05:16.095652 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.121s
D, [2019-11-19T11:05:18.996173 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.153s
D, [2019-11-19T11:05:21.984816 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.109s
D, [2019-11-19T11:05:25.213349 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.121s
D, [2019-11-19T11:05:28.302843 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.122s
D, [2019-11-19T11:05:31.374013 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.075s
D, [2019-11-19T11:05:33.750134 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.094s
D, [2019-11-19T11:05:36.268803 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.073s
 
> zcat cron-4.2.log.gz | fgrep delete | fgrep SUCCEEDED | tail
D, [2019-11-19T12:49:15.333525 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 1.529s
D, [2019-11-19T12:49:17.627543 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 1.956s
D, [2019-11-19T12:49:18.903869 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.278s
D, [2019-11-19T12:49:21.606581 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 2.404s
D, [2019-11-19T12:49:23.395561 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 1.506s
D, [2019-11-19T12:49:24.454508 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.794s
D, [2019-11-19T12:49:27.017986 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 2.242s
D, [2019-11-19T12:49:28.572405 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 1.246s
D, [2019-11-19T12:49:31.762715 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 2.878s
D, [2019-11-19T12:56:04.510676 #17303] DEBUG -- : MONGODB | db2.updn.io:27017 | updown.delete | SUCCEEDED | 0.077s

I'll upload the full log for the Oct 17th and Oct 19th samples and the diagnostic data as soon as you provide an upload portal.



 Comments   
Comment by Bruce Lucas (Inactive) [ 08/Feb/20 ]

Thanks, good idea. And thanks again for reporting this, and for your patience while we diagnosed and fixed it.

Comment by Adrien Jarthon [ 08/Feb/20 ]

You may want to enter "4.2.3" as Fix Version for people who find this ticket later?

Comment by Bruce Lucas (Inactive) [ 08/Feb/20 ]

Thanks for the update Adrien. I will go ahead and close this ticket then.

Comment by Adrien Jarthon [ 08/Feb/20 ]

Hi again, I put 4.2.3 in production 5 days ago and I'm pleased to say that the CPU usage during my daily load is down again, back to levels similar to 4.0.13, if not slightly lower. Thanks!

Comment by Bruce Lucas (Inactive) [ 24/Dec/19 ]

Thanks Adrien. Will leave this ticket open in "waiting for user" state until then.

Comment by Adrien Jarthon [ 23/Dec/19 ]

Great. I will not have much time to do testing in the coming weeks, so I'm going to wait for SERVER-44991 to be merged, and if it doesn't improve the regression I'll investigate more.

Comment by Bruce Lucas (Inactive) [ 23/Dec/19 ]

Thanks for the testing bigbourin@gmail.com. It's good to hear that 4.2.2 was a substantial improvement.

Regarding the remaining regression, I think there's a good chance this is SERVER-44991. Here are two options:

  • Wait and re-test after SERVER-44991 is fixed. We currently have a fix, which I'm hoping to see in 4.2.3.
  • Collect perf data on 4.2.2 as I outlined above. The perf signature for SERVER-44991 is distinctive so we should be able to tell from that whether the issue is SERVER-44991, or if it is something else we can investigate.

If you're willing to continue pursuing this on your end we can leave this ticket open; please let me know which course you would prefer.

Comment by Adrien Jarthon [ 21/Dec/19 ]

Ok, so I tested 4.2.2 again as primary. The good news is that it's much better; the bad news is that it's still more than twice as CPU-consuming as 4.0.13. Here is a nice chart I made comparing 4.0.13 and 4.2.2 under the same daily load:

I show the same time period from 2 different days:

  • Dec 16th, in blue, when 4.0.13 was primary
  • Dec 19th, in red, when 4.2.2 was primary

About the CPU increase under normal load: this machine with 4.2.2 carries slightly more load, and I already noted (in the description) a small general CPU regression, which IMO is not a big deal because 4.2 brings lower I/O usage in return. So here we're mostly looking at the big load in the middle.

Here is a recap table of how long the cron takes and how much CPU time mongo consumes:

Version  Time    CPU
4.0.13   30min   ~10% x 30min = 3min CPU time
4.2.1    1h50m   ~40% x 1h50m = 44min CPU time
4.2.2    40min   ~20% x 40min = 8min CPU time

I uploaded the diagnostic data files covering Dec 16th and Dec 19th where you can find this load (it starts at 11:05am UTC or 10:05am UTC depending on the server, because they are not in the same timezone).

All in all this is much more acceptable of course. But making mongo fast is hard, so catching regressions early, while they are still easy to remove, should matter to you guys I guess? Otherwise we end up with things like SERVER-42698, which nobody wants to fix now.

But if you consider this normal and don't want to fix it, I won't be mad if you close the ticket.

Comment by Adrien Jarthon [ 06/Dec/19 ]

Thanks Bruce, I'll give 4.2.2 a try when it's available in the repos then and will let you know how it goes.

Comment by Bruce Lucas (Inactive) [ 06/Dec/19 ]

bigbourin@gmail.com, thanks very much for the further details. Meanwhile we have identified a couple of performance issues in 4.2 that are potential candidates.

  • WT-5218. This may be a candidate because we are seeing a higher rate of pages read into the cache during the daily jobs on 4.2.1. The fix for this is available now in a 4.2.2 release candidate, and should be available in GA 4.2.2 next week. We should rule this out before doing further debugging.
  • SERVER-44991 is a regression in CPU utilization under some circumstances that we recently discovered (and is still under investigation). If upgrading to 4.2.2 doesn't fix the issue a next step would be to collect perf data. We typically do it like this:

    while true; do perf record -a -g -F 99 -o perf.data.$(date -u +%FT%TZ) sleep 60; done
    # then run perf script on the subset of files of interest:
    for fn in ...; do perf script -i $fn >$fn.txt; done
    

    This breaks the perf data collection into more manageable one-minute chunks. It's important that perf script be run on the same machine where the data is collected while mongod is still running in order to symbolize the backtraces collected by perf record. Ideally this should be collected both for 4.0.13 and 4.2.2, but the signal (excess CPU) is large enough here that 4.2.2 alone might be sufficient. After collecting the data upload the *.txt files and diagnostic.data and we'll take a look and see if it is SERVER-44991 or something else.

Comment by Adrien Jarthon [ 22/Nov/19 ]

Sure I can!

Yes, the script runs from 11:05 to 11:35 (under 4.0) and from 11:05 to 12:56 (under 4.2). What it does basically is aggregate time series data: for example, it takes 24 hourly records and merges them into 1 daily record, thus producing 24 deletes and 1 insert.

Here is the code of the Mongoid model and the compact method, which is called every day at 11am to group hourly entries from 2 days back into daily records:

class Apdex
  include Mongoid::Document
 
  field :hostname, type: String
  field :time, type: Time
  field :total_duration, type: Integer, default: 0
  field :timings, type: Hash, default: {'redirect' => 0, 'namelookup' => 0, 'connect' => 0, 'appconnect' => 0, 'pretransfer' => 0, 'starttransfer' => 0, 'total' => 0}
 
  # Response time distribution (for live Apdex computation)
  field :samples, type: Integer, default: 0
  field :under, type: Hash, default: {'125' => 0, '250' => 0, '500' => 0, '1000' => 0, '2000' => 0, '4000' => 0, '8000' => 0}
  field :failures, type: Integer, default: 0
  field :group_by, type: String # nil (hour), 'day' or 'month'
 
  belongs_to :check, optional: true
 
  def self.report hostname, check, status, time: Time.now, timings: {}
    group_by = (check.period < 20.minutes ? nil : 'day')
    time = (group_by == 'day' ? time.utc.midday : time.beginning_of_hour)
    record = Apdex.find_or_initialize_by(check: check, hostname: hostname, time: time, group_by: group_by)
    record.samples += 1
    # Best response time estimation: total - redirect - latest of (SSL handshake,  TCP connect or namelookup)
    response_time = (timings['total'] || 0) - (timings['redirect'] || 0) - [(timings['namelookup'] || 0), (timings['connect'] || 0), (timings['appconnect'] || 0)].max
    record.total_duration += response_time
    timings.each do |k, v|
      record.timings[k] = (record.timings[k] || 0) + v
    end
    if status == :up
      record.under['125'] = (record.under['125'] || 0) + 1 if response_time < 125
      record.under['250'] = (record.under['250'] || 0) + 1 if response_time < 250
      record.under['500'] = (record.under['500'] || 0) + 1 if response_time < 500
      record.under['1000'] = (record.under['1000'] || 0) + 1 if response_time < 1000
      record.under['2000'] = (record.under['2000'] || 0) + 1 if response_time < 2000
      record.under['4000'] = (record.under['4000'] || 0) + 1 if response_time < 4000
      record.under['8000'] = (record.under['8000'] || 0) + 1 if response_time < 8000
    else
      record.failures += 1
    end
    record.save
  end
 
  # Merges old apdex entries together (1/hour => 1/day => 1/month)
  def self.compact! upto: 48.hours.ago, group_by: 'day'
    groups = {}
    from = (group_by == 'month' ? 'day' : nil)
    to_destroy = []
    Apdex.where(group_by: from).lt(time: upto).each do |apdex|
      group_time = if group_by == 'day'
        apdex.time.utc.midday
      elsif group_by == 'month'
        apdex.time.utc.beginning_of_month + 15.days
      end
      groups[[apdex.check_id, apdex.hostname, group_time, group_by]] ||= Apdex.find_or_initialize_by(check_id: apdex.check_id, hostname: apdex.hostname, time: group_time, group_by: group_by)
      group = groups[[apdex.check_id, apdex.hostname, group_time, group_by]]
      group.samples += apdex.samples
      group.total_duration += apdex.total_duration
      apdex.under.each { |k, v| group.under[k] += v if group.under[k] }
      apdex.timings.each { |k, v| group.timings[k] += v if group.timings[k] }
      group.failures += apdex.failures
      to_destroy << apdex.id
      if to_destroy.size > 1000
        Apdex.in(id: to_destroy).delete_all
        to_destroy.clear
      end
    end
    groups.each_value(&:save)
    Apdex.in(id: to_destroy).delete_all
  end
end
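
For reference, a minimal sketch of how the daily cron presumably invokes this (the scheduler call itself isn't shown in the ticket, so the arguments are assumptions matching the defaults above):

# Assumed daily invocation at 11am UTC: roll hourly rows older than 48 hours
# into daily rows, matching the defaults of compact! above.
Apdex.compact!(upto: 48.hours.ago, group_by: 'day')
# The hour -> day -> month hierarchy suggests a similar periodic call with
# group_by: 'month' (threshold not shown in the ticket).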

There are some finds in between, as you said, to check whether the daily record exists before creating it (in which case it is updated), and also some getMores to iterate over the apdex records being aggregated. I batched the deletes in groups of 1000 because that used to be much faster than deleting one by one or in smaller batches, at least in 3.4, 3.6 and 4.0.

There is indeed a constant (and very stable) stream of queries on the server. It runs my monitoring service https://updown.io, and each monitoring request updates the apdex collection (using the `report` method visible above). That method does a find query and then either an update or an insert (most often an update), and it is called about 160 times/second. There are also some updates on other collections at the same time.

As you can see, there are also some hourly crons generating a few requests (not a lot), and in particular, at the 31st minute, an hourly mongodump (the one fetching a lot of documents in this chart, I suppose).

So the history of these documents is:

  • The `report` method creates them as necessary (mostly at the beginning of the hour)
  • Once it's created, during the same hour, the `report` method may find and update the same record several times
  • After 1h, a new document is inserted and the previous one is not updated any more
  • After 48h, the compact method cron merges multiple of these records into a new one
  • The deletes look mostly sequential, as I'm deleting all records created during the same 24h every time, though the order of deletion within that window isn't necessarily sequential; I'm not sorting, as you can see, so I'm not sure (a sketch of a sorted variant follows below).
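
Purely as an illustration of what an explicitly sequential deletion order would look like (this is not the current code, which iterates without a sort), a hedged Mongoid sketch:

# Hypothetical variant of the compact! deletion loop: walk the hourly documents
# in time order and delete them in batches of 1000, making the deletion order
# deterministic (time-ordered) rather than whatever order the query returns.
Apdex.where(group_by: nil).lt(time: 48.hours.ago).asc(:time).each_slice(1000) do |batch|
  Apdex.in(id: batch.map(&:id)).delete_all
end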

Here are the collection stats. TL;DR: there are a lot of small, very similar records (21M).
I left the entire output in case there are other interesting things for you:

rs0:PRIMARY> db.apdexes.stats()
{
  "ns" : "updown.apdexes",
  "size" : 8214667235,
  "count" : 21985737,
  "avgObjSize" : 373,
  "storageSize" : 9320517632,
  "capped" : false,
  "wiredTiger" : {
    "metadata" : {
      "formatVersion" : 1
    },
    "creationString" : "access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,read_timestamp=none),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_count=8,bloom_oldest=false,chunk_count_limit=0,chunk_max=5GB,chunk_size=10MB,merge_custom=(prefix=,start_generation=0,suffix=),merge_max=15,merge_min=0),memory_page_image_max=0,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=false,prefix_compression_min=4,source=,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,type=file,value_format=u",
    "type" : "file",
    "uri" : "statistics:table:collection-73-6984886530218540453",
    "LSM" : {
      "bloom filter false positives" : 0,
      "bloom filter hits" : 0,
      "bloom filter misses" : 0,
      "bloom filter pages evicted from cache" : 0,
      "bloom filter pages read into cache" : 0,
      "bloom filters in the LSM tree" : 0,
      "chunks in the LSM tree" : 0,
      "highest merge generation in the LSM tree" : 0,
      "queries that could have benefited from a Bloom filter that did not exist" : 0,
      "sleep for LSM checkpoint throttle" : 0,
      "sleep for LSM merge throttle" : 0,
      "total size of bloom filters" : 0
    },
    "block-manager" : {
      "allocations requiring file extension" : 0,
      "blocks allocated" : 3202501,
      "blocks freed" : 3135623,
      "checkpoint size" : 8288432128,
      "file allocation unit size" : 4096,
      "file bytes available for reuse" : 1031155712,
      "file magic number" : 120897,
      "file major version number" : 1,
      "file size in bytes" : 9320517632,
      "minor version number" : 0
    },
    "btree" : {
      "btree checkpoint generation" : 2050,
      "column-store fixed-size leaf pages" : 0,
      "column-store internal pages" : 0,
      "column-store variable-size RLE encoded values" : 0,
      "column-store variable-size deleted values" : 0,
      "column-store variable-size leaf pages" : 0,
      "fixed-record size" : 0,
      "maximum internal page key size" : 368,
      "maximum internal page size" : 4096,
      "maximum leaf page key size" : 2867,
      "maximum leaf page size" : 32768,
      "maximum leaf page value size" : 67108864,
      "maximum tree depth" : 5,
      "number of key/value pairs" : 0,
      "overflow pages" : 0,
      "pages rewritten by compaction" : 0,
      "row-store internal pages" : 0,
      "row-store leaf pages" : 0
    },
    "cache" : {
      "bytes currently in the cache" : 5145100310,
      "bytes dirty in the cache cumulative" : 104442399934,
      "bytes read into cache" : 657305767722,
      "bytes written from cache" : 85962297848,
      "checkpoint blocked page eviction" : 1641,
      "data source pages selected for eviction unable to be evicted" : 3840,
      "eviction walk passes of a file" : 1349883,
      "eviction walk target pages histogram - 0-9" : 108735,
      "eviction walk target pages histogram - 10-31" : 256906,
      "eviction walk target pages histogram - 128 and higher" : 0,
      "eviction walk target pages histogram - 32-63" : 317078,
      "eviction walk target pages histogram - 64-128" : 667164,
      "eviction walks abandoned" : 115286,
      "eviction walks gave up because they restarted their walk twice" : 0,
      "eviction walks gave up because they saw too many pages and found no candidates" : 21701,
      "eviction walks gave up because they saw too many pages and found too few candidates" : 16629,
      "eviction walks reached end of tree" : 99764,
      "eviction walks started from root of tree" : 153616,
      "eviction walks started from saved location in tree" : 1196267,
      "hazard pointer blocked page eviction" : 1429,
      "in-memory page passed criteria to be split" : 9,
      "in-memory page splits" : 5,
      "internal pages evicted" : 336137,
      "internal pages split during eviction" : 6,
      "leaf pages split during eviction" : 13154,
      "modified pages evicted" : 2265651,
      "overflow pages read into cache" : 0,
      "page split during eviction deepened the tree" : 0,
      "page written requiring cache overflow records" : 0,
      "pages read into cache" : 60425055,
      "pages read into cache after truncate" : 0,
      "pages read into cache after truncate in prepare state" : 0,
      "pages read into cache requiring cache overflow entries" : 0,
      "pages requested from the cache" : 3373844802,
      "pages seen by eviction walk" : 150747000,
      "pages written from cache" : 3198413,
      "pages written requiring in-memory restoration" : 22205,
      "tracked dirty bytes in the cache" : 12689958,
      "unmodified pages evicted" : 58550530
    },
    "cache_walk" : {
      "Average difference between current eviction generation when the page was last considered" : 0,
      "Average on-disk page image size seen" : 0,
      "Average time in cache for pages that have been visited by the eviction server" : 0,
      "Average time in cache for pages that have not been visited by the eviction server" : 0,
      "Clean pages currently in cache" : 0,
      "Current eviction generation" : 0,
      "Dirty pages currently in cache" : 0,
      "Entries in the root page" : 0,
      "Internal pages currently in cache" : 0,
      "Leaf pages currently in cache" : 0,
      "Maximum difference between current eviction generation when the page was last considered" : 0,
      "Maximum page size seen" : 0,
      "Minimum on-disk page image size seen" : 0,
      "Number of pages never visited by eviction server" : 0,
      "On-disk page image sizes smaller than a single allocation unit" : 0,
      "Pages created in memory and never written" : 0,
      "Pages currently queued for eviction" : 0,
      "Pages that could not be queued for eviction" : 0,
      "Refs skipped during cache traversal" : 0,
      "Size of the root page" : 0,
      "Total number of pages currently in cache" : 0
    },
    "compression" : {
      "compressed pages read" : 24049280,
      "compressed pages written" : 3069840,
      "page written failed to compress" : 0,
      "page written was too small to compress" : 128413
    },
    "cursor" : {
      "bulk-loaded cursor-insert calls" : 0,
      "close calls that result in cache" : 0,
      "create calls" : 18412,
      "cursor operation restarted" : 5217,
      "cursor-insert key and value bytes inserted" : 1765986943,
      "cursor-remove key bytes removed" : 27790124,
      "cursor-update value bytes updated" : 0,
      "cursors reused from cache" : 58930733,
      "insert calls" : 4631139,
      "modify calls" : 17700357,
      "next calls" : 0,
      "open cursor count" : 0,
      "prev calls" : 1,
      "remove calls" : 5558174,
      "reserve calls" : 0,
      "reset calls" : 164123963,
      "search calls" : 1120052537,
      "search near calls" : 0,
      "truncate calls" : 0,
      "update calls" : 0
    },
    "reconciliation" : {
      "dictionary matches" : 0,
      "fast-path pages deleted" : 0,
      "internal page key bytes discarded using suffix compression" : 3139721,
      "internal page multi-block writes" : 8315,
      "internal-page overflow keys" : 0,
      "leaf page key bytes discarded using prefix compression" : 0,
      "leaf page multi-block writes" : 17641,
      "leaf-page overflow keys" : 0,
      "maximum blocks required for a page" : 1,
      "overflow values written" : 0,
      "page checksum matches" : 51967,
      "page reconciliation calls" : 3125666,
      "page reconciliation calls for eviction" : 1128037,
      "pages deleted" : 13460
    },
    "session" : {
      "object compaction" : 0
    },
    "transaction" : {
      "update conflicts" : 0
    }
  },
  "nindexes" : 4,
  "totalIndexSize" : 1658314752,
  "indexSizes" : {
    "check_id_1_time_1" : 352993280,
    "time_1" : 135737344,
    "group_by_1_time_1" : 118075392,
    "_id_" : 1051508736
  },
  "ok" : 1,
  "operationTime" : Timestamp(1574464255, 106),
  "$clusterTime" : {
    "clusterTime" : Timestamp(1574464255, 106),
    "signature" : {
      "hash" : BinData(0,"fu+woxFnFDuC7o2QguDHG+JtQiQ="),
      "keyId" : NumberLong("6706776180722237441")
    }
  }
}

Let me know if you need anything else!

Comment by Bruce Lucas (Inactive) [ 21/Nov/19 ]

Thanks for the report Adrien.

Can you help me with some background information and context regarding the workload? Here's what I see in the ftdc data:

The batch job in question seems to run from A to B (about 11:05 to about 11:35 under 4.0, longer under 4.2 because the deletes are slower). It seems to consist of a lot of deletes followed by a lot of inserts, and there may also be some finds during both the deletes and the inserts (based on documents returned). This seems consistent with what I see in your cron logs.

At the same time there appears to be a constant stream of background activity, mostly updates, punctuated by some other jobs that run at hourly intervals.

Can you confirm or correct the above?

What is the history of the documents being deleted - were they inserted by the previous instance of this cron job, by the periodic jobs inserting documents every hour, or have they been in the collection for a long time? Have the documents that were deleted been updated since being inserted? Is the daily cron job in question operating on its own collection, independent of the other activity on the system? Are the deletes distributed randomly in the collection, or are they sequential?

Also, can you attach collection stats for the collection in question so I can get an idea of the number of indexes, the size of the collection, and the size of the documents? Are the documents generally all about the same size, or is there a wide range of variation?

What I have seen so far is that during the deletes there is much higher cache pressure and eviction activity, which I believe explains the higher CPU and lower delete rate. I'm also seeing about 2x the amount of dirty data generated in the cache per delete under 4.2, which could possibly explain the higher cache pressure. I'm hoping that a clearer picture of the workload and the history of the documents will help me understand the reason for this. It may also help us reproduce the problem in isolation.

Comment by Adrien Jarthon [ 21/Nov/19 ]

Thanks, I uploaded the diagnostic data and the two request logs related to this script running against 4.0.13 and 4.2.1.

Comment by Danny Hatcher (Inactive) [ 21/Nov/19 ]

As requested, here is an uploader link: https://10gen-httpsupload.s3.amazonaws.com/upload_forms/550e137f-ce8e-4cf8-8123-47d82d1d3b7e.html
