- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 4.2.1
- Component/s: None
- Backwards Compatibility: Fully Compatible
- Operating System: ALL
I have a 3-node replica set running version 4.2.1 on Ubuntu 18.04. Previously we had been on 4.0.4. I'm noticing that the cluster is significantly slower than it was on 4.0.4, in terms of bulk insert speed on the primary, the impact on the speed of other operations while those insert jobs are running, and the replication lag we see on the secondaries.
I think at least on the primary this has something to do with elevated cache pressure, but I'm kind of just guessing. I'm happy to provide diagnostic files and logs privately.
We're now running with featureCompatibilityVersion 4.2, flow control off, and enableMajorityReadConcern false, and have had some improvement, but we're still in trouble with cache pressure and insert speed, and the replication lag is quite high when we're running jobs that insert millions of records.
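For reference, here's roughly how those settings are applied on our side (a minimal sketch assuming pymongo; the host and replica set names are made up):
{code:python}
from pymongo import MongoClient

# Connect to the primary (host and replica set name are illustrative).
client = MongoClient("mongodb://primary.example.net:27017/?replicaSet=rs0")

# featureCompatibilityVersion was raised to 4.2 after the upgrade.
client.admin.command({"setFeatureCompatibilityVersion": "4.2"})

# Flow control turned off at runtime; it can also be set at startup via
# --setParameter enableFlowControl=false.
client.admin.command({"setParameter": 1, "enableFlowControl": False})

# enableMajorityReadConcern is a startup-only option, so it is set in the
# mongod config file rather than here:
#
#   replication:
#     replSetName: rs0
#     enableMajorityReadConcern: false
{code}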
Some performance numbers, each from a job inserting roughly 15-25 million rows:
- With all nodes on 4.0.4: a job averaged 875 rows/sec. The local secondary mostly kept up (maybe 10-60 seconds lag). The geographically remote secondary was never more than 300 seconds behind.
- With the secondaries on 4.2.1 and the primary on 4.0.4: a job averaged 484 rows/sec. Both secondaries were about 5000-10000 seconds behind by the end of the job.
- On a similar but standalone node running 4.2.1, the job averaged 4012 rows/sec.
- With all nodes on 4.2.1 and default settings for flowControl and enableMajorityReadConcern: average 265 rows/sec and all other operations were significantly impacted.
- After disabling flowControl: average 712 rows/sec, but both replicas were about 5000 seconds lagged and all other operations were significantly delayed; for example, I saw a document update take 62673 ms with {{timeWaitingMicros: { cache: 62673547 }}}.
- After disabling flowControl and setting enableMajorityReadConcern to false: no specific insert number yet, but I'm seeing "tracked dirty bytes in the cache" at about 1.5 GB while an insert job is running (sampled as sketched after this list). One secondary is lagged by 4 sec and the other by 1183 sec.
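This is roughly how I'm sampling that cache statistic while a job runs (a sketch assuming pymongo; the host name is made up, and the field names are the standard WiredTiger cache statistics reported by serverStatus):
{code:python}
from pymongo import MongoClient

client = MongoClient("mongodb://primary.example.net:27017/")  # illustrative host

status = client.admin.command("serverStatus")
cache = status["wiredTiger"]["cache"]

# "tracked dirty bytes in the cache" is the statistic quoted above; the other
# two give the overall cache occupancy and the configured cache size.
dirty = cache["tracked dirty bytes in the cache"]
in_cache = cache["bytes currently in the cache"]
cache_max = cache["maximum bytes configured"]

print(f"dirty: {dirty / 1e9:.2f} GB, "
      f"in cache: {in_cache / 1e9:.2f} GB, "
      f"max: {cache_max / 1e9:.2f} GB")
{code}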
A couple of notes on our setup:
- Our bulk inserts are of ~2.5 KB documents into a collection with a large number of indexes (20, one of which is a multikey index on an array-of-strings field);
- We do have a couple of change stream listeners (not on the collection receiving the bulk writes), so I assume some level of majority read concern support is still in use (a rough sketch of the workload shape follows below).
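To make the workload shape concrete, here is a rough sketch of what an insert job and a listener look like (collection, database, and field names are made up; the real jobs batch through 15-25 million rows):
{code:python}
from pymongo import MongoClient, InsertOne

client = MongoClient("mongodb://primary.example.net:27017/?replicaSet=rs0")  # illustrative
db = client["appdb"]  # hypothetical database name

# Bulk insert job: ~2.5 KB documents into a collection that carries 20
# indexes, one of them multikey (here, an array-of-strings "tags" field).
def insert_batch(docs):
    db["records"].bulk_write([InsertOne(d) for d in docs], ordered=False)

# Change stream listener on a different collection (not the one receiving
# the bulk writes).
def watch_events():
    with db["events"].watch() as stream:
        for change in stream:
            print(change["operationType"])
{code}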
- is related to SERVER-44881 Giant slow oplog entries are being logged in full (Backlog)
- related to SERVER-44991 Performance regression in indexes with keys with common prefixes (Closed)