Documentation / DOCS-15476

[C2C] [Server] Investigate changes in REP-959: worse latencies and more CPU usage on the destination replica set after the sync is complete

      Original Downstream Change Summary

We should document that, when setting up the destination cluster, users should set minSnapshotHistoryWindowSeconds=0 to avoid worse latency on the destination after the sync completes.
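If this recommendation is adopted, the parameter can be set at startup via the configuration file. A minimal, illustrative mongod.conf fragment for the destination replica set members (all other settings omitted) might look like:

```yaml
# Illustrative fragment only. Setting the snapshot history window to 0
# minimizes how long WiredTiger retains snapshot history during the sync.
setParameter:
  minSnapshotHistoryWindowSeconds: 0
```

The same parameter can also be changed at runtime with db.adminCommand({ setParameter: 1, minSnapshotHistoryWindowSeconds: 0 }) and restored to its default (300 seconds) once the migration is complete.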

      Description of Linked Ticket

      Problem Statement/Rationale

      What is going wrong? What action would you like the Engineering team to take?

After the sync is complete, the same baseline workload executed on the destination replica set shows a 26% latency increase. Average user CPU also increases by 26% and CPU IOWait by 17%, compared with the same baseline workload executed on the source replica set.

      Steps to Reproduce

      How could an engineer replicate the issue you’re reporting?

This is a 3-node replica set mongosync test with a 100 GB dataset.

      Expected Results

      What do you expect to happen?

Similar latencies and CPU/storage usage on the source and destination replica sets.

      Actual Results

      What do you observe is happening?

After the sync is complete, the same baseline workload on the destination replica set shows significantly worse latency and resource usage.

      Additional Notes

      Any additional information that may be useful to include.

Even if the query metrics are exactly the same, the underlying data distribution of the WiredTiger tables appears to be different. For the same number of documents accessed, the numbers of blocks, cache pages, and bytes accessed differ:

      • WiredTiger reads 71% more blocks and writes 16% more blocks to disk or filesystem cache.
      • WiredTiger reads 63% more pages and writes 39% more pages from disk or filesystem cache to WiredTiger cache.
      • WiredTiger reads 20% more bytes and writes 5.5% more bytes.
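The percentage deltas above are ratios of cumulative WiredTiger counters (for example, the block-manager "blocks read" and "blocks written" counters exposed by db.serverStatus()) sampled on each cluster over the same workload. A minimal sketch of that comparison, using made-up counter values purely for illustration:

```python
def pct_increase(source: float, destination: float) -> float:
    """Percent increase of a destination counter relative to the source counter."""
    return (destination - source) / source * 100.0

# Hypothetical cumulative counters sampled over the same baseline workload
# (illustrative numbers only, not taken from the actual test).
source_counters = {"blocks read": 1_000_000, "blocks written": 500_000}
dest_counters = {"blocks read": 1_710_000, "blocks written": 580_000}

for name in source_counters:
    delta = pct_increase(source_counters[name], dest_counters[name])
    print(f"{name}: {delta:+.0f}%")
```

With these illustrative inputs the script reports +71% for blocks read and +16% for blocks written, matching the shape of the deltas listed above.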

This could be caused by the difference between how the data was originally created (with the Genny loader) and how it was synced, ending up with different physical layouts on disk. This significantly affects the time needed to perform checkpoints, as well as the latencies.

      Let me know if you need me to attach any extra information.

            Assignee:
            Alison Huh (alison.huh@mongodb.com)
            Reporter:
            Backlog - Core Eng Program Management Team (backlog-server-pm)
            Votes:
            0
            Watchers:
            5

              Resolved:
              22 weeks, 5 days ago