[SERVER-42847] Update initialsync perf test Created: 16/Aug/19  Updated: 06/Dec/22  Resolved: 16/Aug/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Allison Easton Assignee: Backlog - Replication Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Assigned Teams:
Replication
Participants:

 Description   

Following Henrik Ingo's comment on this pull request, we need to change the behavior of the initialsync test. Since the workload is now being used for larger documents, Henrik suggested changing the document format, because the current one doesn't accurately simulate compression.



 Comments   
Comment by Allison Easton [ 16/Aug/19 ]

Accidentally made a server ticket instead of perf

Comment by Allison Easton [ 16/Aug/19 ]

Added Henrik's full comment here in case the pull request link goes away when the request is merged

I don't like this new behavior. It makes it harder to triage this test if there are rules like "b is true if option_b is true but also if option_a > 1024". Also, it will be odd that initialsync with doc_size = 1025 is much faster than doc_size = 1023.

I think what you want is to add a second long field, which is not indexed and used as overflow when doc_size > 1024:


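while (Object.bsonsize(long_string) < Math.min(doc_size, 1000)) {
    long_string.str += "a";
}
// Count both fields here; measuring only long_string would never change the condition.
while (Object.bsonsize(long_string) + Object.bsonsize(other_long_string) < doc_size) {
    other_long_string.str += "a";
}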


For testing large documents I also feel we should care more about emulating realistic compression than just a string of "a"'s. You could randomly draw letters from an array where the frequency of each letter is the same as in English text:

function randomLetter() {
    // Emulate English text for realistic compression
    // https://en.wikipedia.org/wiki/Letter_frequency
    // No authoritative source on average word length, but googling based consensus is 5.
    var letters = "aaaaaaaabccddddeeeeeeeeeeeeeeffgghhhhhhiiiiiiikllllmmnnnnnnnoooooooopprrrrrrsssssstttttttttuuuvwwyy        ";
    var n = Random.randInt(letters.length);
    return letters[n];
}
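
To illustrate why the filler text matters, here is a rough sketch that compares how well a run of "a"'s compresses against text drawn from the letter table above. It is not part of the original comment. It assumes Node.js rather than the mongo shell, so Math.random stands in for Random.randInt, and zlib's deflate is only a stand-in for whatever compressor the server's storage engine is configured with. The helper names randomText and compressedSize are made up for this example.

```javascript
// Sketch only: compare the compressibility of two kinds of filler text.
// Assumptions: Node.js with its built-in zlib module; deflate is a stand-in
// for the storage engine's compressor, not the compressor itself.
const zlib = require("zlib");

// Letter table copied from the snippet above (trailing spaces act as word breaks).
const LETTERS =
    "aaaaaaaabccddddeeeeeeeeeeeeeeffgghhhhhhiiiiiiikllllmmnnnnnnnoooooooopprrrrrrsssssstttttttttuuuvwwyy        ";

// Draw one character; Math.random replaces the shell's Random.randInt here.
function randomLetter() {
    return LETTERS[Math.floor(Math.random() * LETTERS.length)];
}

// Hypothetical helper: build a string of the requested length.
function randomText(length) {
    let text = "";
    for (let i = 0; i < length; i++) {
        text += randomLetter();
    }
    return text;
}

// Hypothetical helper: size in bytes after deflate compression.
function compressedSize(text) {
    return zlib.deflateSync(Buffer.from(text, "utf8")).length;
}

const length = 10000;
const repeatedSize = compressedSize("a".repeat(length));
const englishLikeSize = compressedSize(randomText(length));

console.log("repeated 'a':", repeatedSize, "bytes after deflate");
console.log("English-like:", englishLikeSize, "bytes after deflate");
```

The repeated string shrinks to a tiny fraction of its length, while the English-like text does not, which is the gap the comment is concerned about.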


A minor detail: this code will be more accurate wrt doc_size if it is made into a function that takes the new doc as argument:

 function addLongString (doc) {
    doc.long_string = "" + Random.randInt(100000000);
    doc.other_long_string = "" + Random.randInt(100000000);
    while (Object.bsonsize(doc) < Math.min(doc_size, 1000)) {
        doc.long_string += randomLetter();
   ....
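
The snippet above is cut off in the ticket, so the following is only one possible shape for the idea, not Henrik's actual code. It is a self-contained sketch that runs outside the mongo shell: approxSize, INDEXED_LIMIT, and the sizeOf and nextChar parameters are hypothetical names introduced here, and the random numeric prefixes from the original are left out for brevity. In the mongo shell one would presumably pass Object.bsonsize as sizeOf.

```javascript
// Sketch only: fill a capped field first, then overflow into a second field.
// Assumption: the cap of 1000 comes from the snippet above; everything else
// named here (approxSize, INDEXED_LIMIT, sizeOf, nextChar) is hypothetical.
const INDEXED_LIMIT = 1000;

// Rough stand-in for Object.bsonsize so the sketch runs in plain JavaScript.
function approxSize(doc) {
    return JSON.stringify(doc).length;
}

function addLongString(doc, docSize, sizeOf = approxSize, nextChar = () => "a") {
    doc.long_string = "";
    doc.other_long_string = "";
    // Grow the first field until the whole document reaches the cap
    // (or docSize, if that is smaller).
    while (sizeOf(doc) < Math.min(docSize, INDEXED_LIMIT)) {
        doc.long_string += nextChar();
    }
    // Any remaining size goes into the second, overflow field.
    while (sizeOf(doc) < docSize) {
        doc.other_long_string += nextChar();
    }
    return doc;
}

const small = addLongString({_id: 1}, 500);
const large = addLongString({_id: 2}, 4000);
console.log(small.long_string.length, small.other_long_string.length);
console.log(large.long_string.length, large.other_long_string.length);
```

Because both loops measure the whole document, the size grows smoothly with doc_size and there is no jump in behavior around the cap.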


Generated at Thu Feb 08 05:01:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.