Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- bf-fix
- stability

Sprint:
Storage Engines - 2022-10-31, 2023-05-30 - 7.0 Readiness, StorEng - 2023-06-13, 2023-06-27 Lord of the Sprints, 2023-07-11 WiredTractor, 2023-07-25 Absolute unit, StorEng - 2023-08-08, ASeasonTooMany-2023-08-22, BermudaTriangle- 2023-09-05
Story Points:
8
Case:

Resharding uses $sample internally, i.e., it uses a WT random cursor. In a resharding performance test, the test occasionally fails when $sample repeatedly fails to find 100 unique documents.

In this ticket we should reproduce the failure, adding instrumentation to WT as needed, and, once we understand the issue, find a way to make random cursors behave better in the problem case.

The problem test is the ReshardCollection.yml genny workload. It inserts 100,000 10KB documents split evenly across two shards. It then reshards the cluster while 100 threads perform reads and writes (find and update commands). Resharding tries to get ~200 samples from each shard via $sample. Occasionally, the sample includes duplicate keys. We see an error when 100 consecutive attempts to get ~200 unique keys all fail.

$sample is allowed to return duplicate keys. But given the number of keys and size of the sample, having this happen repeatedly is surprising and undesirable.
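
As a rough sanity check on why this is surprising, the sketch below estimates how often duplicates would appear if the cursor sampled keys uniformly and independently. The uniformity assumption, the helper name `prob_duplicate`, and the reading of "an attempt" as one batch of 200 draws are assumptions made for illustration; they are not taken from the $sample or WT code.

```python
import math


def prob_duplicate(num_keys: int, sample_size: int) -> float:
    """Probability that `sample_size` uniform, independent draws (with
    replacement) from `num_keys` keys contain at least one repeated key."""
    # P(all unique) = prod_{i=0}^{n-1} (1 - i/N); sum logs for numerical stability.
    log_p_unique = sum(math.log1p(-i / num_keys) for i in range(sample_size))
    return 1.0 - math.exp(log_p_unique)


# Figures from the workload description: 100,000 documents split evenly
# across two shards, ~200 samples requested from each shard.
keys_per_shard = 100_000 // 2
sample_size = 200

p_dup = prob_duplicate(keys_per_shard, sample_size)

# Assumption: each of the 100 attempts is an independent batch of 200 draws.
p_100_failures = p_dup ** 100

print(f"P(duplicate in one batch of {sample_size}) = {p_dup:.3f}")
print(f"P(100 consecutive batches all contain a duplicate) = {p_100_failures:.3e}")
```

Under these assumptions roughly one batch in three contains a duplicate, so a single retry is unremarkable, but 100 consecutive failures would be vanishingly unlikely. That suggests the distribution returned by the random cursor is far from uniform in the problem case.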


Attachments:
- image-2023-07-06-16-57-09-560.png (213 kB, Jul 06 2023 06:57:11 AM UTC)
- image-2023-07-20-16-59-31-389.png (131 kB, Jul 20 2023 06:59:32 AM UTC)
- image-2023-07-20-16-59-36-750.png (131 kB, Jul 20 2023 06:59:37 AM UTC)
- image-2023-08-04-13-27-16-408.png (185 kB, Aug 04 2023 03:27:18 AM UTC)
- image-2023-08-07-09-50-53-626.png (59 kB, Aug 06 2023 11:50:54 PM UTC)
- image-2023-08-09-13-51-28-356.png (164 kB, Aug 09 2023 03:51:32 AM UTC)
- reproducer.txt (5 kB, Jul 17 2023 05:40:08 AM UTC)
- results_1.txt (15 kB, Jul 17 2023 05:43:40 AM UTC)
- results_last_key.txt (465 kB, Jul 18 2023 07:32:18 AM UTC)
- results_random_sample_size.rtf (21 kB, Jul 17 2023 05:52:22 AM UTC)
- results_reset_reader.rtf (20 kB, Jul 17 2023 05:47:59 AM UTC)
- tree_struct.txt (9.91 MB, Jul 18 2023 07:36:10 AM UTC)

is duplicated by:
- SERVER-29446 $sample stage could not find a non-duplicate document while using a random cursor (Closed)

is related to:
- WT-11533 Investigate python reproducer showing weakness in random cursor with invisible records (Open)
- WT-11547 Investigate mongosync and mongosMerge random cursor frequent duplicate keys failure (Open)
- SERVER-29446 $sample stage could not find a non-duplicate document while using a random cursor (Closed)
- WT-11532 Fix session reset RNG by using cursor RNG (Closed)
- SERVER-78841 Make the number of samples per chunk in the SamplingBasedInitialSplitPolicy configurable (Closed)
- WT-11534 Document WT on random cursor functionality (Closed)

related to:
- WT-11385 Investigate how a page with a few entries can be created despite of the existence of pages with lots of entries (Open)


Assignee: Jie Chen
Reporter: Keith Smith
Collaborators: Etienne Petrel
Votes: 0
Watchers: 22

Created: Aug 25 2021 04:40:31 PM UTC
Updated: Apr 29 2024 08:34:07 AM UTC
Resolved: Aug 25 2023 03:29:27 AM UTC
