Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.0.0-rc6, 4.1.1
Affects Version/s: None
Component/s: Replication, Storage
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.0
Sprint: Repl 2018-06-18
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Keeping correct counts with the WiredTigerSizeStorer is complex, error prone and seemingly impossible, particularly now that the move to checkpointing only stable data collides with crashes, rollback and deletions due to capped collections. Thanks to geert.bosch's recent dive into other size storer related problems, I am proud to announce that collection renames (within the same database) are now added to the list of operations that require careful handling of minutiae to maintain correct-er counts.

One common scenario the WiredTiger integration layer attempts to keep correct is coming back online after a clean shutdown at an arbitrary stable timestamp. In that state, the size storer table holds the sizes of (non-empty) collections as of shutdown, so a size is correct only after replication recovery replays from the stable timestamp to the top of the oplog (where the node left off when shutting down).

To do this, the code refrains from updating counts when in replication recovery (among some other conditions).

One exception to this rule is when a collection is created during replication recovery. This condition is unfortunately necessary because the WTSizeStorer maps "idents" to counts. When a collection is recreated during replication recovery, a new ident is chosen (the previous one is lost to the void). Because the previous mapping, albeit correct, is lost, the code must count inserts coming in to be correct.

The intersection of these behaviors with renameCollection's behavior of creating a new record store object (referencing the same underlying table) will juke the WTRecordStore constructor into allowing size adjustments during replication recovery on the same underlying ident.

Thus, a sequence involving a rename from A -> B manifests as an incorrect count:

At shutdown, collection B has 2 documents and a correct count of 2.
At the stable timestamp, collection A exists with 1 document and a count of 2.
Replication recovery plays a rename from A -> B. This marks the collection for size adjustment.
Replication recovery inserts a second document into B. This increases the count from 2 -> 3.
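
The sequence above can be traced with a small toy model. This is only a sketch, not MongoDB code: `ToyNode`, its fields, and the ident name `ident-A` are hypothetical stand-ins for the catalog, the size storer, and the "marked for size adjustment" state. It assumes the faulty rule is "any record store constructed during replication recovery adjusts counts", which is what the description suggests.

```python
import itertools


class ToyNode:
    """Hypothetical model of a node in replication recovery (not MongoDB code)."""

    def __init__(self):
        self.tables = {}        # ident -> list of documents (the real data)
        self.size_storer = {}   # ident -> fast count
        self.catalog = {}       # namespace -> ident
        self.adjusting = set()  # namespaces whose record store adjusts counts
        self._next = itertools.count(1)

    def create(self, ns):
        # A collection created during recovery gets a fresh ident, so its
        # inserts must be counted.
        ident = f"ident-{next(self._next)}"
        self.tables[ident] = []
        self.size_storer[ident] = 0
        self.catalog[ns] = ident
        self.adjusting.add(ns)

    def rename(self, src, dst):
        # Assumed faulty rule: the rename builds a new record store object on
        # the SAME ident, and that new object is treated like a fresh create.
        self.catalog[dst] = self.catalog.pop(src)
        self.adjusting.discard(src)
        self.adjusting.add(dst)

    def insert(self, ns, doc):
        ident = self.catalog[ns]
        self.tables[ident].append(doc)
        if ns in self.adjusting:
            self.size_storer[ident] += 1

    def count(self, ns):
        return self.size_storer[self.catalog[ns]]

    def itcount(self, ns):
        return len(self.tables[self.catalog[ns]])


def reproduce_miscount():
    node = ToyNode()
    # State at the stable timestamp: A holds 1 document, but the size storer
    # already holds the shutdown-time count of 2.
    node.tables["ident-A"] = [{"_id": 1}]
    node.size_storer["ident-A"] = 2
    node.catalog["A"] = "ident-A"
    # Replication recovery replays the rename, then the second insert.
    node.rename("A", "B")
    node.insert("B", {"_id": 2})
    return node.count("B"), node.itcount("B")


print(reproduce_miscount())  # (3, 2): fast count disagrees with the real count
```

In this model the stored count of 2 was already right for the end of recovery, so the extra increment after the rename is what pushes it to 3.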

The attached data files, when brought up as a replica set (on localhost:27017), will demonstrate count() != itcount().

Note that replication recovery replaying a sequence of:

create collection A
insert
rename A -> B
insert

must allow size adjustments on B, as if that permission is being "inherited" from A.
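
One way to satisfy both sequences, shown here only as a sketch, is to key the "allow size adjustments" state on the ident rather than on the record store object, so a rename carries it along unchanged. This is an assumption for illustration and not necessarily how the actual fix was implemented; `ToyNode`, `load`, and the ident names are hypothetical.

```python
import itertools


class ToyNode:
    """Hypothetical model with adjustment state keyed by ident (not MongoDB code)."""

    def __init__(self):
        self.tables = {}               # ident -> list of documents
        self.size_storer = {}          # ident -> fast count
        self.catalog = {}              # namespace -> ident
        self.adjusting_idents = set()  # idents created during recovery
        self._next = itertools.count(1)

    def load(self, ns, ident, docs, fast_count):
        # Install a collection that already existed at the stable timestamp.
        self.tables[ident] = list(docs)
        self.size_storer[ident] = fast_count
        self.catalog[ns] = ident

    def create(self, ns):
        ident = f"ident-{next(self._next)}"
        self.tables[ident] = []
        self.size_storer[ident] = 0
        self.catalog[ns] = ident
        self.adjusting_idents.add(ident)

    def rename(self, src, dst):
        # Only the namespace changes; the ident keeps whatever adjustment
        # state it had, so B "inherits" it from A.
        self.catalog[dst] = self.catalog.pop(src)

    def insert(self, ns, doc):
        ident = self.catalog[ns]
        self.tables[ident].append(doc)
        if ident in self.adjusting_idents:
            self.size_storer[ident] += 1

    def counts(self, ns):
        ident = self.catalog[ns]
        return self.size_storer[ident], len(self.tables[ident])


def replay_created_then_renamed():
    # create A, insert, rename A -> B, insert (all during recovery)
    node = ToyNode()
    node.create("A")
    node.insert("A", {"_id": 1})
    node.rename("A", "B")
    node.insert("B", {"_id": 2})
    return node.counts("B")


def replay_preexisting_renamed():
    # A existed at the stable timestamp with 1 document and a stored count of 2
    node = ToyNode()
    node.load("A", "ident-A", [{"_id": 1}], fast_count=2)
    node.rename("A", "B")
    node.insert("B", {"_id": 2})
    return node.counts("B")


print(replay_created_then_renamed())  # (2, 2)
print(replay_preexisting_renamed())   # (2, 2)
```

Both replays end with the fast count equal to the number of documents, because the decision to count inserts follows the underlying table instead of the namespace.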


Attachments: renameMiscount.tgz (359 kB, Jun 06 2018 12:55:29 AM UTC)

is related to

SERVER-34976 clear the "needing size adjustment" set at the beginning of replication rollback (Closed)
SERVER-34977 subtract capped deletes from fastcount during replication recovery (Closed)

related to

SERVER-35483 rollback makes config.transactions fastcount inaccurate (Closed)

Assignee: Judah Schvimer
Reporter: Daniel Gottlieb (Inactive)
Participants: Daniel Gottlieb, Githook User, Judah Schvimer
Votes: 0
Watchers: 6

Created: Jun 06 2018 01:03:07 AM UTC
Updated: Oct 29 2023 10:31:02 PM UTC
Resolved: Jun 12 2018 06:18:37 PM UTC
