Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 3.6.6, 4.0.0-rc0, WT3.1.0
Affects Version/s: None
Component/s: None
Labels:
None

Sprint:
Storage Non-NYC 2018-05-07
Story Points:
None

Sorry for the poor title, but we now have a repro case along with an understanding of how data files got into a specific state.

Consider MongoDB starting up on existing data files and setting the oldest timestamp = stable timestamp = WT recovery timestamp = 100

Consider a WT transaction that performs the following MongoDB writes with Recover to a Timestamp table logging rules:

Writes to a collection table at time 110 (not WT journaled)
Writes to the _id index table at time 110 (not WT journaled)
Writes to the oplog at time 110 (journaled by WT)

MongoDB advances the stable timestamp to 120.

WT senses cache pressure and chooses a page where all writes are visible to evict. It chooses to evict the page for the collection table including the write at 110. This becomes a new block on disk that is otherwise unreferenced by the previous WT checkpoint. This writes a file_sync journal entry outside of any checkpoint log records.

The node crashes without ever taking a checkpoint. When the node restarts, it first runs WT recovery. WT creates a checkpoint that updates the collection table to a root node that now includes the write at 110. The index table does not include the corresponding write at 110. The WT system checkpoint time remains at 100.

This creates a problem when MongoDB runs replication/oplog recovery. The insert at 110 will now create a duplicate document in the collection table.

Shown below are the WT table metadata from before recovery is ran and after. It demonstrates a checkpoint was taken and, based on the size, new data is included. The new checkpoint_lsn matches the file_sync log record lsn.

access_pattern_hint=none,
allocation_size=4KB,
app_metadata=(formatVersion=1),
assert=(commit_timestamp=none,read_timestamp=none),
block_allocation=best,
block_compressor=snappy,
cache_resident=false,
checkpoint=(
    WiredTigerCheckpoint.105=(
	addr="01c00781e4ecd10c96c01781e48fc4f005c08481e4d61a7687808080e31a5fc0e3193fc0",
	order=105,
	time=1524851546,
	size=1667072,
	write_gen=226)),
checkpoint_lsn=(11,1252224),
checksum=on,
collator=,columns=,dictionary=0,encryption=(keyid=,name=),
format=btree,
huffman_key=,huffman_value=,
id=65,
ignore_in_memory_cache_size=false,
internal_item_max=0,
internal_key_max=0,
internal_key_truncate=true,
internal_page_max=4KB,
key_format=q,
key_gap=10,
leaf_item_max=0,
leaf_key_max=0,
leaf_page_max=32KB,
leaf_value_max=64MB,
log=(enabled=false),
memory_page_max=10m,
os_cache_dirty_max=0,
os_cache_max=0,
prefix_compression=false,
prefix_compression_min=4,
split_deepen_min_child=0,
split_deepen_per_child=0,
split_pct=90,
value_format=u,
version=(major=1,minor=1)

After:

access_pattern_hint=none,
allocation_size=4KB,
app_metadata=(formatVersion=1),
assert=(commit_timestamp=none,read_timestamp=none),
block_allocation=best,
block_compressor=snappy,
cache_resident=false,
checkpoint=(
    WiredTigerCheckpoint.106=(
	addr="01c0ba81e4fc2781bbc11a81e437359414c11b81e411671792808080e31ebfc0e31dffc0",
	order=106,
	time=1524851624,
	size=1978368,
	write_gen=244)),
checkpoint_lsn=(12,21533056),
checksum=on,
collator=,columns=,dictionary=0,encryption=(keyid=,name=),
format=btree,
huffman_key=,huffman_value=,
id=65,
ignore_in_memory_cache_size=false,
internal_item_max=0,
internal_key_max=0,
internal_key_truncate=true,
internal_page_max=4KB,
key_format=q,
key_gap=10,
leaf_item_max=0,
leaf_key_max=0,
leaf_page_max=32KB,
leaf_value_max=64MB,
log=(enabled=false),
memory_page_max=10m,
os_cache_dirty_max=0,
os_cache_max=0,
prefix_compression=false,
prefix_compression_min=4,
split_deepen_min_child=0,
split_deepen_per_child=0,
split_pct=90,
value_format=u,
version=(major=1,minor=1)

related to

WT-8695 Remove file_close_sync config and disallow single file checkpoint

Closed

Assignee:: Susan LoVerso (Inactive)
Reporter:: Daniel Gottlieb (Inactive)
Votes:: 0 Vote for this issue
Watchers:: 8 Start watching this issue

Created:: May 01 2018 01:05:49 AM UTC
Updated:: Oct 29 2023 04:46:42 PM UTC
Resolved:: May 03 2018 08:42:41 PM UTC

Details

Description

Attachments

Issue Links

Forms

Activity

People

Dates