[SERVER-47694] fix multikey. again Created: 22/Apr/20  Updated: 29/Oct/23  Resolved: 24/Apr/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.0.20, 4.4.0-rc3, 4.7.0

Type: Bug
Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive)
Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-48042 Correct failpoint resetting in valida... Closed
related to SERVER-56877 insert operations may fail to set ind... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4, v4.2, v4.0
Sprint: Execution Team 2020-05-04
Participants:
Linked BF Score: 34

 Description   

The multikey variable on indexes serves a dual purpose:

  1. Readers seeing true cannot assume that a record id appears at most once in the index, so they must deduplicate documents.
  2. Writers seeing true may skip updating the catalog.

Unless the storage engine snapshot is used to determine the multikey state of an index, a single variable cannot satisfy both the concurrent reader-writer contract and the concurrent writer-writer contract.

Setting multikey in an onCommit handler prevents the writer-writer contract from being violated: writers can only skip setting multikey once the storage engine transaction that set it has successfully committed.

However, it allows a reader-writer error. A reader that runs between a writer's storage engine commit and its onCommit handlers can see a snapshot that contains multikey data while the in-memory value is still false.
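The interleaving can be illustrated with a small deterministic model. This is only a sketch: `ToyIndex`, `writer_steps`, and `reader` are hypothetical names and do not correspond to server code. The writer is split into two steps so a reader can be run between the storage engine commit and the onCommit handler.

```python
class ToyIndex:
    """Toy model: one in-memory flag plus the keys visible in a snapshot."""

    def __init__(self):
        self.in_memory_multikey = False  # the single flag readers consult
        self.committed_keys = []         # (key, record_id) pairs visible to readers


def writer_steps(index, record_id, values):
    """Generator that pauses after each step so a reader can be interleaved."""
    # Step 1: the storage engine commit makes the index keys visible.
    index.committed_keys.extend((value, record_id) for value in values)
    yield "storage-commit"
    # Step 2: the onCommit handler flips the in-memory flag.
    if len(values) > 1:
        index.in_memory_multikey = True
    yield "on-commit"


def reader(index):
    """Scan the index; deduplicate record ids only if the flag says multikey."""
    ids = [record_id for _, record_id in sorted(index.committed_keys)]
    if index.in_memory_multikey:
        ids = list(dict.fromkeys(ids))
    return ids


index = ToyIndex()
steps = writer_steps(index, record_id=1, values=[10, 20])
next(steps)                  # storage engine has committed
racy_result = reader(index)  # reader runs before the onCommit handler
next(steps)                  # onCommit handler runs
later_result = reader(index)
```

In this model the racy reader returns record id 1 twice because it skips deduplication, while the later reader returns it once.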

The alternative of unconditionally setting multikey outside of the onCommit, before the storage engine commits, leads to a writer-writer problem. Several cases can go wrong; for brevity, we describe only the most egregious one. Suppose the writer that flips multikey in memory rolls back its storage engine transaction. Future writers see true in memory, skip the catalog update, and never correct the multikey value on disk.
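The rollback case can be sketched the same way. Again, this is a toy model under assumptions: `ToyCatalog` and `write` are hypothetical, and the `commit` argument stands in for whether the storage engine transaction commits or rolls back.

```python
class ToyCatalog:
    """Toy model: a single in-memory flag and the durable on-disk value."""

    def __init__(self):
        self.in_memory_multikey = False
        self.on_disk_multikey = False


def write(catalog, makes_multikey, commit):
    """Model the alternative: flip the in-memory flag before committing."""
    if not makes_multikey:
        return "no catalog change needed"
    if catalog.in_memory_multikey:
        # Writers seeing true may skip updating the catalog.
        return "skipped catalog update"
    catalog.in_memory_multikey = True  # flipped before the outcome is known
    if commit:
        catalog.on_disk_multikey = True
        return "committed"
    return "rolled back"               # the in-memory flag is left as true


catalog = ToyCatalog()
first = write(catalog, makes_multikey=True, commit=False)
second = write(catalog, makes_multikey=True, commit=True)
```

After the first writer rolls back, the second writer skips the catalog update even though its own transaction commits, so the on-disk value stays false in this model.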



 Comments   
Comment by Githook User [ 26/Feb/21 ]

Author: Louis Williams <louis.williams@mongodb.com> (username: louiswilliams)

Message: Revert "SERVER-54742 Initialize IndexCatalogEntry multikey state directly from durable catalog"

This reverts commit 46fc6fb233e85538a94dda6eea91cc2ac34cee15.

Revert "SERVER-47694: fix multikey. again"

This reverts commit ecd41f8b3bfe2154921cbcede9040d535a46e0c5.
Branch: v4.2
https://github.com/mongodb/mongo/commit/a56afdff77a84c37d6af0ba77e7068a0b5d593c0

Comment by Githook User [ 08/Feb/21 ]

Author: Louis Williams <louis.williams@mongodb.com> (username: louiswilliams)

Message: SERVER-47694: fix multikey. again

Split the single _isMultikey variable on an IndexCatalogEntry(Impl) into two
separate variables: _isMultikeyForReader and _isMultikeyForWriter.

_isMultikeyForReader is flipped as early as possible. Readers concurrent
with multikey flipping may forgo a possible optimization when their snapshot
sees no multikey data.

_isMultikeyForWriter is flipped after the storage engine commits a multikey
change to the on-disk catalog. At this point, writers may, under some
circumstances, optimize away some catalog writes.
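A minimal sketch of the two-flag idea, assuming a toy catalog entry. The snake_case attributes are hypothetical stand-ins for _isMultikeyForReader and _isMultikeyForWriter, and the `commit` argument models whether the storage engine transaction commits; none of this is the server's actual code.

```python
class ToyIndexCatalogEntry:
    """Toy model of an index catalog entry with separate reader/writer flags."""

    def __init__(self):
        self.is_multikey_for_reader = False  # flipped as early as possible
        self.is_multikey_for_writer = False  # flipped only after a durable commit
        self.on_disk_multikey = False
        self.catalog_writes = 0

    def set_multikey(self, commit=True):
        if self.is_multikey_for_writer:
            # The catalog is already durably multikey, so skip the write.
            return
        # Readers are told immediately. At worst they deduplicate needlessly.
        self.is_multikey_for_reader = True
        if not commit:
            # Rollback: the writer flag stays false, so later writers retry.
            return
        self.on_disk_multikey = True
        self.catalog_writes += 1
        # Models the onCommit handler running after the storage engine commit.
        self.is_multikey_for_writer = True


entry = ToyIndexCatalogEntry()
entry.set_multikey(commit=False)  # first writer rolls back
after_rollback = (entry.is_multikey_for_reader, entry.is_multikey_for_writer)
entry.set_multikey(commit=True)   # second writer still updates the catalog
entry.set_multikey(commit=True)   # third writer skips the catalog write
```

In this model a rolled-back writer leaves the reader flag set, which only costs readers an optimization, while the writer flag remains false until a catalog write has actually committed.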

Move logic for optimizing readers (multikey paths, clearing query cache) outside
of the onCommit.

Adds a failpoint widenWUOWChangesWindow which sleeps transaction commit and
onCommit/onRollback handlers.
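One way such a failpoint could widen the race window is sketched below. This is an assumption about the general shape, not the server's implementation: `ToyWriteUnitOfWork` and its `widen_window_secs` parameter are hypothetical, and the sleep is placed between the storage-level outcome and the handlers.

```python
import time


class ToyWriteUnitOfWork:
    """Toy unit of work that can pause before running its handlers."""

    def __init__(self, widen_window_secs=0.0, sleep=time.sleep):
        self.widen_window_secs = widen_window_secs
        self._sleep = sleep
        self._on_commit = []
        self._on_rollback = []
        self.events = []

    def on_commit(self, handler):
        self._on_commit.append(handler)

    def on_rollback(self, handler):
        self._on_rollback.append(handler)

    def _maybe_widen(self):
        # When enabled, hold the window open so concurrent operations can
        # observe the state between the storage outcome and the handlers.
        if self.widen_window_secs > 0:
            self.events.append("sleep")
            self._sleep(self.widen_window_secs)

    def commit(self):
        self.events.append("storage-commit")
        self._maybe_widen()
        for handler in self._on_commit:
            handler()

    def rollback(self):
        self.events.append("storage-rollback")
        self._maybe_widen()
        for handler in self._on_rollback:
            handler()


wuow = ToyWriteUnitOfWork(widen_window_secs=0.01)
wuow.on_commit(lambda: wuow.events.append("on-commit"))
wuow.commit()
```

With the delay enabled, a test has a much better chance of scheduling a concurrent reader or writer inside the window that would otherwise last only microseconds.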

Have validate assert multikey paths are set correctly for the documents observed
during its collection scan.
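The kind of check described here can be sketched as follows. It uses a simplified notion of multikey paths (any prefix of the indexed field whose value is an array) and hypothetical helper names; the real validate logic is more involved.

```python
def observed_multikey_paths(doc, indexed_field):
    """Return the dotted prefixes of indexed_field that hold arrays in doc."""
    parts = indexed_field.split(".")
    paths = set()
    current = [doc]
    for depth, part in enumerate(parts):
        next_values = []
        for value in current:
            if isinstance(value, dict) and part in value:
                child = value[part]
                if isinstance(child, list):
                    # An array at this prefix makes the path multikey.
                    paths.add(".".join(parts[: depth + 1]))
                    next_values.extend(child)
                else:
                    next_values.append(child)
        current = next_values
    return paths


def find_missing_multikey_paths(docs, indexed_field, recorded_paths):
    """Return (doc position, missing paths) for each doc the metadata misses."""
    problems = []
    for position, doc in enumerate(docs):
        missing = observed_multikey_paths(doc, indexed_field) - recorded_paths
        if missing:
            problems.append((position, missing))
    return problems


docs = [
    {"a": {"b": [1, 2]}},           # array at "a.b"
    {"a": [{"b": 1}, {"b": 2}]},    # array at "a"
]
problems = find_missing_multikey_paths(docs, "a.b", recorded_paths={"a.b"})
```

Here the recorded metadata covers the first document but not the second, so the check reports that the path "a" is missing for the document at position 1.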

(cherry picked from commit 3566db153ea61fb10d3ef11ea917fc7bc93eac4d)
Branch: v4.2
https://github.com/mongodb/mongo/commit/ecd41f8b3bfe2154921cbcede9040d535a46e0c5

Comment by Githook User [ 10/Jun/20 ]

Author: Daniel Gottlieb <daniel.gottlieb@mongodb.com> (username: dgottlieb)

Message: SERVER-47694: fix multikey. again

Split the single _isMultikey variable on an IndexCatalogEntry(Impl) into two
separate variables: _isMultikeyForReader and _isMultikeyForWriter.

_isMultikeyForReader is flipped as early as possible. Readers concurrent
with multikey flipping may forgo a possible optimization when their snapshot
sees no multikey data.

_isMultikeyForWriter is flipped after the storage engine commits a multikey
change to the on-disk catalog. At this point, writers may, under some
circumstances, optimize away some catalog writes.

Move logic for optimizing readers (multikey paths, clearing query cache) outside
of the onCommit.

(cherry picked from commit 3566db153ea61fb10d3ef11ea917fc7bc93eac4d)
Branch: v4.0
https://github.com/mongodb/mongo/commit/bafd10233ae480b8c98ff212658a9d003a741da7

Comment by Githook User [ 24/Apr/20 ]

Author: Daniel Gottlieb <daniel.gottlieb@mongodb.com> (username: dgottlieb)

Message: SERVER-47694: fix multikey. again

(cherry picked from commit 3566db153ea61fb10d3ef11ea917fc7bc93eac4d)
Branch: v4.4
https://github.com/mongodb/mongo/commit/ad996a13f7060100201d7f7824b894bdf2fe24ef

Comment by Githook User [ 24/Apr/20 ]

Author: Daniel Gottlieb <daniel.gottlieb@mongodb.com> (username: dgottlieb)

Message: SERVER-47694: fix multikey. again
Branch: master
https://github.com/mongodb/mongo/commit/3566db153ea61fb10d3ef11ea917fc7bc93eac4d
