[SERVER-34440] Secondary reads from internal (non-network) clients should read at the last applied timestamp Created: 12/Apr/18  Updated: 29/Oct/23  Resolved: 14/May/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: nyc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-34343 Initial sync should not timestamp sec... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage NYC 2018-05-21
Participants:

 Description   

When checking whether or not to read from the local snapshot (last applied timestamp) on a secondary, we restrict to clients where isFromUserConnection is true. This is to prevent a bug where rolled back index builds are still visible (similar to the invariant in BF-8258). See this patch build failure where the check for isFromUserConnection is removed:

The most likely cause is that rollback does not correctly rewind the last applied snapshot timestamp. We should evaluate how replication rolls back its timestamps and ensure that the local snapshot timestamp is correctly updated.

The solution should be that users of ShouldNotConflictWithSecondaryBatchApplication who opt out of taking the PBWM lock are able to read without a timestamp. This applies to the threads listed in the comments, with the addition of rsBackgroundSync.
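
As a rough illustration of the proposed rule (not the server's actual C++ code), the choice of read timestamp on a secondary could be modeled as below. The Reader class, its skips_pbwm_lock field, and choose_read_timestamp are hypothetical names invented for this sketch; only the rule itself comes from this ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reader:
    """Hypothetical stand-in for a reader running on a secondary."""
    name: str
    # Models a reader that uses ShouldNotConflictWithSecondaryBatchApplication,
    # i.e. one that opts out of taking the PBWM lock.
    skips_pbwm_lock: bool = False


def choose_read_timestamp(reader: Reader, last_applied: int) -> Optional[int]:
    """Return the timestamp to read at, or None for an untimestamped read."""
    if reader.skips_pbwm_lock:
        # Readers that opt out of the PBWM lock read without a timestamp.
        return None
    # All other secondary readers read from the local snapshot.
    return last_applied


background_sync = Reader("rsBackgroundSync", skips_pbwm_lock=True)
user_query = Reader("user query")

background_sync_ts = choose_read_timestamp(background_sync, last_applied=10)
user_query_ts = choose_read_timestamp(user_query, last_applied=10)
```

In this model the decision depends only on whether the reader opted out of the PBWM lock, rather than on whether it came from a user connection.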



 Comments   
Comment by Githook User [ 14/May/18 ]

Author: Louis Williams (louiswilliams) <louis.williams@mongodb.com>

Message: SERVER-34440 Internal readers who opt-out of the PBWM lock when using AutoGetCollectionForRead should not read at the last-applied timestamp.
Branch: master
https://github.com/mongodb/mongo/commit/4e5d23881a130d854bf0b546ef7e4a6902c3c5b9

Comment by Louis Williams [ 08/May/18 ]

Discussed with spencer, daniel.gottlieb, and judah@mongodb.com. We believe there is no reason for any of the internal readers listed in the 30/Apr/18 comment, or any readers that opt out of taking the PBWM lock using ShouldNotConflictWithSecondaryBatchApplicationBlock, to read at the last applied timestamp. These readers would already have been exposed to reading inconsistent data, if any. For oplog reads by FTDC or repl, we do not advance the oplog "all-committed" time until the end of batches, so there is no additional risk posed to readers. Rollback needs to read without a timestamp when reloading the catalog, or the bug described above can occur. In all known cases, reading without a timestamp from internal readers is acceptable behavior.

No further work is required for this ticket.

Comment by Louis Williams [ 30/Apr/18 ]

I have identified at least one failure with rollback related to this change. Consider the following scenario:

  • A rollback occurs. A dropIndex command is one of the rolled-back operations.
  • The last applied timestamp on the rolled-back secondary is reset to time T1 and the oplog is truncated. The primary is at time T2, the current clusterTime.
  • The index rebuild on the secondary is timestamped at time T2, which is ahead of the last applied time, T1.
  • The collection catalog is refreshed with a read timestamp at T1, when the index is not ready yet. The in-memory catalog information about the index is that it is unfinished.
  • Replication continues normally until time T3 when the secondary is up-to-date.
  • A reader on the secondary reads at timestamp T3. The on-disk catalog shows the index as complete, but because the in-memory catalog was loaded at timestamp T1 when the index was incomplete, an invariant is triggered.
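
The scenario above can be reproduced with a toy model. This is a sketch under simplifying assumptions, not MongoDB code: TimestampedCatalog is a hypothetical class, index state is collapsed to a single boolean, and T1 < T2 < T3 are plain integers.

```python
class TimestampedCatalog:
    """Toy on-disk catalog that records index readiness with a commit timestamp."""

    def __init__(self):
        self._history = []  # (timestamp, ready) pairs in increasing timestamp order

    def write(self, ts, ready):
        self._history.append((ts, ready))

    def read(self, ts=None):
        """Return index readiness as of ts; ts=None reads the latest state."""
        visible = [ready for when, ready in self._history
                   if ts is None or when <= ts]
        # With no visible entry, the index is treated as not ready.
        return visible[-1] if visible else False


T1, T2, T3 = 1, 2, 3

disk = TimestampedCatalog()
disk.write(T2, True)  # index rebuild timestamped at T2, ahead of last applied T1

# Catalog refresh during rollback reads at T1 and caches the result in memory.
in_memory_ready = disk.read(T1)

# A later reader at T3 sees the index as complete on disk.
on_disk_ready = disk.read(T3)

# This disagreement is what trips the invariant in the scenario.
mismatch = in_memory_ready != on_disk_ready

# Reloading the catalog without a timestamp sees the rebuild right away.
untimestamped_ready = disk.read(None)
```

In this model, a catalog reload that reads without a timestamp agrees with what the reader at T3 later finds on disk, so the two views no longer diverge.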

Possible solutions:

  • Don't timestamp index builds at the current clusterTime. This is similar to SERVER-34343.
  • Defer the catalog refresh (or refresh it again) until time T2 is the last applied timestamp on the secondary.

Comment by Louis Williams [ 30/Apr/18 ]

The following internal threads use AutoGetCollectionForRead and would be affected by reading from the last applied timestamp:

  • "ftdc": Reads from the oplog
  • "rsBackgroundSync": Reads from both oplog and replicated collections
  • "ReplBatcher": Reads from the oplog
  • "monitoring keys for HMAC": Reads from admin.system.keys

Generated at Thu Feb 08 04:36:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.