[SERVER-33488] WT SizeStorer dislikes RTT Created: 26/Feb/18  Updated: 29/Oct/23  Resolved: 07/Apr/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 3.7.4

Type: Task Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Kyle Suarez
Resolution: Fixed Votes: 0
Labels: rollback-functional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-34976 clear the "needing size adjustment" s... Closed
related to SERVER-33493 Have WT RTT rollback keep correct counts Closed
related to SERVER-33525 Fix replication and sharding tests to... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2018-04-09
Participants:
Linked BF Score: 56

 Description   

Counts in the WTSizeStorer table are not adjusted in the same transaction that performs an insert/update, as that would become a serialization point for concurrent inserts/deletes and would result in an expensive WriteConflictException (WCE).

Instead, an atomic counter is maintained and periodically flushed to the SizeStorer. Among other things, this data is flushed on clean shutdown. These writes are not timestamped (doing so would be difficult), so what is put on disk is the counts as of "now", which are unlikely to match the counts as of the stable timestamp. At startup, when replication plays the oplog forward during recovery, an insert that was already accounted for in the SizeStorer's view of the data will be counted again.
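The double count described above can be reproduced with a toy model. This is only an illustrative sketch in Python, not the server's C++ code: `ToySizeStorer`, `ToyRecordStore`, and `simulate_double_count` are hypothetical names, and the "stable checkpoint" is modeled simply as an index into a list of oplog entries.

```python
class ToySizeStorer:
    """Hypothetical stand-in for the on-disk size table (writes are untimestamped)."""

    def __init__(self):
        self.persisted = {}  # ident -> record count as of the last flush


class ToyRecordStore:
    """Hypothetical record store that keeps an in-memory counter."""

    def __init__(self, ident, size_storer):
        self.ident = ident
        self.size_storer = size_storer
        # On open, start from whatever count was last flushed to disk.
        self.num_records = size_storer.persisted.get(ident, 0)

    def insert(self):
        # The counter is bumped outside the storage transaction.
        self.num_records += 1

    def flush(self):
        # Persists the count as of "now", not as of the stable timestamp.
        self.size_storer.persisted[self.ident] = self.num_records


def simulate_double_count():
    storer = ToySizeStorer()
    store = ToyRecordStore("coll-1", storer)

    oplog = ["insert", "insert", "insert"]
    stable_index = 1  # assumption: only the first insert is in the stable checkpoint

    for _ in oplog:
        store.insert()
    store.flush()  # clean shutdown persists a count of 3

    # Restart: data reverts to the stable checkpoint, but the count is still 3.
    restarted = ToyRecordStore("coll-1", storer)
    for _ in oplog[stable_index:]:
        restarted.insert()  # replayed inserts are counted a second time

    return restarted.num_records, len(oplog)


reported, actual = simulate_double_count()
print(reported, actual)
```

In this model the reported count ends up higher than the number of documents actually present, because the two replayed inserts were already included in the flushed count.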

The proposed fix: trust the WTSizeStorer to have the proper counts for collections once recovery has been played. Specifically:

  1. Introduce state representing that the server (or operation context) is in "recovery mode".
  2. Step 1, however, breaks a special case: when the creation of a collection wasn't included in a stable timestamp, the collection gets recreated during recovery with an `ident` that differs from the one used at shutdown.
    • Introduce more state, the set of collections created during recovery.
    • Allow updates to `_changeNumRecords`/`_increaseDataSize` if the collection being updated is in this set.
  3. Another special case comes up when the collection exists in the stable checkpoint, but none of its writes made it into the stable checkpoint. When a collection is empty, as determined by a cursor "findOne", the record store setup assumes its count should be zero. This is correct in a non-RTT world, but would violate the expectation that the WTSizeStorer is the authority on counts. This code would also need to be adjusted.
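The three steps above can be sketched as follows. This is a minimal Python model under stated assumptions, not the committed implementation: `RecoveryAwareCounter` and its members are hypothetical, `change_num_records` only loosely mirrors `_changeNumRecords`, and the handling of step 3 (skipping the reset to zero while in recovery) is one possible adjustment that the ticket does not spell out.

```python
class RecoveryAwareCounter:
    """Hypothetical model of count tracking that trusts persisted counts during recovery."""

    def __init__(self, persisted_counts):
        self.counts = dict(persisted_counts)   # ident -> count loaded from the size table
        self.in_recovery = False               # step 1: "recovery mode" state
        self.created_during_recovery = set()   # step 2: idents recreated by oplog replay

    def create_collection(self, ident):
        self.counts[ident] = 0
        if self.in_recovery:
            # A recreated collection gets a new ident with no persisted count,
            # so replayed writes to it must be counted.
            self.created_during_recovery.add(ident)

    def change_num_records(self, ident, diff):
        if self.in_recovery and ident not in self.created_during_recovery:
            return  # the persisted count already includes this write
        self.counts[ident] = self.counts.get(ident, 0) + diff

    def open_record_store(self, ident, looks_empty):
        # Step 3 (assumed adjustment): an empty-looking collection only resets
        # its count to zero when the server is not in recovery.
        if looks_empty and not self.in_recovery:
            self.counts[ident] = 0


def replay_example():
    counter = RecoveryAwareCounter({"a": 3})
    counter.in_recovery = True

    # "a" exists in the stable checkpoint but none of its 3 inserts do.
    counter.open_record_store("a", looks_empty=True)
    for _ in range(3):
        counter.change_num_records("a", 1)  # skipped: count stays 3

    # "b" was created after the stable timestamp, so it is recreated here.
    counter.create_collection("b")
    for _ in range(2):
        counter.change_num_records("b", 1)  # counted: new ident

    counter.in_recovery = False
    counter.change_num_records("a", 1)      # normal operation resumes
    return counter.counts


print(replay_example())
```

Keeping the set of recreated idents separate from the recovery flag lets the common case (collections that survived in the stable checkpoint) skip updates with a single check, while the rarer recreated collections still accumulate counts from replay.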

These changes would keep WT counts accurate on clean shutdown, but not on rollback. SERVER-33493 is tracking changes for that purpose.



 Comments   
Comment by Githook User [ 07/Apr/18 ]

Author:

{'email': 'kyle.suarez@mongodb.com', 'name': 'Kyle Suarez', 'username': 'ksuarz'}

Message: SERVER-33488 conditionally update WT size metadata during startup recovery
Branch: master
https://github.com/mongodb/mongo/commit/fae36f1444627d28bd18e7395962078a729b940a

Comment by Daniel Gottlieb (Inactive) [ 02/Mar/18 ]

Existing tests that had coverage of this behavior and will likely be changed to temporarily use itcount:

 jstests/auth/upgrade_noauth_to_keyfile.js                |  6 +++---
 jstests/ssl/upgrade_noauth_to_x509_ssl.js                |  6 +++---
 jstests/ssl/upgrade_to_ssl.js                            |  8 ++++----
 jstests/ssl/upgrade_to_x509_ssl.js                       |  8 ++++----
 jstests/sslSpecial/upgrade_noauth_to_x509_nossl.js       |  4 ++--
 jstests/sslSpecial/upgrade_to_ssl_nossl.js               |  6 +++---
 jstests/sslSpecial/upgrade_to_x509_ssl_nossl.js          | 10 +++++-----

They should be reverted to using "fast-count" as part of this patch.

Comment by Judah Schvimer [ 28/Feb/18 ]

This ticket should add tests to ensure we maintain correct counts across clean restart.
