[SERVER-74684] Size storer not being flushed periodically Created: 07/Mar/23  Updated: 12/Jan/24  Resolved: 03/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.2.0-rc0, 6.3.0-rc0
Fix Version/s: 7.0.0-rc0, 6.3.1

Type: Bug Priority: Major - P3
Reporter: Gregory Noma Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-75326 Add support for flushing sizeStorer t... Closed
Problem/Incident
is caused by SERVER-69363 Ident reaper to handle failed ident d... Closed
Related
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.3, v6.2
Steps To Reproduce:
  1. Insert some documents
  2. Wait >60 seconds
  3. Crash the server
  4. Observe that the fast count does not reflect the inserts (see the repro sketch below)
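A minimal driver-level sketch of these steps, assuming a local mongod on the default port and the mongocxx driver; the crash itself (e.g. kill -9 on the mongod process) and the post-restart count check have to be performed externally:

{code:cpp}
// Repro sketch: insert documents, wait past the 60-second flush window,
// then crash mongod externally and compare the fast count after restart.
#include <chrono>
#include <thread>

#include <bsoncxx/builder/basic/document.hpp>
#include <bsoncxx/builder/basic/kvp.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

int main() {
    mongocxx::instance inst{};
    mongocxx::client client{mongocxx::uri{"mongodb://localhost:27017"}};
    auto coll = client["test"]["sizeStorerRepro"];

    // Step 1: insert some documents.
    using bsoncxx::builder::basic::kvp;
    using bsoncxx::builder::basic::make_document;
    for (int i = 0; i < 10000; ++i) {
        coll.insert_one(make_document(kvp("i", i)));
    }

    // Step 2: wait past the 60-second flush interval. On affected
    // versions no flush happens because the periodic trigger is gone.
    std::this_thread::sleep_for(std::chrono::seconds{70});

    // Steps 3-4 are external: kill -9 the mongod process, restart it,
    // and observe that the fast count (db.sizeStorerRepro.count())
    // does not reflect the inserts.
    return 0;
}
{code}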
Sprint: Execution Team 2023-04-03, Execution Team 2023-04-17
Participants:
Linked BF Score: 120

 Description   

It appears that the size storer isn't writing to the sizeStorer table periodically as it should, triggered either by the number of inserts or by the amount of time elapsed.
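As a sketch of the intended dual trigger (illustrative names only, not the server's actual classes): flush when either a check count or a time interval is exceeded, whichever comes first.

{code:cpp}
// Minimal sketch of a dual flush trigger: flush when either maxChecks
// sync checks have accumulated or maxInterval has elapsed, whichever
// comes first. Illustrative only; the server uses its own helper.
#include <chrono>
#include <cstdint>

class FlushPolicySketch {
public:
    FlushPolicySketch(int64_t maxChecks, std::chrono::seconds maxInterval)
        : _maxChecks(maxChecks),
          _maxInterval(maxInterval),
          _lastFlush(std::chrono::steady_clock::now()) {}

    // Called once per sync check; returns true when a flush is due.
    bool shouldFlush() {
        auto now = std::chrono::steady_clock::now();
        if (++_numChecks >= _maxChecks || now - _lastFlush >= _maxInterval) {
            _numChecks = 0;
            _lastFlush = now;
            return true;
        }
        return false;
    }

private:
    const int64_t _maxChecks;
    const std::chrono::seconds _maxInterval;
    int64_t _numChecks = 0;
    std::chrono::steady_clock::time_point _lastFlush;
};
{code}

With the parameters discussed in the comments below, this would be instantiated as FlushPolicySketch{100000, std::chrono::seconds{60}}.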



 Comments   
Comment by Githook User [ 14/Apr/23 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-74684 Return logic to periodically flush the size storer

(cherry picked from commit 23f1d4b9a6cef4550be9f936c7d62547dac32945)
Branch: v6.3
https://github.com/mongodb/mongo/commit/28abbf6dc6faef55d8b9c08f4aeadcc649244eb8

Comment by Dianna Hohensee (Inactive) [ 05/Apr/23 ]

Backups look like they are smart about calling syncSizeInfo: all the backup functions in the WiredTigerKVEngine – e.g. beginBackup, extendBackupCursor – appear to call it before running, and recoverToStableTimestamp does as well. To be clear, clean shutdown still syncs as it always has.
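A rough sketch of the pattern being described, with hypothetical stand-in types (the real WiredTigerKVEngine signatures differ):

{code:cpp}
// Hypothetical stand-ins; the actual engine types and signatures differ.
struct OperationContext {};
struct Status {};

void syncSizeInfo(bool syncToDisk);          // flush cached counts/sizes
Status openBackupCursor(OperationContext*);  // start the actual backup

// The pattern the comment describes: each backup entry point
// (beginBackup, extendBackupCursor, recoverToStableTimestamp) syncs the
// size info first, so the backed-up sizeStorer table is accurate even
// though the periodic flush is best-effort.
Status beginBackupSketch(OperationContext* opCtx) {
    syncSizeInfo(/*syncToDisk=*/true);
    return openBackupCursor(opCtx);
}
{code}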

Comment by Dianna Hohensee (Inactive) [ 05/Apr/23 ]

Ah, my bad, I didn't originally think of v6.3. We've got a backport to v6.3 scheduled.

Collection counts and data sizes were, and with the fix are again, flushed to storage only every 60 seconds or roughly every 100,000 writes, whichever comes first. The periodic trigger is either 100,000 sync checks (performed by WTSessionCache::releaseSession()) or 60 seconds elapsing (but even then a write has to call WTSessionCache::releaseSession() to provoke the sync check).
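In other words (sketch with illustrative names, reusing the FlushPolicySketch from the Description above): the flush piggybacks on session release rather than running on a background timer.

{code:cpp}
void syncSizeInfo(bool syncToDisk);  // persist cached counts/sizes

// Sketch: the flush check is performed only from releaseSession(), so
// the 60-second trigger never fires on its own; some write must release
// a session to provoke it.
void releaseSessionSketch(FlushPolicySketch& policy) {
    // ... return the WiredTiger session to the cache ...
    if (policy.shouldFlush()) {
        syncSizeInfo(/*syncToDisk=*/false);
    }
}
{code}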

Yes, I think v6.3 servers will have very inaccurate counts and data sizes if they were running for a while, receiving a lot of writes, and then shut down uncleanly. Even the normal 60-second / 100,000-write flush period can result in quite a bit of skew.

Comment by Eric Milkie [ 04/Apr/23 ]

Also, does this mean that any unclean shutdown, or backup snapshot, can now have exceedingly inaccurate collection counts and data sizes, whereas before they were only inaccurate by approximately a second's worth of writes?

Comment by Eric Milkie [ 04/Apr/23 ]

I'm confused.  Can someone fill in an accurate "Affects Versions" field for this ticket?  Dianna says in a comment that it "only changed in v7.0", but this ticket is also linked as "caused by SERVER-69363", which was released in 6.2.

Comment by Githook User [ 31/Mar/23 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-74684 Return logic to periodically flush the size storer
Branch: master
https://github.com/mongodb/mongo/commit/23f1d4b9a6cef4550be9f936c7d62547dac32945

Comment by Dianna Hohensee (Inactive) [ 29/Mar/23 ]

Some good news: it looks like this only changed in v7.0. SERVER-69363 removed some code that appears to have (very) sneakily triggered the flushes.

When a session is released (WTSessionCache code), there is a check for whether any queued idents need to be dropped – I'm not familiar with the purpose of that code; it appears to date back to when the WT integration was first added. And inside that check for queued ident drops, the sizeStorer was flushed on a periodic basis.
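A sketch of the removed path as described above (illustrative names; not the exact pre-SERVER-69363 source):

{code:cpp}
#include <deque>
#include <string>

// Hypothetical stand-ins for the engine's members.
struct ElapsedTrackerSketch {
    bool intervalHasElapsed();  // true after ~100,000 calls or 60 seconds
    void resetLastTime();
};
void syncSizeInfo(bool syncToDisk);

ElapsedTrackerSketch sizeStorerSyncTracker;
std::deque<std::string> identsQueuedForDrop;

// The check releaseSession() performed: "are there idents queued to
// drop?" As a side effect, it also drove the periodic sizeStorer flush,
// so removing the drop-queue code (SERVER-69363) silently removed the
// flush along with it.
bool haveDropsQueuedSketch() {
    if (sizeStorerSyncTracker.intervalHasElapsed()) {
        sizeStorerSyncTracker.resetLastTime();
        syncSizeInfo(/*syncToDisk=*/false);  // the "sneaky" flush
    }
    return !identsQueuedForDrop.empty();
}
{code}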
