[SERVER-24266] Large discrepancy in SizeStorer after unclean shutdown Created: 24/May/16  Updated: 10/Jun/23

Status: Backlog
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.12, 3.2.6, 3.3.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Linda Qin Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 1
Labels: pm-1820
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-20063 WiredTiger collection size cache can ... Closed
Related
related to DOCS-7979 Size and count in collection stats ar... Closed
is related to SERVER-45718 Index builder can skip index building... Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Steps To Reproduce:
  1. Start a standalone mongod instance with wiredTiger storage engine.
  2. Insert documents into a collection until the data volume runs out of disk space.
  3. Allocate or free up more disk space for the data volume.
  4. Start the mongod instance again.
  5. Check the results of count() and itcount() for the collection; the numbers differ (see the shell sketch after this list).
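A minimal mongo shell sketch of steps 2 and 5, assuming a standalone WiredTiger mongod is already running (the users collection name and payload size are illustrative, not taken from the reporter's workload):

> // step 2: insert until the data volume fills up and mongod aborts
> while (true) { db.users.insert({ payload: new Array(4096).join("x") }) }
> // steps 3-4: add disk space, restart mongod against the same dbpath, reconnect
> // step 5: compare the cached count against a full collection scan
> db.users.count()           // fast path, read from the cached sizeStorer metadata
> db.users.find().itcount()  // iterates a cursor, counting the documents actually present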

Description

The mongod instance (with the WiredTiger storage engine) went down when there was insufficient disk space for the data volume. After adding more space to the dbpath and restarting the mongod instance, count() returns an incorrect value, as shown below:

> db.users.count()
1527296
> db.users.find().itcount()
1605004

SERVER-19472 and DOCS-5682 mention that count may be off by up to 1000 documents after an unclean shutdown. However, the count was off by more than 70,000 documents in my test.

I have tested 3.0.12, 3.2.6, and 3.3.6; all have the same issue.

Running db.collection.validate(true) resets the value returned by count() to the correct number.
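For example, on the collection from the transcript above (on 3.2 and later a non-full validate is sufficient, per SERVER-20286 in the comments below):

> db.users.validate(true)    // full validate rescans the collection and corrects the cached size/count
> db.users.count()           // now agrees with db.users.find().itcount()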



Comments
Comment by Michael Cahill (Inactive) [ 01/Jun/16 ]

ravind.kumar, yes, that is a more accurate statement.

The issue is the updates performed during storage engine recovery (or, with nojournal, the updates between the last sizeStorer sync and the final checkpoint). Those updates are not reflected in the "size storer" table (because the storage engine doesn't know that the size storer is special). So the maximum drift is determined by the time since the last flush of the size storer (every 60 seconds, assuming there is any storage engine activity at all) combined with the time since the last checkpoint completed (checkpoints are triggered every 60 seconds by default).

The two conditions make the 60 second upper bound somewhat fuzzy, but it should hold in all but the most extreme cases.
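As a rough worked example of how the drift in the description could grow far beyond 1000 documents (the insert rate is an assumption, not a measurement from the reporter's workload, and the worst case here simply treats the sizeStorer flush window and the checkpoint window as additive):

> var sizeStorerFlushSecs = 60   // default sizeStorer sync interval
> var checkpointSecs = 60        // default checkpoint interval
> var insertsPerSec = 1000       // assumed sustained insert rate
> (sizeStorerFlushSecs + checkpointSecs) * insertsPerSec  // worst-case documents missing from the size storer
120000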

Comment by Ravind Kumar (Inactive) [ 31/May/16 ]

michael.cahill, would it be accurate to say "size and count may drift by the number of updates performed within up to 60 seconds of the unclean shutdown"? I'm assuming that if there was an unclean shutdown at T = 61, updates at T = 1 would have been fine, and if no writes occurred from T = 1 to T = 60, then there generally should not be any drift, assuming the flush period lined up exactly with the unclean shutdown.

Comment by Daniel Pasette (Inactive) [ 26/May/16 ]

Yes. That restores the correct size and count information. One slight change: starting in 3.2, it is no longer required to run a "full" validate to update the size/count info. See: SERVER-20286

Comment by Ravind Kumar (Inactive) [ 26/May/16 ]

The documentation currently suggests using validate. Is this still a valid solution?

Comment by Daniel Pasette (Inactive) [ 26/May/16 ]

I have been walking around with the idea that the sizeStorer was being flushed every 1000 docs. Thanks for the clarification.
We should immediately fix the documentation (DOCS-7979). We should leave this open as a placeholder to decide whether or not we want to invest effort in tightening the time between persisting to disk.

Comment by Michael Cahill (Inactive) [ 26/May/16 ]

pasette, unless I'm misunderstanding the code, we update the in-memory cache every 1000 document writes, but the updates only become persistent when WiredTigerKVEngine::syncSizeInfo is called, most commonly every 60 seconds as part of WiredTigerKVEngine::haveDropsQueued.

We could get more aggressive about flushing the cache and put a time bound on this instead (e.g. every second we scan all the collections and flush any changes). I suspect there may be issues writing to size storer as part of the unit of work that triggered the size update (since that update could abort or just stall, blocking further updates to the size storer table).

Comment by Daniel Pasette (Inactive) [ 25/May/16 ]

The sizeStorer is meant to persist the document count and size every 1000 document writes. If the journal is enabled, this should result in drift of up to ~1000 documents (not exactly, because the window in which the journal commits is not synchronized with the 1000-document counter). How could this condition occur where documents are written to the data store but not the sizeStorer?
