[SERVER-24266] Large discrepancy in SizeStorer after unclean shutdown
Created: 24/May/16 | Updated: 10/Jun/23

| Status:             | Backlog |
| Project:            | Core Server |
| Component/s:        | WiredTiger |
| Affects Version/s:  | 3.0.12, 3.2.6, 3.3.6 |
| Fix Version/s:      | None |
| Type:               | Bug |
| Priority:           | Major - P3 |
| Reporter:           | Linda Qin |
| Assignee:           | Backlog - Storage Execution Team |
| Resolution:         | Unresolved |
| Votes:              | 1 |
| Labels:             | pm-1820 |
| Assigned Teams:     | Storage Execution |
| Operating System:   | ALL |
Description
A mongod instance using the WiredTiger storage engine goes down when there is insufficient disk space for the data volume. After adding more space to the dbpath and restarting the mongod instance, count() returns an incorrect value.
Tested on 3.0.12, 3.2.6, and 3.3.6; all have the same issue. Running db.collection.validate(true) resets the value returned by count() to the correct number.
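For illustration, a minimal mongo shell sketch of detecting and repairing the drift; the namespace db.foo is hypothetical. count() without a predicate serves the cached metadata, while itcount() iterates the cursor and counts the documents actually present:

```js
// Hypothetical namespace db.foo; substitute the affected collection.
db.foo.count()             // fast count, served from the cached sizeStorer metadata
db.foo.find().itcount()    // iterates the cursor, counting documents actually present

// After the unclean shutdown the two can disagree; a full validate
// rebuilds the stored size/count from the data on disk.
db.foo.validate(true)
db.foo.count()             // now agrees with itcount()
```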
Comments
Comment by Michael Cahill (Inactive) [ 01/Jun/16 ]
ravind.kumar, yes, that is a more accurate statement. The issue is the updates performed during storage engine recovery (or, with nojournal, the updates between the last sizeStorer sync and the final checkpoint). Those updates are not reflected in the "size storer" table, because the storage engine doesn't know that the size storer is special. So the maximum drift is determined by the time since the last flush of the size storer (every 60 seconds, assuming there is any storage engine activity at all) combined with the time since the last checkpoint completed (checkpoints are triggered every 60 seconds by default). The two conditions make the 60-second upper bound somewhat fuzzy, but it should hold in all but the most extreme cases.
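As a back-of-envelope illustration of that bound (a sketch only; the write rate is a hypothetical figure, not from this ticket):

```js
// Rough worst-case drift estimate under the defaults described above.
// Assumptions: ~60 s sizeStorer sync cadence, ~60 s checkpoint cadence,
// and a hypothetical steady write rate.
var sizeStorerSyncSec = 60;   // time since the last sizeStorer flush (worst case)
var checkpointSec = 60;       // time since the last completed checkpoint (worst case)
var writesPerSec = 500;       // hypothetical workload, not from the ticket

// The two windows largely overlap, so the practical bound is roughly the
// writes performed during the larger of the two, i.e. about one minute.
var maxDrift = writesPerSec * Math.max(sizeStorerSyncSec, checkpointSec);
print("approximate worst-case count drift: " + maxDrift + " documents");
```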
Comment by Ravind Kumar (Inactive) [ 31/May/16 ]
michael.cahill, would it be accurate to say "size and count may drift by the number of updates performed within up to 60 seconds of the unclean shutdown"? I'm assuming that if there was an unclean shutdown at T = 61, updates at T = 1 would have been fine, and if no writes occurred from T = 1 to T = 60, then there generally should not be any drift, assuming the flush period synced exactly with the unclean shutdown.
Comment by Daniel Pasette (Inactive) [ 26/May/16 ]
Yes, that restores the correct size and count information. One slight change: starting in 3.2, it is no longer required to run a "full" validate to update the size/count info. See:
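A quick shell illustration of the difference (the collection name is hypothetical):

```js
// 3.0.x: only a full validate rewrites the stored size/count.
db.foo.validate(true)

// 3.2+: per the comment above, a plain validate also updates the
// size/count info, and is typically much cheaper on large collections.
db.foo.validate()
```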
Comment by Ravind Kumar (Inactive) [ 26/May/16 ]
The documentation currently suggests using validate. Is this still a valid solution?
Comment by Daniel Pasette (Inactive) [ 26/May/16 ]
I had been operating under the assumption that the sizeStorer was flushed every 1000 docs. Thanks for the clarification.
Comment by Michael Cahill (Inactive) [ 26/May/16 ]
pasette, unless I'm misunderstanding the code, we update the in-memory cache every 1000 document writes, but the updates only become persistent when WiredTigerKVEngine::syncSizeInfo is called, most commonly every 60 seconds as part of WiredTigerKVEngine::haveDropsQueued. We could get more aggressive about flushing the cache and put a time bound on this instead (e.g. every second, scan all the collections and flush any changes). I suspect there would be issues writing to the size storer as part of the unit of work that triggered the size update, since that update could abort, or could stall and block further updates to the size storer table.
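To make the failure window concrete, here is a toy model of the mechanism described above; it is an illustrative sketch, not the server's implementation, and all names in it are invented:

```js
// Toy model of a write-back size cache: counts are updated in memory on
// every write, but only made durable by a periodic sync, so a crash loses
// everything since the last sync. All names here are invented.
var durableSizeStorer = { count: 0 };        // stands in for the on-disk table
var cache = { count: 0, dirty: false };      // in-memory cache, updated per write

function recordWrite() {
    cache.count += 1;
    cache.dirty = true;                      // persisted later, not now
}

function syncSizeInfo() {                    // runs on a ~60 s cadence
    if (cache.dirty) {
        durableSizeStorer.count = cache.count;  // durable at the next checkpoint
        cache.dirty = false;
    }
}

// Crash between syncs: every write since the last syncSizeInfo() is
// missing from durableSizeStorer, which is exactly the observed drift.
for (var i = 0; i < 1000; i++) recordWrite();
syncSizeInfo();
for (var j = 0; j < 250; j++) recordWrite(); // these 250 are lost on a crash
print("durable: " + durableSizeStorer.count + ", actual: " + cache.count);
```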
Comment by Daniel Pasette (Inactive) [ 25/May/16 ]
The sizeStorer is meant to persist the document count and size every 1000 document writes. If the journal is enabled, this should result in a drift of up to ~1000 documents (not totally exact, because the window in which the journal commits is not synchronized with the 1000-document count). How could this condition occur, where documents are written to the data store but not to the sizeStorer?