[SERVER-35431] rollback does not correct sizeStorer data sizes Created: 05/Jun/18 Updated: 06/Dec/22
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | pm-1820 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Participants: | |
| Linked BF Score: | 18 |
| Description |
We just keep the data size the same when we recover to a stable timestamp, instead of correcting it the way we do with counts: https://github.com/mongodb/mongo/blob/f757bc52b926943bc748f0dc33173ab16e980f61/src/mongo/db/repl/storage_interface_impl.cpp#L1025-L1028

This means that the size reported in collStats will be wrong. It can also have the side effect of slowly decreasing the effective size of a capped collection, since the system will think the collection is more full than it actually is. Running validate will fix the size.
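To make the failure mode concrete, here is a minimal, self-contained C++ sketch (hypothetical types, not the actual server code) of the asymmetry: recovering to a stable timestamp puts the record count back to the correct value, but the cached data size keeps its pre-rollback total.

```cpp
// Illustrative sketch of sizeStorer-style metadata drift (hypothetical
// struct, not MongoDB internals): rollback corrects numRecords but not dataSize.
#include <cstdint>
#include <iostream>

struct CollectionSizeInfo {
    int64_t numRecords = 0;
    int64_t dataSize = 0;  // bytes
};

int main() {
    CollectionSizeInfo sizeInfo;

    // Writes that will later be rolled back: 100 documents of 1 KB each.
    sizeInfo.numRecords += 100;
    sizeInfo.dataSize += 100 * 1024;

    // Recover to the stable timestamp: the count is corrected, but (as
    // described in this ticket) dataSize is left unchanged.
    const int64_t correctCount = 0;  // what the collection actually contains
    sizeInfo.numRecords = correctCount;
    // sizeInfo.dataSize still reports 102400 bytes of phantom data.

    std::cout << "numRecords=" << sizeInfo.numRecords
              << " dataSize=" << sizeInfo.dataSize << " (should be 0)\n";
    return 0;
}
```

Because validate recomputes these statistics from the actual records, it repairs the size, as noted above.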
| Comments |
| Comment by Geert Bosch [ 05/Feb/20 ] |
This ticket really is two issues:

A newly added node may have significantly less fragmentation and better compression than a long-lived node that has processed lots of remove and update operations. Deciding chunk migration based on storageSize could lead to unstable behavior where chunks move back and forth depending on which node of a replica set is used to find the storageSize of a collection.

Additionally, dataSize is important because it determines memory pressure for data access. If we balanced shards so that both have a storageSize of 100 GB, but one uncompresses to 300 GB and the other to 600 GB, it is likely that the latter node will perform much worse, since it can cache a much smaller fraction of its data. The expectation is that over time storage sizes will balance out.
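For concreteness, here is the arithmetic behind that 100 GB example, assuming a hypothetical fixed 60 GB cache per node (the cache size is an assumption chosen for illustration, not a figure from this ticket):

```cpp
// Back-of-the-envelope comparison: equal storageSize, very different
// cacheable fractions of the uncompressed data.
#include <cstdio>

int main() {
    const double cacheGB = 60.0;     // assumed cache size (illustrative)
    const double dataSizeA = 300.0;  // shard A uncompresses to 300 GB
    const double dataSizeB = 600.0;  // shard B uncompresses to 600 GB

    std::printf("shard A can cache %.0f%% of its data\n", 100.0 * cacheGB / dataSizeA);
    std::printf("shard B can cache %.0f%% of its data\n", 100.0 * cacheGB / dataSizeB);
    // ~20% vs. ~10%: identical storageSize hides a 2x difference in how much
    // of the working set fits in memory.
    return 0;
}
```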
| Comment by Kaloian Manassiev [ 29/Aug/18 ] |
The enableSharding command uses the totalSize field from listDatabases. Looking at listDatabases, this value is derived from DatabaseCatalogEntry::sizeOnDisk, which eventually calls into RecordStore::storageSize. So I guess the answer to your question is that the primary shard selection uses storageSize and not dataSize. |
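A simplified sketch of that call chain, with stand-in types (the class and method names below mirror the ones mentioned above, but the definitions are illustrative, not the server's actual code):

```cpp
// Illustrative shape of the delegation: listDatabases' totalSize is built
// from per-database sizeOnDisk, which in turn sums per-collection
// storageSize (the compressed on-disk size), not dataSize.
#include <cstdint>
#include <vector>

struct RecordStore {
    int64_t storageSizeBytes = 0;
    int64_t storageSize() const { return storageSizeBytes; }
};

struct DatabaseCatalogEntry {
    std::vector<RecordStore> collections;
    int64_t sizeOnDisk() const {
        int64_t total = 0;
        for (const auto& rs : collections)
            total += rs.storageSize();
        return total;
    }
};

int main() {
    // Two collections of 10 MB and 5 MB on disk.
    DatabaseCatalogEntry db{{RecordStore{10 << 20}, RecordStore{5 << 20}}};
    return db.sizeOnDisk() == (15 << 20) ? 0 : 1;
}
```

If primary-shard selection is indeed driven by storageSize, the uncorrected dataSize from this ticket would not skew it, though it still affects collStats and capped collections.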
| Comment by Alyson Cabral (Inactive) [ 28/Aug/18 ] |
spencer or kaloian.manassiev, do we know whether we choose the primary shard for a database based on dataSize or storageSize? https://docs.mongodb.com/manual/core/sharded-cluster-shards/

I believe we spoke about this in person, but just so it's captured here: in addition to capped collections becoming the incorrect size, these numbers are also used in balancing.
| Comment by Michael Cahill (Inactive) [ 13/Jun/18 ] |
We can address this issue for capped collections without dramatic changes.

For general collections, we could reduce the drift by (a) accounting for inserts that are rolled back and (b) estimating the effect of deletes on the data size (e.g., by assuming that every deleted document was the average document size). We don't have enough information (either in the oplog or efficiently available in WiredTiger) to deal with all size-changing updates, but we should be able to avoid systematic drift.
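A sketch of what that drift reduction could look like (hypothetical helpers, not server code): the size of a rolled-back insert is known exactly because the insert's oplog entry carries the full document, while a delete's oplog entry carries only the _id, so its size effect has to be estimated, for example at the current average document size.

```cpp
// Illustrative size-tracking adjustments: exact for rolled-back inserts,
// estimated (average document size) for deletes.
#include <cstdint>
#include <cstdio>

struct SizeInfo {
    int64_t numRecords = 0;
    int64_t dataSize = 0;  // bytes
};

// Exact adjustment: the document being un-inserted has a known size.
void rollBackInsert(SizeInfo& s, int64_t insertedDocBytes) {
    s.numRecords -= 1;
    s.dataSize -= insertedDocBytes;
}

// Estimated adjustment: assume the deleted document was average-sized.
// Individual adjustments may be off, but the goal is to avoid systematic drift.
void accountForDelete(SizeInfo& s) {
    const int64_t avgDocBytes = s.numRecords > 0 ? s.dataSize / s.numRecords : 0;
    s.numRecords -= 1;
    s.dataSize -= avgDocBytes;
}

int main() {
    SizeInfo s{100, 100 * 1024};  // 100 documents averaging 1 KB
    rollBackInsert(s, 4096);      // exact: this document was 4 KB
    accountForDelete(s);          // estimated at the current average
    std::printf("numRecords=%lld dataSize=%lld\n",
                static_cast<long long>(s.numRecords),
                static_cast<long long>(s.dataSize));
    return 0;
}
```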
| Comment by Gregory McKeon (Inactive) [ 07/Jun/18 ] |
spencer to follow up with milkie to see if there's a possible fix for this in the storage layer. |