[SERVER-54675] Round collection data size to zero if found to be negative on startup or coming out of replication rollback Created: 20/Feb/21  Updated: 29/Oct/23  Resolved: 25/Mar/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.4.7, 5.0.0-rc0

Type: Task Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Gregory Noma
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-74103 Increase storage log verbosity in opl... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.4
Sprint: Execution Team 2021-03-22, Execution Team 2021-04-05
Participants:
Linked BF Score: 34

 Description   

Replication rollback may cause the collection's reported data size to drift from its actual data size. As observed in the random_moveChunk_broadcast_delete_transaction.js FSM workload when running with stepdowns enabled, it is possible for the collection's reported data size to become negative from having overcounted the effects of the delete operations.

(Note that while Collection::dataSize() returns a uint64_t, the underlying value may actually represent a signed 64-bit integer.)

Overcounting the effects of the delete operations may cause a chunk migration to incorrectly fail with a ChunkTooBig error response due to calculating a nonsensical average document size of 737869762948382062 bytes. This is likely only an issue in testing because it effectively requires the workload to be deleting all of the documents in the collection.

[fsm_workload_test:random_moveChunk_broadcast_delete_transaction] 2021-02-18T05:36:26.438+0000         	"errmsg" : "Cannot move chunk: the maximum number of documents for a chunk is 0, the maximum chunk size is 67108864, average document size is 737869762948382062. Found 25 documents in chunk  ns: test98_fsmdb0.fsmcoll0 { skey: 200.0 } -> { skey: 300.0 }",
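
For illustration, the 737869762948382062 figure in the error above is exactly what falls out of a slightly negative data size being reinterpreted as an unsigned 64-bit value and divided by the 25 documents in the chunk. The snippet below is a hedged reconstruction (the value -66 is chosen because it reproduces the figure; the actual drift in the failure may have differed):

#include <cstdint>
#include <iostream>

int main() {
    // Hypothetical: the deletes were overcounted and the tracked data size
    // drifted to -66 bytes for a chunk that still holds 25 documents.
    long long trackedDataSize = -66;
    long long totalRecs = 25;

    // Collection::dataSize() returns uint64_t, so the negative value wraps
    // around to a number just below 2^64 (18446744073709551550 here).
    uint64_t reportedDataSize = static_cast<uint64_t>(trackedDataSize);

    // The division then yields the nonsensical "average document size"
    // reported in the ChunkTooBig error: 737869762948382062.
    std::cout << reportedDataSize / totalRecs << std::endl;
    return 0;
}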

// Use the average object size to estimate how many objects a full chunk would carry; do that
// while traversing the chunk's range using the sharding index. Below, there's a fair amount of
// slack before we determine a chunk is too large because object sizes will vary.
unsigned long long maxRecsWhenFull;
long long avgRecSize;
 
const long long totalRecs = collection->numRecords(opCtx);
if (totalRecs > 0) {
    avgRecSize = collection->dataSize(opCtx) / totalRecs;
    // The calls to numRecords() and dataSize() are not atomic, so it is possible that the data
    // size becomes smaller than the number of records between the two calls, which would result
    // in an average record size of zero.
    if (avgRecSize == 0) {
        avgRecSize = BSONObj::kMinBSONLength;
    }
    maxRecsWhenFull = _args.getMaxChunkSizeBytes() / avgRecSize;
    maxRecsWhenFull = 130 * maxRecsWhenFull / 100;  // pad some slack
} else {
    avgRecSize = 0;
    maxRecsWhenFull = kMaxObjectPerChunk + 1;
}

https://github.com/mongodb/mongo/blob/dbf6cdde5434c5e0fe7d6435fbe74b5da53595d4/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L856
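
The commits linked in the comments below resolve this by avoiding negative values in WiredTigerSizeStorer::SizeInfo. As a minimal, hedged sketch of the clamping idea discussed in this ticket (the helper names below are hypothetical and not taken from the actual patch):

#include <algorithm>
#include <cstdint>

// Illustrative only: clamp a possibly-negative tracked size to zero before it
// is stored or reported, so consumers such as the chunk cloner never see a
// negative value reinterpreted as a huge uint64_t.
inline int64_t clampSizeToZero(int64_t trackedSize) {
    return std::max<int64_t>(trackedSize, 0);
}

// Example: applying a (possibly overcounted) delete delta to a running total,
// e.g. on startup or when coming out of replication rollback.
inline int64_t applySizeDelta(int64_t currentSize, int64_t delta) {
    return clampSizeToZero(currentSize + delta);
}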



 Comments   
Comment by Githook User [ 17/May/21 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-54675 Avoid storing negative values in WiredTigerSizeStorer::SizeInfo

(cherry picked from commit 9989aca5d45bf2d7c21f69d8a0192347c39cfcb5)
Branch: v4.4
https://github.com/mongodb/mongo/commit/94248c51d414a7354045573879847fc4311cfa39

Comment by Githook User [ 25/Mar/21 ]

Author:

{'name': 'Gregory Noma', 'email': 'gregory.noma@gmail.com', 'username': 'gregorynoma'}

Message: SERVER-54675 Avoid storing negative values in WiredTigerSizeStorer::SizeInfo
Branch: master
https://github.com/mongodb/mongo/commit/9989aca5d45bf2d7c21f69d8a0192347c39cfcb5

Comment by Daniel Gottlieb (Inactive) [ 22/Feb/21 ]

I think the storage/collection layer should prevent underflow and return a 0 here.

But for completeness, I think it's worth being clear that the data size will drift every time we crash or roll back data on a collection (and correcting that would be a cost paid by restart/rollback recovery). For example, I would expect insert-only collections to report a larger data size than what the collection actually contains. Every rollback would increase the error (in absolute bytes). Hopefully that's not a concern for other related calculations sharding/moveChunk makes.

Comment by Max Hirschhorn [ 22/Feb/21 ]

That's a fair point, kaloian.manassiev. I had been under the impression there wouldn't be an appetite to make the data size accurate following replication rollback, and I didn't think to ask the storage execution team for anything else.

geert.bosch, louis.williams, daniel.gottlieb, what do you all think about clamping data size to zero if the value is found to be negative on startup or coming out of replication rollback?

Comment by Kaloian Manassiev [ 22/Feb/21 ]

It seems like a bug to me that, after recovery, Collection::dataSize can return a nonsensical value. Shouldn't this be an execution/storage bug instead?
