Core Server / SERVER-54675

Round collection data size to zero if found to be negative on startup or coming out of replication rollback

    • Type: Task
    • Resolution: Fixed
    • Priority: Major - P3
    • 4.4.7, 5.0.0-rc0
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Fully Compatible
    • v4.4
    • Execution Team 2021-03-22, Execution Team 2021-04-05
    • 34

      Replication rollback may cause the collection's reported data size to drift from its actual data size. As observed in the random_moveChunk_broadcast_delete_transaction.js FSM workload when run with stepdowns enabled, the collection's reported data size can become negative when the effects of the delete operations are overcounted.

      (Note that while Collection::dataSize() returns a uint64_t, the value may actually represent a signed 64-bit integer.)

      Overcounting the effects of the delete operations may cause a chunk migration to incorrectly fail with a ChunkTooBig error response, because the negative data size, interpreted as a huge unsigned value, produces a nonsensical average document size of 737869762948382062 bytes. This is likely only an issue in testing because it effectively requires the workload to delete all of the documents in the collection.

      [fsm_workload_test:random_moveChunk_broadcast_delete_transaction] 2021-02-18T05:36:26.438+0000         	"errmsg" : "Cannot move chunk: the maximum number of documents for a chunk is 0, the maximum chunk size is 67108864, average document size is 737869762948382062. Found 25 documents in chunk  ns: test98_fsmdb0.fsmcoll0 { skey: 200.0 } -> { skey: 300.0 }",
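
      For reference, the average document size in the message above is consistent with a slightly negative data size wrapping around when treated as unsigned: -66 cast to uint64_t, for instance, is 18446744073709551550, and dividing that by the 25 documents found in the chunk gives exactly 737869762948382062. A minimal standalone sketch of that arithmetic (the -66 is only an illustrative value inferred from the log, not a confirmed figure):

      #include <cstdint>
      #include <iostream>
      
      int main() {
          // A data size that has drifted slightly negative after rollback
          // (illustrative value only).
          std::int64_t dataSize = -66;
          // Collection::dataSize() hands the value back as a uint64_t, so the
          // negative number wraps around to 2^64 - 66 = 18446744073709551550.
          std::uint64_t reported = static_cast<std::uint64_t>(dataSize);
          std::uint64_t numRecords = 25;  // documents found in the chunk per the log
          std::cout << reported / numRecords << std::endl;  // prints 737869762948382062
          return 0;
      }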
      
      // Use the average object size to estimate how many objects a full chunk would carry do that
      // while traversing the chunk's range using the sharding index, below there's a fair amount of
      // slack before we determine a chunk is too large because object sizes will vary.
      unsigned long long maxRecsWhenFull;
      long long avgRecSize;
      
      const long long totalRecs = collection->numRecords(opCtx);
      if (totalRecs > 0) {
          avgRecSize = collection->dataSize(opCtx) / totalRecs;
          // The calls to numRecords() and dataSize() are not atomic so it is possible that the data
          // size becomes smaller than the number of records between the two calls, which would result
          // in average record size of zero
          if (avgRecSize == 0) {
              avgRecSize = BSONObj::kMinBSONLength;
          }
          maxRecsWhenFull = _args.getMaxChunkSizeBytes() / avgRecSize;
          maxRecsWhenFull = 130 * maxRecsWhenFull / 100;  // pad some slack
      } else {
          avgRecSize = 0;
          maxRecsWhenFull = kMaxObjectPerChunk + 1;
      }
      

      https://github.com/mongodb/mongo/blob/dbf6cdde5434c5e0fe7d6435fbe74b5da53595d4/src/mongo/db/s/migration_chunk_cloner_source_legacy.cpp#L856
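
      The fix described by the title is to round the collection's recovered data size up to zero if it is found to be negative on startup or when coming out of replication rollback. A minimal sketch of that clamping, using a hypothetical helper name rather than the actual patched code:

      #include <algorithm>
      #include <cstdint>
      
      // Hypothetical helper (not the actual patch): a size counter that drifted
      // negative, e.g. from overcounted deletes across a rollback, is rounded up
      // to zero before it is reported or persisted.
      std::int64_t clampRecoveredDataSize(std::int64_t recoveredSize) {
          return std::max<std::int64_t>(recoveredSize, 0);
      }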

            Assignee: Gregory Noma (gregory.noma@mongodb.com)
            Reporter: Max Hirschhorn (max.hirschhorn@mongodb.com)
            Votes: 0
            Watchers: 5
