Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.17, 6.0.1, 5.0.11, 6.1.0-rc0
Affects Version/s: 4.2.0, 4.4.0, 5.0.0, 6.0.0-rc11
Component/s: Sharding
Labels:
- neweng
- sharding-nyc-subteam1

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v6.0, v5.0, v4.4, v4.2
Sprint:
Sharding 2022-07-25, Sharding 2022-08-08
Linked BF Score:
13
Story Points:
3
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

As part of chunk migration, the recipient shard writes op=n no-op oplog entries and updates its config.transactions records to account for retryable writes and transactions touching the range being migrated which were previously run on the donor shard. This procedure ensures a retryable write still cannot be performed a second time even after the chunk has been migrated and the retries are now targeted to the recipient shard.

A retryable findAndModify command stores, depending on the MongoDB version, a copy of the preImage or postImage document either (a) in an op=n no-op oplog entry or (b) in the config.image_collection collection on the donor shard. The recipient shard always writes an op=n no-op oplog entry containing the preImage or postImage document. One property of the op=n no-op oplog entries written by the recipient shard is that they always have their 'o2' field filled in. In particular, for a preImage or postImage document the 'o2' field will be an empty BSONObj.

Notably, the primary of the recipient shard skips updating its config.transactions record when writing the op=n no-op oplog entry containing the preImage or postImage document. Instead, the op=n no-op oplog entry encapsulating the originating update or delete for the preImage or postImage document causes the config.transactions record on the primary of the recipient shard to be updated. However, the SessionUpdateTracker class used by secondaries to update the config.transactions record as part of secondary oplog application does not behave symmetrically. A secondary of the recipient shard will update its config.transactions record when processing the op=n no-op oplog entry containing the preImage or postImage document, because those oplog entries, when written by session migration, do have an 'o2' field (an empty one), and the check below only tests whether the field is present.

// Ignore pre/post image no-op oplog entries. These entries will not have an o2 field.
if (entry.getOpType() == OpTypeEnum::kNoop) {
    if (!entry.getFromMigrate() || !*entry.getFromMigrate()) {
        return {};
    }

    if (!entry.getObject2()) {
        return {};
    }
}
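
The asymmetry can be reproduced in a small, self-contained model. This is only a sketch: `Doc`, `NoopEntry`, and both function names are hypothetical stand-ins rather than server types, and the stricter emptiness check is an assumed variant shown for contrast, not necessarily the change that was committed for this ticket.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a BSON document; only presence and emptiness matter here.
using Doc = std::map<std::string, std::string>;

// Hypothetical, simplified view of an op=n oplog entry.
struct NoopEntry {
    std::optional<bool> fromMigrate;  // unset on ordinary donor entries
    std::optional<Doc> o2;            // unset, empty, or populated
};

// Mirrors the quoted check: returns true when the entry would be ignored.
bool ignoredByPresenceCheck(const NoopEntry& e) {
    if (!e.fromMigrate || !*e.fromMigrate) {
        return true;
    }
    if (!e.o2) {
        return true;  // tests presence only, so an entry with o2: {} is not ignored
    }
    return false;
}

// Assumed stricter variant, for illustration only: an empty o2 is also ignored.
bool ignoredByEmptinessCheck(const NoopEntry& e) {
    // o2 is only dereferenced when the presence check has confirmed it is set.
    return ignoredByPresenceCheck(e) || e.o2->empty();
}

// Self-check covering the three kinds of no-op entry discussed above.
void checkExamples() {
    NoopEntry donorImage{};                // no fromMigrate, no o2
    NoopEntry migratedImage{true, Doc{}};  // fromMigrate: true, o2: {}
    Doc wrapped;
    wrapped["op"] = "u";                   // stands in for the wrapped original write
    NoopEntry migratedWrite{true, wrapped};

    assert(ignoredByPresenceCheck(donorImage));
    assert(!ignoredByPresenceCheck(migratedImage));  // the behavior described in this ticket
    assert(ignoredByEmptinessCheck(migratedImage));
    assert(!ignoredByEmptinessCheck(migratedWrite));
}
```

In this model the migrated image entry passes the presence check because its 'o2' field is set, even though it is empty, which matches the secondary behavior described above.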

This bug does not enable retryable writes to be executed more than once. This is because the only way for the primary of the recipient shard to skip updating its config.transactions record is if the chunk migration ends up failing for some reason and the recipient shard never processes the oplog entry of the originating update or delete for the preImage or postImage document. However, the chunk migration failing means the range still belongs to the donor shard and so any retries will continue to be targeted to the donor shard, which will correctly not execute the retryable write more than once.

It can be helpful to see an example of what the oplog entries look like before and after session migration. The following oplog entries from the donor shard

{ lsid: { id: UUID("809ef603-fdf2-4911-9a74-89d9def7c5c4"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 2, op: "n", ns: "test.mycoll", ui: UUID("93b18fb7-935d-40ac-bf81-aab5bcffb3aa"), o: { _id: 0.0, x: 10.0, y: 2.0 }, stmtId: 0, ts: Timestamp(0, 0), t: -1, v: 2, wall: new Date(1656027741947), prevOpTime: { ts: Timestamp(0, 0), t: -1 } }
{ lsid: { id: UUID("809ef603-fdf2-4911-9a74-89d9def7c5c4"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 2, op: "u", ns: "test.mycoll", ui: UUID("93b18fb7-935d-40ac-bf81-aab5bcffb3aa"), o: { $v: 2, diff: { i: { y: 2.0 } } }, o2: { x: 10.0, _id: 0.0 }, needsRetryImage: "postImage", stmtId: 0, ts: Timestamp(1656027741, 109), t: 1, v: 2, wall: new Date(1656027741872), prevOpTime: { ts: Timestamp(0, 0), t: -1 } }

are transformed into the following oplog entries on the recipient shard.

{ lsid: { id: UUID("809ef603-fdf2-4911-9a74-89d9def7c5c4"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 2, op: "n", ns: "test.mycoll", ui: UUID("93b18fb7-935d-40ac-bf81-aab5bcffb3aa"), o: { _id: 0.0, x: 10.0, y: 2.0 }, o2: {}, stmtId: 0, fromMigrate: true, ts: Timestamp(0, 0), t: 1, v: 2, wall: new Date(1656027741947), prevOpTime: { ts: Timestamp(0, 0), t: -1 } }
{ lsid: { id: UUID("809ef603-fdf2-4911-9a74-89d9def7c5c4"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 2, op: "n", ns: "test.mycoll", ui: UUID("93b18fb7-935d-40ac-bf81-aab5bcffb3aa"), o: { $sessionMigrateInfo: 1 }, o2: { lsid: { id: UUID("809ef603-fdf2-4911-9a74-89d9def7c5c4"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 2, op: "u", ns: "test.mycoll", ui: UUID("93b18fb7-935d-40ac-bf81-aab5bcffb3aa"), o: { $v: 2, diff: { i: { y: 2.0 } } }, o2: { x: 10.0, _id: 0.0 }, needsRetryImage: "postImage", stmtId: 0, ts: Timestamp(1656027741, 109), t: 1, v: 2, wall: new Date(1656027741872), prevOpTime: { ts: Timestamp(0, 0), t: -1 } }, postImageOpTime: { ts: Timestamp(1656027741, 121), t: 1 }, stmtId: 0, fromMigrate: true, ts: Timestamp(0, 0), t: 1, v: 2, wall: new Date(1656027741872), prevOpTime: { ts: Timestamp(0, 0), t: -1 } }
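
The shape of that transformation can be summarized in a few lines of C++. This is a rough sketch based only on the entries shown above: `Entry`, `render`, `migrateImage`, and `migrateWrite` are hypothetical names, documents are kept as plain strings, and fields such as lsid, txnNumber, stmtId, timestamps, and postImageOpTime are deliberately left out.

```cpp
#include <optional>
#include <string>

// Hypothetical, text-only view of an oplog entry; documents are kept as strings.
struct Entry {
    std::string op;                 // "n", "u", "d", ...
    std::string o;                  // the 'o' document, rendered as text
    std::optional<std::string> o2;  // the 'o2' document, if the entry has one
    bool fromMigrate = false;
};

// Renders the modeled fields of an entry so it can be nested inside another entry.
std::string render(const Entry& e) {
    std::string s = "{ op: \"" + e.op + "\", o: " + e.o;
    if (e.o2) {
        s += ", o2: " + *e.o2;
    }
    return s + " }";
}

// Pre/post image entry: the image stays in 'o', and an empty 'o2' is added.
Entry migrateImage(const Entry& donorImage) {
    Entry out = donorImage;
    out.o2 = "{}";
    out.fromMigrate = true;
    return out;
}

// Originating write: replaced by a no-op whose 'o2' wraps the donor's entry.
Entry migrateWrite(const Entry& donorWrite) {
    Entry out;
    out.op = "n";
    out.o = "{ $sessionMigrateInfo: 1 }";
    out.o2 = render(donorWrite);
    out.fromMigrate = true;
    return out;
}
```

Both results are op=n entries with fromMigrate set and an 'o2' field present, so only the contents of 'o2' distinguish the migrated image entry from the migrated write.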

is related to

SERVER-36004 SessionUpdateTracker should ignore no-op entries for pre/post image oplogs

Closed

Assignee: Abdul Qadeer
Reporter: Max Hirschhorn
Participants: Abdul Qadeer, Githook User, Max Hirschhorn
Votes: 0
Watchers: 3

Created: Jun 23 2022 11:46:47 PM UTC
Updated: Oct 29 2023 09:36:30 PM UTC
Resolved: Aug 02 2022 04:24:10 PM UTC
Confidence Status Last Update: 18/Jul/22 3:15 PM
