[SERVER-33730] Update Linkbench to use transaction ids for doTxn commands. Created: 07/Mar/18  Updated: 29/Oct/23  Resolved: 25/Apr/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: William Schultz (Inactive)
Assignee: James O'Leary
Resolution: Fixed
Votes: 0
Labels: Linkbench
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-32443 Create a sys-perf task for running li... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2018-03-26, Performance 2018-04-23, Performance 2018-05-07
Participants:
Linked BF Score: 19

 Description   

Running Linkbench currently fails against master with this error:

Error: Command failed with error 72: 'doTxn can only be run with a transaction ID.'

This is due to the changes from SERVER-32320, specifically this commit. We should update Linkbench so that its doTxn commands run inside a session and pass transaction ids correctly.
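As a rough illustration of what "run inside a session and pass transaction ids" could look like on the client side, here is a minimal sketch that uses only the JDK and plain maps in place of driver documents. The `lsid` and `txnNumber` field names and the shape of the update entries are taken from the oplog entries quoted in the comments below. The class `DoTxnSession`, its methods, and the assumption that `doTxn` takes an array of operations in the same way as `applyOps` are hypothetical and are not taken from the actual Linkbench change.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper (not from Linkbench): one instance per logical session. */
final class DoTxnSession {
    // Session id that would be sent as lsid.id on every command.
    private final UUID sessionId = UUID.randomUUID();
    // Transaction numbers increase monotonically within a session.
    private final AtomicLong nextTxnNumber = new AtomicLong(0);

    /** Builds a doTxn command tagged with this session's lsid and a fresh txnNumber. */
    Map<String, Object> doTxn(List<Map<String, Object>> ops) {
        Map<String, Object> lsid = new LinkedHashMap<>();
        lsid.put("id", sessionId);

        Map<String, Object> cmd = new LinkedHashMap<>();
        cmd.put("doTxn", new ArrayList<>(ops)); // assumed to mirror applyOps
        cmd.put("lsid", lsid);
        cmd.put("txnNumber", nextTxnNumber.getAndIncrement());
        return cmd;
    }

    /** Builds an update entry shaped like the "u" entries in the quoted oplog. */
    static Map<String, Object> updateOp(String ns, Map<String, Object> id,
                                        Map<String, Object> set) {
        Map<String, Object> o = new LinkedHashMap<>();
        o.put("$set", set);
        Map<String, Object> o2 = new LinkedHashMap<>();
        o2.put("_id", id);

        Map<String, Object> op = new LinkedHashMap<>();
        op.put("op", "u");
        op.put("ns", ns);
        op.put("o", o);
        op.put("o2", o2);
        return op;
    }

    public static void main(String[] args) {
        DoTxnSession session = new DoTxnSession();

        Map<String, Object> id = new LinkedHashMap<>();
        id.put("id1", 8506004L);
        id.put("link_type", 123456790L);
        Map<String, Object> set = new LinkedHashMap<>();
        set.put("count", 0);

        List<Map<String, Object>> ops = new ArrayList<>();
        ops.add(updateOp("linkdb0.counttable", id, set));

        // Two commands on the same session share an lsid but get distinct txnNumbers.
        System.out.println(session.doTxn(ops));
        System.out.println(session.doTxn(ops));
    }
}
```

With a real driver, the session id would normally come from the driver's own session object rather than a locally generated UUID; the sketch only shows which fields travel with each command.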



 Comments   
Comment by Spencer Brody (Inactive) [ 17/Apr/18 ]

I skimmed it, nothing jumped out at me. Will defer to Will for the rest of the review.

Comment by William Schultz (Inactive) [ 16/Apr/18 ]

jim.oleary I took a first pass at the CR. Mostly looks good, just a few cosmetic suggestions and questions from my end. As far as the crashes you are seeing locally, the symptoms look very similar to those shown in BF-8742, which is, at present, unresolved. You may indeed be running into a bug on master that needs to be fixed. Hopefully we will get some insight on the root cause of that issue relatively soon. Thanks.

Comment by James O'Leary [ 13/Apr/18 ]

william.schultz / spencer, can you have a look at the CR?

Thanks.

Comment by James O'Leary [ 13/Apr/18 ]

While testing locally, I encountered multiple issues.

  1. Secondary Crash #1:

    2018-04-13T11:10:36.826+0100 E -        [repl writer worker 3] Assertion: Location10334: BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO src/mongo/bson/bsonobj.cpp 101
    2018-04-13T11:10:36.830+0100 F REPL     [repl writer worker 3] writer worker caught exception: Location10334: BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO on: { lsid: { id: UUID("59541402-f9b2-4c4c-8e3d-b53a83072619"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 223162, op: "u", ns: "linkdb0.counttable", ui: UUID("079c92f6-3a81-4b62-96e1-9f0f55a29b22"), o: { $v: 1, $set: { count: 0, time: 1523614236817, version: 3 } }, o2: { _id: { id1: 8506004, link_type: 123456790 } }, ts: Timestamp(1523614236, 337), t: 1, h: -6638644240128923138, v: 2, wall: new Date(1523614236810), stmtId: 0, prevOpTime: { ts: Timestamp(0, 0), t: -1 } }
    2018-04-13T11:10:36.830+0100 F REPL     [rsSync-0] Failed to apply batch of operations. Number of operations in batch: 2. First operation: { lsid: { id: UUID("59541402-f9b2-4c4c-8e3d-b53a83072619"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 223162, op: "c", ns: "linkdb0.$cmd", o: { applyOps: [ { op: "u", ns: "linkdb0.linktable", ui: UUID("31ed02fd-179c-4216-8f13-efd02207aa7c"), o: { $v: 1, $set: { visibility: "0" } }, o2: { _id: { id1: 8506004, link_type: 123456790, id2: 8506004 } } }, { op: "u", ns: "linkdb0.counttable", ui: UUID("079c92f6-3a81-4b62-96e1-9f0f55a29b22"), o: { $v: 1, $set: { count: 0, time: 1523614236817, version: 3 } }, o2: { _id: { id1: 8506004, link_type: 123456790 } } } ] }, ts: Timestamp(1523614236, 337), t: 1, h: -6638644240128923138, v: 2, wall: new Date(1523614236810), stmtId: 0, prevOpTime: { ts: Timestamp(0, 0), t: -1 } }. Last operation: { lsid: { id: UUID("59541402-f9b2-4c4c-8e3d-b53a83072619"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 223163, op: "c", ns: "linkdb0.$cmd", o: { applyOps: [ { op: "i", ns: "linkdb0.linktable", ui: UUID("31ed02fd-179c-4216-8f13-efd02207aa7c"), o: { _id: { id1: 1208621, link_type: 123456790, id2: 1208660 }, id1: 1208621, link_type: 123456790, id2: 1208660, visibility: "1", version: 0, time: 1523614236818, data: BinData(0, 2C4E52233C43543242623A474A205B32214E262A2E414753) } }, { op: "u", ns: "linkdb0.counttable", ui: UUID("079c92f6-3a81-4b62-96e1-9f0f55a29b22"), o: { $v: 1, $set: { count: 3, time: 1523614236818, version: 1 } }, o2: { _id: { id1: 1208621, link_type: 123456790 } } } ] }, ts: Timestamp(1523614236, 338), t: 1, h: 5388033127531589455, v: 2, wall: new Date(1523614236810), stmtId: 0, prevOpTime: { ts: Timestamp(0, 0), t: -1 } }. Oplog application failed in writer thread 12: Location10334: BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO
    2018-04-13T11:10:36.830+0100 F -        [rsSync-0] Fatal assertion 34437 Location10334: BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO at src/mongo/db/repl/sync_tail.cpp 880
    2018-04-13T11:10:36.830+0100 F -        [rsSync-0] 
     
    ***aborting after fassert() failure
    

  2. Secondary Crash #2, after which the server refuses to start:

    2018-04-13T12:09:58.886+0100 E -        [repl writer worker 11] Assertion: Location10334: BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO src/mongo/bson/bsonobj.cpp 101
    2018-04-13T12:09:58.895+0100 F REPL     [repl writer worker 11] writer worker caught exception: Location10334: BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO on: { lsid: { id: UUID("42da017f-d0de-4a6c-b708-1acd79343f33"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 118641, op: "u", ns: "linkdb0.counttable", ui: UUID("bb62c7c7-b572-4c4e-b82a-b6c58e17dbb7"), o: { $v: 1, $set: { count: 1, time: 1523617798857, version: 3 } }, o2: { _id: { id1: 5660916, link_type: 123456790 } }, ts: Timestamp(1523617798, 325), t: 1, h: -3037865117135880662, v: 2, wall: new Date(1523617798854), stmtId: 0, prevOpTime: { ts: Timestamp(0, 0), t: -1 } }
    2018-04-13T12:09:58.895+0100 F REPL     [rsSync-0] Failed to apply batch of operations. Number of operations in batch: 6. First operation: { op: "u", ns: "linkdb0.nodetable", ui: UUID("2cf6cb9e-016f-452e-9d7f-c01cad6a49d4"), o: { $v: 1, $set: { data: BinData(0, CDC55DA645549CAA87454392A67B453AA084485DDC4766B0B6497878A076CC9BD19C9AABC498CC6860858FAFA1903B4AB5394384AF86B2996B70D3336BAB65B747A8CC677854...), time: 1523617798, version: 2 } }, o2: { _id: 70263 }, ts: Timestamp(1523617798, 324), t: 1, h: 8725014857418142761, v: 2, wall: new Date(1523617798854) }. Last operation: { op: "u", ns: "linkdb0.nodetable", ui: UUID("2cf6cb9e-016f-452e-9d7f-c01cad6a49d4"), o: { $v: 1, $set: { data: BinData(0, 81B899D0778DB25D6258D99D8DD95B9FC5C88E568B5C859EA7D180D1437FAE35C656AA8C38BA5D54), time: 1523617798, version: 2 } }, o2: { _id: 2870977 }, ts: Timestamp(1523617798, 329), t: 1, h: 2711843353491773794, v: 2, wall: new Date(1523617798864) }. Oplog application failed in writer thread 2: Location10334: BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO
    2018-04-13T12:09:58.895+0100 F -        [rsSync-0] Fatal assertion 34437 Location10334: BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO at src/mongo/db/repl/sync_tail.cpp 880
    2018-04-13T12:09:58.895+0100 F -        [rsSync-0] 
     
    ***aborting after fassert() failure
    

  3. Client-side exception; maybe I need to make the txn sizes smaller:

    com.mongodb.MongoCommandException: Command failed with error 10334: 'BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO' on server localhost:27017. The full response is { "applied" : 2, "code" : 10334, "codeName" : "Location10334", "errmsg" : "BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO", "results" : [false, false], "ok" : 0.0, "operationTime" : { "$timestamp" : { "t" : 1523614116, "i" : 603 } }, "$clusterTime" : { "clusterTime" : { "$timestamp" : { "t" : 1523614116, "i" : 603 } }, "signature" : { "hash" : { "$binary" : "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type" : "00" }, "keyId" : { "$numberLong" : "0" } } } }
    

Anecdotally, running 'show collections' in a 3.6.3 shell may also have locked the server (while the request phase was running).

Comment by James O'Leary [ 13/Apr/18 ]

I've added a CR. I don't seem to be able to set any reviewers.

Any / all comments are welcome.

Comment by Gregory McKeon (Inactive) [ 12/Apr/18 ]

david.daly how is this progressing? Need anything from us?

Comment by David Daly [ 06/Apr/18 ]

Okay, grabbing and putting in our next sprint.

Comment by William Schultz (Inactive) [ 03/Apr/18 ]

I don't think this is particularly high priority right now, at least not more important than finishing up other transactions work. Is this correct, spencer?
