[SERVER-38121] multikey index ops in a transaction can cause secondaries to hang Created: 13/Nov/18  Updated: 28/Nov/18  Resolved: 19/Nov/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.1.5
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Randolph Tan Assignee: Siyuan Zhou
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-37199 Yield locks of transactions in second... Closed
Participants:

 Description   

If a replication batch contains an operation that triggers setIndexIsMultikey, that operation can block indefinitely trying to acquire the database X lock if a prepared transaction is holding a lock on that database.
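
The blocking follows from the standard multi-granularity lock compatibility rules. The following is only a minimal sketch, assuming the prepared transaction holds the database lock in IX mode (as a normal write would); the helper names are hypothetical and not taken from the server code:

```python
# Standard multi-granularity lock compatibility: for each requested mode,
# the set of already-held modes it can coexist with on the same resource.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode already held."""
    return all(held in COMPATIBLE[requested] for held in held_modes)


# Assumption: the prepared transaction holds the database lock in IX mode.
held_by_prepared_txn = ["IX"]

# The secondary's multikey update asks for database X, so it has to wait.
print(can_grant("X", held_by_prepared_txn))   # False
# A database IX request would not conflict with the held IX lock.
print(can_grant("IX", held_by_prepared_txn))  # True
```

As long as the prepared transaction keeps its lock, the X request cannot be granted, which matches the indefinite block reported here.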



 Comments   
Comment by Judah Schvimer [ 28/Nov/18 ]

In that case, multikey writes probably do require the DB X lock. The "finer grained locking for DDL ops" project can address that if it wants, but SERVER-37199 fixes this without changing the current DB X lock.

Comment by Geert Bosch [ 27/Nov/18 ]

We rely on the DB X lock to ensure that collection catalog data is not changed concurrently with readers accessing it. If the data you're changing is otherwise protected against concurrent access, you shouldn't need it.

Comment by Judah Schvimer [ 20/Nov/18 ]

siyuan.zhou, I agree a database IX lock would be sufficient. This write is to the catalog record for the collection, so I believe a collection X lock is required; geert.bosch may be able to answer better.

Comment by Siyuan Zhou [ 20/Nov/18 ]

judah.schvimer, I believe the code Randolph is referring to is this line. However, I don't see why an X lock on the database is necessary. Can we acquire the database lock in IX mode instead?

Moreover, it seems the lock mode of setMultikey on the primary is no different from that of a normal insert, so I assume it's IX on both the database and the collection. Can we make the secondary behave the same as the primary while we are here?

Nevertheless, SERVER-37199 will fix it.

Comment by Siyuan Zhou [ 19/Nov/18 ]

I believe this will be solved by SERVER-37199. Marking this as a dup of it. Will add a test case for this.

Comment by Tess Avitabile (Inactive) [ 19/Nov/18 ]

siyuan.zhou, we hope that this will go away if we yield prepared transactions' locks on secondaries. Can you please confirm (and test) this in your implementation?

Comment by Randolph Tan [ 13/Nov/18 ]

Sample oplog entry seen in practice:

{
  ts: Timestamp(1542146041, 83),
  t: 1,
  h: -307928861360103986,
  v: 2,
  op: "c",
  ns: "admin.$cmd",
  wall: new Date(1542146041519),
  lsid: { id: UUID("8ed1445e-b7ef-467d-b032-81a6be08337c"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) },
  txnNumber: 0,
  stmtId: 0,
  prevOpTime: { ts: Timestamp(0, 0), t: -1 },
  prepare: true,
  o: {
    applyOps: [
       { op: "i", ns: "test.jstests_type3", ui: UUID("329df5ae-f742-44c0-96f7-4e5bfce00bb4"), o: { _id: ObjectId('5beb47f9cf8ae2991627c378'), a: [ [ "c" ] ] } },
       { op: "d", ns: "test.jstests_type3", ui: UUID("329df5ae-f742-44c0-96f7-4e5bfce00bb4"), o: { _id: ObjectId('5beb47f9cf8ae2991627c378') } },
       { op: "i", ns: "test.jstests_type3", ui: UUID("329df5ae-f742-44c0-96f7-4e5bfce00bb4"), o: { _id: ObjectId('5beb47f9cf8ae2991627c37d'), a: function () { var a = 0; } } },
       { op: "d", ns: "test.jstests_type3", ui: UUID("329df5ae-f742-44c0-96f7-4e5bfce00bb4"), o: { _id: ObjectId('5beb47f9cf8ae2991627c37d') } },
       { op: "i", ns: "test.jstests_type3", ui: UUID("329df5ae-f742-44c0-96f7-4e5bfce00bb4"), o: { _id: ObjectId('5beb47f9cf8ae2991627c37f'), a: Timestamp(1542146041, 82) } },
       { op: "d", ns: "test.jstests_type3", ui: UUID("329df5ae-f742-44c0-96f7-4e5bfce00bb4"), o: { _id: ObjectId('5beb47f9cf8ae2991627c37f') } },
       { op: "i", ns: "test.jstests_type3", ui: UUID("329df5ae-f742-44c0-96f7-4e5bfce00bb4"), o: { _id: ObjectId('5beb47f9cf8ae2991627c381'), a: new Date(1542146041517) } }
    ],
    prepare: true
  }
}
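
In the entry above, the first insert stores an array in field a, which is the kind of write that would mark an index on a as multikey. The ticket does not say which index exists or which operation triggered the hang, so the following is only a rough sketch of that check, assuming an index on a and using a simplified, hypothetical stand-in for the applyOps list:

```python
def multikey_candidates(apply_ops, indexed_field):
    """Return the insert ops whose value for `indexed_field` is an array."""
    return [
        op for op in apply_ops
        if op.get("op") == "i"
        and isinstance(op.get("o", {}).get(indexed_field), list)
    ]


# Simplified stand-in for the applyOps array: _id values are shortened
# placeholders and the non-array value is replaced with a plain string.
apply_ops = [
    {"op": "i", "ns": "test.jstests_type3", "o": {"_id": 1, "a": [["c"]]}},
    {"op": "d", "ns": "test.jstests_type3", "o": {"_id": 1}},
    {"op": "i", "ns": "test.jstests_type3", "o": {"_id": 2, "a": "function"}},
    {"op": "d", "ns": "test.jstests_type3", "o": {"_id": 2}},
]

hits = multikey_candidates(apply_ops, "a")
print(len(hits))  # 1
```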
