WiredTiger / WT-6269

Deadlock on txn_global->visibility_rwlock between commit and lookaside (LAS) commit

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Critical - P2
    • Affects Version/s: None
    • Component/s: None

      A deadlock can occur on txn_global->visibility_rwlock. The following sequence of steps can trigger it:

      1. A thread performing a commit operation takes the read lock on txn_global->visibility_rwlock. As part of the commit, it resolves prepared updates, which leads it to evict a page to make room for a page it needs. During this eviction, the contents of the evicted page must be written to the lookaside file.
      2. Meanwhile, the checkpoint thread starts a checkpoint and tries to take the write lock on txn_global->visibility_rwlock. Because the committing thread already holds the read lock, the checkpoint thread waits.
      3. The committing thread, still holding the read lock, tries to finish the lookaside write. As part of the lookaside commit operation, it tries to take the read lock on txn_global->visibility_rwlock a second time, but further read locks are blocked because the checkpoint writer is waiting on that lock.

      Thread takes the read lock -> checkpoint waits for the thread -> thread tries to take the read lock again, but the waiting checkpoint blocks it.
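
The three steps above can be replayed with a small single-threaded model. This is only a sketch under stated assumptions: `ModelRWLock` and `simulate` are hypothetical names, and the lock is a toy that refuses new readers while a writer is queued, which is the behavior the description implies. It is not WiredTiger's rwlock implementation, and it uses non-blocking "try" calls so that the example reports the blocked request instead of hanging.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelRWLock:
    """Toy model of a reader-writer lock that queues new readers behind a
    waiting writer. Hypothetical; not WiredTiger code."""
    readers: int = 0             # read locks currently held
    writer_active: bool = False  # True while a writer holds the lock
    writers_waiting: int = 0     # writers queued behind current readers

    def try_read_lock(self) -> bool:
        # New readers are refused while a writer holds or waits for the lock.
        if self.writer_active or self.writers_waiting > 0:
            return False
        self.readers += 1
        return True

    def request_write_lock(self) -> bool:
        # Granted immediately only if nobody holds the lock; otherwise queue.
        if self.readers == 0 and not self.writer_active:
            self.writer_active = True
            return True
        self.writers_waiting += 1
        return False


def simulate() -> List[str]:
    """Replay the reported sequence and record the outcome of each request."""
    lock = ModelRWLock()
    events = []

    # Step 1: the committing thread takes the read lock.
    ok = lock.try_read_lock()
    events.append("commit: read lock " + ("granted" if ok else "blocked"))

    # Step 2: the checkpoint thread asks for the write lock and must wait,
    # because the committing thread still holds its read lock.
    ok = lock.request_write_lock()
    events.append("checkpoint: write lock " + ("granted" if ok else "waiting"))

    # Step 3: the same committing thread asks for the read lock again for the
    # lookaside commit. It is refused because a writer is queued, and it can
    # never release its first read lock: neither side can make progress.
    ok = lock.try_read_lock()
    events.append("commit (lookaside): read lock " + ("granted" if ok else "blocked"))
    return events


if __name__ == "__main__":
    for line in simulate():
        print(line)
```

In this model the cycle is visible in the final state: one reader is still registered, one writer is queued behind it, and the only thread that could release the read lock is the one blocked on the second read request.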

      This problem cannot occur in 4.4 because of the durable history changes. With durable history, writing history store changes no longer requires a transaction, so this deadlock cannot occur. The problem can still happen in MongoDB 4.2.

            Assignee:
            haribabu.kommi@mongodb.com Haribabu Kommi
            Reporter:
            haribabu.kommi@mongodb.com Haribabu Kommi
            Votes:
            0
            Watchers:
            5
