WiredTiger / WT-1575

Deadlock opening multiple cursors in parallel

    • Type: Task
    • Resolution: Done
    • Fix Version/s: WT2.5.1
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      If an application opens multiple statistics cursors in parallel, WiredTiger can deadlock between the DHANDLE and SCHEMA locks.

      Example stack traces follow. Full stack traces are available in JIRA ticket SERVER-16738.

      Thread 1:

      #2  0x00007ffff7bc6480 in __GI___pthread_mutex_lock (mutex=0x3420400) at ../nptl/pthread_mutex_lock.c:79
      #3  0x0000000001ef5262 in __wt_spin_lock (session=0x3547280, t=0x3420400) at src/third_party/wiredtiger/src/include/mutex.i:175
      #4  0x0000000001ef6000 in __wt_session_get_btree (session=0x3547280, uri=0x1593d000b "file:index-491659-4306617738107441063.wt", checkpoint=0x0, cfg=0x7ffff0bacec0, flags=8)
          at src/third_party/wiredtiger/src/session/session_dhandle.c:397
      #5  0x0000000001ef58c0 in __wt_session_get_btree_ckpt (session=0x3547280, uri=0x1593d000b "file:index-491659-4306617738107441063.wt", cfg=0x7ffff0bacec0, flags=0)
          at src/third_party/wiredtiger/src/session/session_dhandle.c:229
      #6  0x0000000001e9a2da in __curstat_file_init (session=0x3547280, uri=0x1593d000b "file:index-491659-4306617738107441063.wt", cfg=0x7ffff0bacec0, cst=0xf441be00)
          at src/third_party/wiredtiger/src/cursor/cur_stat.c:379
      

      Thread 2:

      #3  0x0000000001ef5262 in __wt_spin_lock (session=0x35459c0, t=0x34204c0) at src/third_party/wiredtiger/src/include/mutex.i:175
      #4  0x0000000001ef5f93 in __wt_session_get_btree (session=0x35459c0, uri=0x141baf9f0 "file:collection-491658-4306617738107441063.wt", checkpoint=0x0, cfg=0x0, flags=8)
          at src/third_party/wiredtiger/src/session/session_dhandle.c:397
      #5  0x0000000001e7cb70 in __conn_btree_apply_internal (session=0x35459c0, dhandle=0x1041edc00, func=0x1e9a1fb <__curstat_checkpoint>, cfg=0x7ffff13b4b20)
          at src/third_party/wiredtiger/src/conn/conn_dhandle.c:484
      #6  0x0000000001e7cd26 in __wt_conn_btree_apply (session=0x35459c0, apply_checkpoints=1, uri=0x141baf9f0 "file:collection-491658-4306617738107441063.wt", func=0x1e9a1fb <__curstat_checkpoint>,
          cfg=0x7ffff13b4b20) at src/third_party/wiredtiger/src/conn/conn_dhandle.c:526
      #7  0x0000000001e9a4a1 in __curstat_file_init (session=0x35459c0, uri=0x172e0c00b "file:collection-491658-4306617738107441063.wt", cfg=0x7ffff13b4ef0, cst=0xf62ac000)
          at src/third_party/wiredtiger/src/cursor/cur_stat.c:413
      

      The sequence of events leading up to the deadlock is:

      • Thread 2 grabs the DHANDLE lock in __curstat_file_init before calling __wt_conn_btree_apply.
      • Thread 1 does not hold the DHANDLE lock. It calls __wt_session_get_btree, which grabs the SCHEMA lock and then waits on the DHANDLE lock held by thread 2.
      • Thread 2 ends up in __wt_session_get_btree while still holding the DHANDLE lock. __wt_session_get_btree attempts to get the SCHEMA lock, which is held by thread 1.
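
      The interleaving above can be reproduced outside WiredTiger with two plain mutexes. The following is only a minimal sketch: schema_lock and dhandle_lock are hypothetical stand-ins for the real spinlocks, the step counter exists solely to force the ordering described in the list, and pthread_mutex_trylock is used instead of a blocking lock so the program reports the cycle rather than hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for WiredTiger's SCHEMA and DHANDLE locks. */
static pthread_mutex_t schema_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dhandle_lock = PTHREAD_MUTEX_INITIALIZER;

/* Step counter used only to force the interleaving from the report. */
static pthread_mutex_t step_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t step_cv = PTHREAD_COND_INITIALIZER;
static int step;

static void set_step(int s)
{
    pthread_mutex_lock(&step_mtx);
    step = s;
    pthread_cond_broadcast(&step_cv);
    pthread_mutex_unlock(&step_mtx);
}

static void wait_step(int s)
{
    pthread_mutex_lock(&step_mtx);
    while (step < s)
        pthread_cond_wait(&step_cv, &step_mtx);
    pthread_mutex_unlock(&step_mtx);
}

/* Plays thread 1: SCHEMA first, then DHANDLE (the usual order). */
static void *thread1(void *arg)
{
    int *blocked = arg;
    int rc;

    wait_step(1);                        /* thread 2 already holds DHANDLE */
    pthread_mutex_lock(&schema_lock);
    rc = pthread_mutex_trylock(&dhandle_lock);
    *blocked = (rc == EBUSY);            /* a blocking lock would wait here */
    if (rc == 0)
        pthread_mutex_unlock(&dhandle_lock);
    set_step(2);
    wait_step(3);                        /* keep SCHEMA held while thread 2 probes */
    pthread_mutex_unlock(&schema_lock);
    return NULL;
}

/*
 * Plays thread 2: DHANDLE first, then SCHEMA (the reversed order).
 * Returns 1 if each thread would block on the lock the other holds,
 * 0 if not, -1 if the helper thread could not be created.
 */
static int lock_inversion_detected(void)
{
    pthread_t t1;
    int t1_blocked = 0, t2_blocked, rc;

    step = 0;
    if (pthread_create(&t1, NULL, thread1, &t1_blocked) != 0)
        return -1;
    pthread_mutex_lock(&dhandle_lock);
    set_step(1);
    wait_step(2);                        /* thread 1 now holds SCHEMA */
    rc = pthread_mutex_trylock(&schema_lock);
    t2_blocked = (rc == EBUSY);          /* a blocking lock would wait here */
    if (rc == 0)
        pthread_mutex_unlock(&schema_lock);
    set_step(3);
    pthread_join(t1, NULL);
    pthread_mutex_unlock(&dhandle_lock);
    return t1_blocked && t2_blocked;
}
```

      With blocking locks in place of the two trylock calls, neither thread could make progress, which matches the two stack traces: both are parked in a lock call made from __wt_session_get_btree.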

      Thread 2 is the "problem" thread: every other operation that needs both the SCHEMA and DHANDLE locks takes the SCHEMA lock first. It is probably enough to change __curstat_file_init so that it takes both locks, SCHEMA first and then DHANDLE, at the start.
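
      As a rough illustration of the proposed ordering, and not the actual WiredTiger change, the sketch below has every caller take a stand-in SCHEMA mutex before a stand-in DHANDLE mutex. The names schema_mtx, dhandle_mtx, stat_cursor_open and run_parallel_opens are hypothetical. Because both threads acquire the locks in the same order, no cycle can form and the parallel opens run to completion.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for WiredTiger's SCHEMA and DHANDLE locks. */
static pthread_mutex_t schema_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dhandle_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Count of simulated cursor opens; only touched with both locks held. */
static long opened;

/*
 * Hypothetical analogue of a statistics cursor open that takes both
 * locks up front, always SCHEMA first and then DHANDLE.
 */
static void stat_cursor_open(void)
{
    pthread_mutex_lock(&schema_mtx);
    pthread_mutex_lock(&dhandle_mtx);
    opened++;                            /* work needing both locks */
    pthread_mutex_unlock(&dhandle_mtx);  /* release in reverse order */
    pthread_mutex_unlock(&schema_mtx);
}

static void *worker(void *arg)
{
    long n = *(long *)arg;
    long i;

    for (i = 0; i < n; i++)
        stat_cursor_open();
    return NULL;
}

/*
 * Runs two threads that each perform per_thread opens in parallel.
 * Returns the total number of opens, or -1 if a thread could not be created.
 */
static long run_parallel_opens(long per_thread)
{
    pthread_t a, b;

    opened = 0;
    if (pthread_create(&a, NULL, worker, &per_thread) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &per_thread) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return opened;
}
```

      The real code uses spinlocks and a more involved call chain, so this only shows why a single global acquisition order removes the deadlock, not how __curstat_file_init should be restructured.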

            Assignee:
            Unassigned
            Reporter:
            alexander.gorrod@mongodb.com Alexander Gorrod
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              Resolved: