Details

    • Type: Task
    • Status: Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.5.2
    • Labels:

      Description

      Hi!

      Sometimes when converting a DB from BDB to WiredTiger we get EBUSY when trying to truncate() an index file.
      The conversion code basically looks like the following:

      =============================
      system("rm -rf \"" WT_HOME "\" && mkdir \"" WT_HOME "\"");

      wiredtiger_open(WT_HOME, NULL, "create,checkpoint=(wait=30)", &conn);

      conn->open_session(conn, NULL, NULL, &session);
      session->create(session, "table:main", "key_format=u,value_format=u,columns=(k,v)");
      session->create(session, "index:main:index", "columns=(v)");
      session->create(session, "index:main:index1", "columns=(v)");

      /* open a cursor on the WiredTiger table */
      session->open_cursor(session, "table:main", NULL, NULL, &cursor);

      /* open BDB table, create cursor */

      while (/* BDB cursor next() returns records */) {
              WT_ITEM ki, vi;
              /* fill ki and vi from BDB record */

              cursor->set_key(cursor, &ki);
              cursor->set_value(cursor, &vi);
              cursor->insert(cursor);
      }

      cursor->close(cursor);

      session->truncate(session, "file:main_index.wti", NULL, NULL, NULL);

      ============================= END

      The last truncate() call sometimes fails with EBUSY (Device or resource busy).
      In the cases where I saw this problem the conversion takes quite a long time, so it is probably related to the periodic checkpoints configured with a 30-second period.
      I was able to reproduce it on both 2.5.0 and develop.
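
      For reference, one way to test the checkpoint theory would be to open the connection without the periodic checkpoint server and checkpoint explicitly once the load is done, so nothing can checkpoint concurrently with the truncate. This is only a sketch (it assumes the same schema and load loop as above):

      #include <wiredtiger.h>

      #define WT_HOME "WT_HOME"

      int
      main(void)
      {
              WT_CONNECTION *conn;
              WT_SESSION *session;

              /* No checkpoint=(wait=30): the periodic checkpoint server is not started. */
              wiredtiger_open(WT_HOME, NULL, "create", &conn);
              conn->open_session(conn, NULL, NULL, &session);

              /* ... create table:main and its indexes, then load the rows as above ... */

              /* Checkpoint once, explicitly, after the load completes. */
              session->checkpoint(session, NULL);

              /* Truncate with no checkpoint running in parallel. */
              session->truncate(session, "file:main_index.wti", NULL, NULL, NULL);

              return (conn->close(conn, NULL));
      }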

      Please advise!

        Issue Links

          Activity

          sueloverso Sue Loverso added a comment -

          FTR, that commit reference was a typo and should have been 1634, not this issue.

          ddanderson Don Anderson (Inactive) added a comment -

          @dmitri-shubin, thanks for this report. I am able to reproduce this, and I can see why this happens. The truncate operation needs an exclusive lock to complete, and does not get it, because the checkpoint thread has a lock. I'll look into whether we want to document this or do something different in wiredtiger. In the meantime, doing a retry around the call to truncate is a workaround.
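
          A minimal sketch of that retry workaround (illustrative only: the unbounded loop and the 100ms back-off are arbitrary choices, and it assumes <errno.h>, <stdio.h> and <unistd.h> are included):

          /*
           * Retry WT_SESSION::truncate while it returns EBUSY, e.g. while the
           * checkpoint server holds the data handle we need exclusively.
           */
          int ret;

          while ((ret = session->truncate(
              session, "file:main_index.wti", NULL, NULL, NULL)) == EBUSY)
                  usleep(100000);         /* back off briefly before retrying */
          if (ret != 0)
                  fprintf(stderr, "truncate: %s\n", wiredtiger_strerror(ret));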

          For the record, here is where the truncate thread detects EBUSY:

          __wt_session_lock_dhandle(session=0x0000000101007b00, flags=4) + 253 at session_dhandle.c:82
             79  			 * will get the necessary lock, so don't bother here.
             80  			 */
             81  			if (LF_ISSET(WT_DHANDLE_LOCK_ONLY) || special_flags == 0) {
          -> 82  				WT_RET(__wt_try_writelock(session, dhandle->rwlock));
             83  				F_SET(dhandle, WT_DHANDLE_EXCLUSIVE);
             84  				locked = WRITELOCK;
             85  			}
          (lldb) bt
          * thread #1: tid = 0xc9accd, 0x0000000100080d04 libwiredtiger-2.5.1.dylib`__break + 4 at conn_dhandle.c:126, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
              frame #0: 0x0000000100080d04 libwiredtiger-2.5.1.dylib`__break + 4 at conn_dhandle.c:126
            * frame #1: 0x000000010012f68d libwiredtiger-2.5.1.dylib`__wt_session_lock_dhandle(session=0x0000000101007b00, flags=4) + 253 at session_dhandle.c:82
              frame #2: 0x0000000100130166 libwiredtiger-2.5.1.dylib`__wt_session_get_btree(session=0x0000000101007b00, uri=0x0000000100000f81, checkpoint=0x0000000000000000, cfg=0x0000000000000000, flags=4) + 790 at session_dhandle.c:381
              frame #3: 0x00000001001265cb libwiredtiger-2.5.1.dylib`__truncate_file(session=0x0000000101007b00, name=0x0000000100000f81) + 187 at schema_truncate.c:27
              frame #4: 0x000000010012637b libwiredtiger-2.5.1.dylib`__wt_schema_truncate(session=0x0000000101007b00, uri=0x0000000100000f81, cfg=0x00007fff5fbfec80) + 107 at schema_truncate.c:113
              frame #5: 0x000000010012c47e libwiredtiger-2.5.1.dylib`__session_truncate(wt_session=0x0000000101007b00, uri=0x0000000100000f81, start=0x0000000000000000, stop=0x0000000000000000, config=0x0000000000000000) + 1198 at session_api.c:609
              frame #6: 0x0000000100000d5e bug`main + 1646 at bug.c:60
              frame #7: 0x00007fff90e555c9 libdyld.dylib`start + 1
          

          and here is what the checkpoint thread is doing:

            thread #4: tid = 0xc9acfa, 0x00000001000652cd libwiredtiger-2.5.1.dylib`__wt_lex_compare(user_item=0x00000001003011d0, tree_item=0x00000001058dbbc0) + 205 at btree.i:1205
              frame #0: 0x00000001000652cd libwiredtiger-2.5.1.dylib`__wt_lex_compare(user_item=0x00000001003011d0, tree_item=0x00000001058dbbc0) + 205 at btree.i:1205
              frame #1: 0x00000001000650c7 libwiredtiger-2.5.1.dylib`__wt_compare(session=0x0000000101007840, collator=0x0000000000000000, user_item=0x00000001003011d0, tree_item=0x00000001058dbbc0, cmpp=0x0000000100686424) + 55 at btree.i:1223
              frame #2: 0x00000001000637bb libwiredtiger-2.5.1.dylib`__verify_dsk_row(session=0x0000000101007840, addr=0x00000001001493dd, dsk=0x000000010b806000) + 2715 at bt_vrfy_dsk.c:389
              frame #3: 0x00000001000625c3 libwiredtiger-2.5.1.dylib`__wt_verify_dsk_image(session=0x0000000101007840, addr=0x00000001001493dd, dsk=0x000000010b806000, size=24526, empty_page_ok=0) + 1603 at bt_vrfy_dsk.c:160
              frame #4: 0x0000000100063d46 libwiredtiger-2.5.1.dylib`__wt_verify_dsk(session=0x0000000101007840, addr=0x00000001001493dd, buf=0x00000001073bda38) + 54 at bt_vrfy_dsk.c:176
              frame #5: 0x00000001000428ee libwiredtiger-2.5.1.dylib`__wt_bt_write(session=0x0000000101007840, buf=0x00000001073bda38, addr=0x00000001006867a0, addr_sizep=0x0000000100686728, checkpoint=0, compressed=0) + 718 at bt_io.c:176
              frame #6: 0x0000000100112531 libwiredtiger-2.5.1.dylib`__rec_split_write(session=0x0000000101007840, r=0x00000001073bda20, bnd=0x000000010c5b0600, buf=0x00000001073bda38, last_block=1) + 2497 at rec_write.c:2883
              frame #7: 0x0000000100114ad5 libwiredtiger-2.5.1.dylib`__rec_split_finish_std(session=0x0000000101007840, r=0x00000001073bda20) + 485 at rec_write.c:2603
              frame #8: 0x000000010010942d libwiredtiger-2.5.1.dylib`__rec_split_finish(session=0x0000000101007840, r=0x00000001073bda20) + 189 at rec_write.c:2618
              frame #9: 0x0000000100106f3b libwiredtiger-2.5.1.dylib`__rec_row_leaf(session=0x0000000101007840, r=0x00000001073bda20, page=0x000000010b749000, salvage=0x0000000000000000) + 4187 at rec_write.c:4585
              frame #10: 0x0000000100102860 libwiredtiger-2.5.1.dylib`__wt_reconcile(session=0x0000000101007840, ref=0x0000000102d2e8c0, salvage=0x0000000000000000, flags=0) + 1568 at rec_write.c:413
              frame #11: 0x000000010005d322 libwiredtiger-2.5.1.dylib`__sync_file(session=0x0000000101007840, syncop=16) + 626 at bt_sync.c:70
              frame #12: 0x000000010005cff6 libwiredtiger-2.5.1.dylib`__wt_cache_op(session=0x0000000101007840, ckptbase=0x0000000000000000, op=16) + 230 at bt_sync.c:223
              frame #13: 0x000000010013ebab libwiredtiger-2.5.1.dylib`__checkpoint_write_leaves(session=0x0000000101007840, cfg=0x0000000100686e90) + 43 at txn_ckpt.c:280
              frame #14: 0x000000010013ea00 libwiredtiger-2.5.1.dylib`__checkpoint_apply(session=0x0000000101007840, cfg=0x0000000100686e90, op=0x000000010013eb80) + 160 at txn_ckpt.c:184
              frame #15: 0x000000010013d9ec libwiredtiger-2.5.1.dylib`__wt_txn_checkpoint(session=0x0000000101007840, cfg=0x0000000100686e90) + 2012 at txn_ckpt.c:363
              frame #16: 0x000000010012dfa4 libwiredtiger-2.5.1.dylib`__session_checkpoint(wt_session=0x0000000101007840, config=0x0000000000000000) + 724 at session_api.c:889
              frame #17: 0x0000000100080b3c libwiredtiger-2.5.1.dylib`__ckpt_server(arg=0x0000000101007840) + 156 at conn_ckpt.c:87
              frame #18: 0x00007fff8d04c268 libsystem_pthread.dylib`_pthread_body + 131
              frame #19: 0x00007fff8d04c1e5 libsystem_pthread.dylib`_pthread_start + 176
              frame #20: 0x00007fff8d04a41d libsystem_pthread.dylib`thread_start + 13
          

          ddanderson Don Anderson (Inactive) added a comment -

          Fixes from WT-1767 referenced in WT-1404 do not fix this problem.

          ddanderson Don Anderson (Inactive) added a comment -

          @agorrod, @michaelcahill, here's where I am with this - I'd like some help with the way forward. While holding the schema lock, truncate tries to take an exclusive lock on each data handle it needs (__wt_try_writelock). If it fails at this step, EBUSY is returned. Waiting for the data handle lock causes deadlocks with checkpoint, which is locking in the other order. I haven't explored always taking a schema lock in advance of the data handle lock in checkpoint. Are there performance implications?

          Changing truncate to retry (outside all the locks) basically works, except that bug006 hangs - it's holding a cursor open while doing a truncate - it expects an EBUSY error. Creating a limit on the retries "fixes" this, but would probably make a truncate fail during a much longer checkpoint.

          With the current state of affairs (where the caller has to know to retry), I'm afraid that programs that call truncate are going to be put into production without retries (because they appear to work) and start failing at random times due to checkpoint. Changing truncate to do unlimited retries internally does fix this - but are there other concerns? Doing a limited number of retries would make checkpoint EBUSY fails even more spurious.

          I'm inclined to change truncate to do unlimited retries and modify bug006; if you think changing checkpoint could work, I can explore that instead.

          agorrod Alex Gorrod added a comment -

          Don Anderson Thanks for the analysis. For some context, verify takes the ::checkpoint_lock to avoid exactly this issue. I suspect that we don't want truncate to wait on the checkpoint lock (since it should be able to happen in parallel).

          Maybe we should retry truncate while there is a checkpoint active? We could figure that out by calling and __wt_spin_unlock, or add in a new connection-wide checkpointing flag. There is already a WT_BTREE::checkpointing flag - but I suspect the relevant btree in WT_SESSION::dhandle will have been cleared by the time we get back up to the truncate call.
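
          To illustrate the connection-wide flag idea only (these names and this structure are hypothetical, not WiredTiger internals):

          #include <stdatomic.h>
          #include <stdbool.h>

          /* Hypothetical connection-wide flag; not an actual WiredTiger field. */
          static atomic_bool conn_checkpointing;

          /* The checkpoint server would set it for the duration of a checkpoint. */
          static void
          run_checkpoint(void (*do_checkpoint)(void))
          {
                  atomic_store(&conn_checkpointing, true);
                  do_checkpoint();
                  atomic_store(&conn_checkpointing, false);
          }

          /* Truncate could check it to decide whether an EBUSY is worth retrying. */
          static bool
          checkpoint_is_active(void)
          {
                  return (atomic_load(&conn_checkpointing));
          }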

          michael.cahill Michael Cahill added a comment -

          For now (until we get stuck into WT-1598), I think we should just make full-table truncates (without cursors) work the same as exclusive operations like verify and salvage...
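
          For reference, the two forms of the public API being distinguished here (a sketch; start_cursor and stop_cursor stand for cursors positioned on the range boundaries):

          /*
           * Full-object truncate: a URI and no cursors, the whole table or file
           * is discarded. This is the form that needs exclusive access.
           */
          session->truncate(session, "table:main", NULL, NULL, NULL);

          /*
           * Cursor-range truncate: no URI, bounded by positioned cursors, and
           * operating on the rows between them.
           */
          session->truncate(session, NULL, start_cursor, stop_cursor, NULL);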

          ddanderson Don Anderson (Inactive) added a comment -

          With the merge of WT-1782, the problem is fixed - truncate will wait. The downside is that the wait can be a long time if there's a big checkpoint.

          dmitri-shubin Dmitri Shubin added a comment -

          It's ok for us.
          Thank you!

          ramon.fernandez Ramon Fernandez added a comment -

          Additional ticket information from GitHub

          This ticket was referenced in the following commits:
          5bf11d893548804b890836a3d9ef4335c4319bb7
          a9f0e3ac769a060a8c3b06bac71fbed0e5f46cc6
          2e15cd6fc322c90c763394c52056bfebd4153aad

            People

            • Votes: 0
            • Watchers: 1
