Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.7.0
    • Labels:
      None
    • # Replies:
      8
    • Last comment by Customer:
      true

      Description

      There was a jenkins test failure with the following symptoms:

      16: read      0, remove   2017, update   7983
       17: read      0, remove   2017, update   7983
       18: read      0, remove   2017, update   7983
       19: read      0, remove   2017, update   7983
      [1437710017:301571][77975:0x7fd9d57fa700], t: ../src/txn/txn.c, 101: txn_state->snap_min == WT_TXN_NONE || session->txn.isolation == WT_ISO_READ_UNCOMMITTED || !__wt_txn_visible_all(session, txn_state->snap_min)
      [1437710017:301631][77975:0x7fd9d57fa700], t: aborting WiredTiger library
      t: process 77975
          1: 10 readers, 10 writers
       read thread  0 starting: tid: 77975:0x7fd9e7440700, file: file:wt.000
      

      The test was:
      http://build.wiredtiger.com:8080/job/wiredtiger-pull-request-tester/4104/

      The failure isn't due to any changes in the pull request - since it only touches the wtperf benchmark code.

        Issue Links

          Activity

          Hide
          alexander.gorrod Alexander Gorrod added a comment -

          Another occurrence http://build.wiredtiger.com:8080/job/wiredtiger-linux-directio/3403/

          Call stack:

          (gdb) where
          #0  0x00000030d5634a98 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
          #1  0x00000030d563672a in __GI_abort () at abort.c:89
          #2  0x00000000004aea8d in __wt_abort (session=session@entry=0xec2eb0)
              at ../src/os_posix/os_abort.c:25
          #3  0x000000000044cc69 in __wt_assert (session=session@entry=0xec2eb0, error=error@entry=0,
              file_name=file_name@entry=0x4f3ae3 "../src/txn/txn.c", line_number=line_number@entry=101,
              fmt=fmt@entry=0x4e73dd "%s") at ../src/support/err.c:468
          #4  0x00000000004536c2 in __wt_txn_release_snapshot (session=0xec2eb0) at ../src/txn/txn.c:98
          #5  __wt_txn_release (session=session@entry=0xec2eb0) at ../src/txn/txn.c:431
          #6  0x0000000000453a76 in __wt_txn_commit (session=0xec2eb0, cfg=<optimized out>)
              at ../src/txn/txn.c:549
          #7  0x000000000040639f in writer_op (s=0xee2160, cursor=0x7f772401cd20, session=0xec2eb0)
              at ../../../test/thread/rw.c:266
          #8  writer (arg=0x12) at ../../../test/thread/rw.c:319
          #9  0x00000030d5a07555 in start_thread (arg=0x7f775a7fc700) at pthread_create.c:333
          #10 0x00000030d5701f3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
          

          Global transaction information:

          (gdb) p $2->txn_global
          $3 = {current = 15218, last_running = 15218, oldest_id = 15209, scan_count = 0,
            checkpoint_id = 0, checkpoint_gen = 0, checkpoint_pinned = 0, nsnap_rwlock = 0xebde50,
            nsnap_oldest_id = 0, nsnaph = {stqh_first = 0x0, stqh_last = 0xebc778}, states = 0xedd4a0}
          (gdb) p *$2->txn_global.states@20
          $5 = {{id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0,
              snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {
              id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 15207}, {id = 0,
              snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {
              id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0,
              snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}}
          

          Which shows that 15207 is in the table, but the oldest ID has been updated to 15209.

          Show
          alexander.gorrod Alexander Gorrod added a comment - Another occurrence http://build.wiredtiger.com:8080/job/wiredtiger-linux-directio/3403/ Call stack: (gdb) where #0 0x00000030d5634a98 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55 #1 0x00000030d563672a in __GI_abort () at abort.c:89 #2 0x00000000004aea8d in __wt_abort (session=session@entry=0xec2eb0) at ../src/os_posix/os_abort.c:25 #3 0x000000000044cc69 in __wt_assert (session=session@entry=0xec2eb0, error=error@entry=0, file_name=file_name@entry=0x4f3ae3 "../src/txn/txn.c", line_number=line_number@entry=101, fmt=fmt@entry=0x4e73dd "%s") at ../src/support/err.c:468 #4 0x00000000004536c2 in __wt_txn_release_snapshot (session=0xec2eb0) at ../src/txn/txn.c:98 #5 __wt_txn_release (session=session@entry=0xec2eb0) at ../src/txn/txn.c:431 #6 0x0000000000453a76 in __wt_txn_commit (session=0xec2eb0, cfg=<optimized out>) at ../src/txn/txn.c:549 #7 0x000000000040639f in writer_op (s=0xee2160, cursor=0x7f772401cd20, session=0xec2eb0) at ../../../test/thread/rw.c:266 #8 writer (arg=0x12) at ../../../test/thread/rw.c:319 #9 0x00000030d5a07555 in start_thread (arg=0x7f775a7fc700) at pthread_create.c:333 #10 0x00000030d5701f3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109 Global transaction information: (gdb) p $2->txn_global $3 = {current = 15218, last_running = 15218, oldest_id = 15209, scan_count = 0, checkpoint_id = 0, checkpoint_gen = 0, checkpoint_pinned = 0, nsnap_rwlock = 0xebde50, nsnap_oldest_id = 0, nsnaph = {stqh_first = 0x0, stqh_last = 0xebc778}, states = 0xedd4a0} (gdb) p *$2->txn_global.states@20 $5 = {{id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, { id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 15207}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, { id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}} Which shows that 15207 is in the table, but the oldest ID has been updated to 15209 .
          Hide
          alexander.gorrod Alexander Gorrod added a comment -

          Michael Cahill This indicates a problem in __wt_txn_update_oldest - I reviewed that code and can't spot a bug that would trigger here.

          I'll take another look later, but it'd be great if you can look too.

          Show
          alexander.gorrod Alexander Gorrod added a comment - Michael Cahill This indicates a problem in __wt_txn_update_oldest - I reviewed that code and can't spot a bug that would trigger here. I'll take another look later, but it'd be great if you can look too.
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

          Message: WT-2021: Fix a race when getting a transaction ID.

          There was a case where we thought the system was read-only. We could
          assign a transaction ID that wasn't subsequently seen by an oldest
          ID update.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/32a59be7f48caf5bdda408282737ca04b667fae9

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'} Message: WT-2021 : Fix a race when getting a transaction ID. There was a case where we thought the system was read-only. We could assign a transaction ID that wasn't subsequently seen by an oldest ID update. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/32a59be7f48caf5bdda408282737ca04b667fae9
          Hide
          sue.loverso Sue LoVerso added a comment -

          I hit this again on my logging branch (which is merged with develop as of changeset 4545a8b14cf30b358f on August 3). I was running the log-append.wtperf workload. This is not readily reproducible as I've been running this workload a lot. I saw this on my AWS SSD system.

          I will post more debugging info.

          Show
          sue.loverso Sue LoVerso added a comment - I hit this again on my logging branch (which is merged with develop as of changeset 4545a8b14cf30b358f on August 3). I was running the log-append.wtperf workload. This is not readily reproducible as I've been running this workload a lot. I saw this on my AWS SSD system. I will post more debugging info.
          Hide
          sue.loverso Sue LoVerso added a comment -

          Stack:

          Thread 10 (Thread 0x7f6d1c831700 (LWP 13907)):
          #0  0x00007f6d214f7943 in select () from /lib64/libc.so.6
          #1  0x000000000043f1c8 in __wt_sleep (seconds=100, micro_seconds=0) at ../src/os_posix/os_sleep.c:23
          #2  0x00000000004681b2 in __wt_attach (session=0x26768c0) at ../src/support/global.c:118
          #3  0x00000000004db212 in __wt_abort (session=0x26768c0) at ../src/os_posix/os_abort.c:22
          #4  0x00000000004679f1 in __wt_assert (session=0x26768c0, error=0, file_name=0x51afca "../src/txn/txn.c", line_number=101, fmt=0x51afc7 "%s")
              at ../src/support/err.c:468
          #5  0x000000000046eb1d in __wt_txn_release_snapshot (session=0x26768c0) at ../src/txn/txn.c:98
          #6  0x000000000049bbe4 in __sync_file (session=0x26768c0, syncop=16) at ../src/btree/bt_sync.c:178
          #7  0x000000000049bdcc in __wt_cache_op (session=0x26768c0, ckptbase=0x0, op=16) at ../src/btree/bt_sync.c:258
          #8  0x00000000004715dd in __checkpoint_write_leaves (session=0x26768c0, cfg=0x7f6d1c830e90) at ../src/txn/txn_ckpt.c:277
          #9  0x00000000004711bf in __checkpoint_apply (session=0x26768c0, cfg=0x7f6d1c830e90, op=0x4715b7 <__checkpoint_write_leaves>) at ../src/txn/txn_ckpt.c:184
          #10 0x00000000004720b7 in __wt_txn_checkpoint (session=0x26768c0, cfg=0x7f6d1c830e90) at ../src/txn/txn_ckpt.c:406
          #11 0x0000000000463aa9 in __session_checkpoint (wt_session=0x26768c0, config=0x0) at ../src/session/session_api.c:1014
          #12 0x0000000000416b4c in __ckpt_server (arg=0x26768c0) at ../src/conn/conn_ckpt.c:95
          

          Transaction information:

          (gdb) p *txn_state
          $1 = {id = 0, snap_min = 28775065}
          (gdb) p session->txn
          $2 = {id = 0, isolation = WT_ISO_READ_COMMITTED, snap_min = 28775065, snap_max = 28775075, snapshot = 0x26c5ee0, snapshot_count = 5, txn_logsync = 8, 
            mod = 0x7f6cecdf0c70, mod_alloc = 960, mod_count = 0, logrec = 0x0, notify = 0x0, ckpt_lsn = {file = 1, offset = 0}, full_ckpt = 0, ckpt_nsnapshot = 0, 
            ckpt_snapshot = 0x0, flags = 8}
          (gdb) p $3->txn_global
          $4 = {current = 34930892, last_running = 34930892, oldest_id = 29560002, scan_count = 0, checkpoint_id = 0, checkpoint_gen = 2, checkpoint_pinned = 0, 
            nsnap_rwlock = 0x2672200, nsnap_oldest_id = 0, nsnaph = {stqh_first = 0x0, stqh_last = 0x2670d98}, states = 0x26917b0}
          (gdb) p *$3->txn_global.states@20
          $5 = {{id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, 
              snap_min = 28775065}, {id = 0, snap_min = 0} <repeats 13 times>}
          

          Show
          sue.loverso Sue LoVerso added a comment - Stack: Thread 10 (Thread 0x7f6d1c831700 (LWP 13907)): #0 0x00007f6d214f7943 in select () from /lib64/libc.so.6 #1 0x000000000043f1c8 in __wt_sleep (seconds=100, micro_seconds=0) at ../src/os_posix/os_sleep.c:23 #2 0x00000000004681b2 in __wt_attach (session=0x26768c0) at ../src/support/global.c:118 #3 0x00000000004db212 in __wt_abort (session=0x26768c0) at ../src/os_posix/os_abort.c:22 #4 0x00000000004679f1 in __wt_assert (session=0x26768c0, error=0, file_name=0x51afca "../src/txn/txn.c", line_number=101, fmt=0x51afc7 "%s") at ../src/support/err.c:468 #5 0x000000000046eb1d in __wt_txn_release_snapshot (session=0x26768c0) at ../src/txn/txn.c:98 #6 0x000000000049bbe4 in __sync_file (session=0x26768c0, syncop=16) at ../src/btree/bt_sync.c:178 #7 0x000000000049bdcc in __wt_cache_op (session=0x26768c0, ckptbase=0x0, op=16) at ../src/btree/bt_sync.c:258 #8 0x00000000004715dd in __checkpoint_write_leaves (session=0x26768c0, cfg=0x7f6d1c830e90) at ../src/txn/txn_ckpt.c:277 #9 0x00000000004711bf in __checkpoint_apply (session=0x26768c0, cfg=0x7f6d1c830e90, op=0x4715b7 <__checkpoint_write_leaves>) at ../src/txn/txn_ckpt.c:184 #10 0x00000000004720b7 in __wt_txn_checkpoint (session=0x26768c0, cfg=0x7f6d1c830e90) at ../src/txn/txn_ckpt.c:406 #11 0x0000000000463aa9 in __session_checkpoint (wt_session=0x26768c0, config=0x0) at ../src/session/session_api.c:1014 #12 0x0000000000416b4c in __ckpt_server (arg=0x26768c0) at ../src/conn/conn_ckpt.c:95 Transaction information: (gdb) p *txn_state $1 = {id = 0, snap_min = 28775065} (gdb) p session->txn $2 = {id = 0, isolation = WT_ISO_READ_COMMITTED, snap_min = 28775065, snap_max = 28775075, snapshot = 0x26c5ee0, snapshot_count = 5, txn_logsync = 8, mod = 0x7f6cecdf0c70, mod_alloc = 960, mod_count = 0, logrec = 0x0, notify = 0x0, ckpt_lsn = {file = 1, offset = 0}, full_ckpt = 0, ckpt_nsnapshot = 0, ckpt_snapshot = 0x0, flags = 8} (gdb) p $3->txn_global $4 = {current = 34930892, last_running = 34930892, oldest_id = 29560002, scan_count = 0, checkpoint_id = 0, checkpoint_gen = 2, checkpoint_pinned = 0, nsnap_rwlock = 0x2672200, nsnap_oldest_id = 0, nsnaph = {stqh_first = 0x0, stqh_last = 0x2670d98}, states = 0x26917b0} (gdb) p *$3->txn_global.states@20 $5 = {{id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 0}, {id = 0, snap_min = 28775065}, {id = 0, snap_min = 0} <repeats 13 times>}
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

          Message: WT-2021 Fix a bug moving the oldest ID forward (introduced by WT-1967).
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/62d69b2cc37b37fbe7f67a8046ac3dd5cac0597a

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'} Message: WT-2021 Fix a bug moving the oldest ID forward (introduced by WT-1967 ). Branch: develop https://github.com/wiredtiger/wiredtiger/commit/62d69b2cc37b37fbe7f67a8046ac3dd5cac0597a
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

          Message: Merge pull request #2109 from wiredtiger/oldest-id-fix

          WT-2021 Fix a bug moving the oldest ID forward (introduced by WT-1967).
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/7b302d3fbcdc7dad6d89c193e32b2e90b5575b74

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'} Message: Merge pull request #2109 from wiredtiger/oldest-id-fix WT-2021 Fix a bug moving the oldest ID forward (introduced by WT-1967 ). Branch: develop https://github.com/wiredtiger/wiredtiger/commit/7b302d3fbcdc7dad6d89c193e32b2e90b5575b74
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

          Message: WT-2021: Fix a race when getting a transaction ID.

          There was a case where we thought the system was read-only. We could
          assign a transaction ID that wasn't subsequently seen by an oldest
          ID update.

          (cherry picked from commit 32a59be7f48caf5bdda408282737ca04b667fae9)
          Branch: mongodb-3.0
          https://github.com/wiredtiger/wiredtiger/commit/cd8c1e8312224017cf2ee2cc1bb0649a5b574b2d

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'} Message: WT-2021 : Fix a race when getting a transaction ID. There was a case where we thought the system was read-only. We could assign a transaction ID that wasn't subsequently seen by an oldest ID update. (cherry picked from commit 32a59be7f48caf5bdda408282737ca04b667fae9) Branch: mongodb-3.0 https://github.com/wiredtiger/wiredtiger/commit/cd8c1e8312224017cf2ee2cc1bb0649a5b574b2d

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                1 year, 33 weeks ago
                Date of 1st Reply: