Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-2210

crash when using lz4 compressor in column-store table

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: WT2.6.1
    • Fix Version/s: WT2.7.0
    • Labels:
      None
    • # Replies:
      9
    • Last comment by Customer:
      true

      Description

      Hi!

      After we enable LZ4 block compression for our column-store table WT started to crash with the following backtrace:

      #1  <signal handler called>
      #2  0x00007fe1a02f75eb in __rec_split_raw_worker (session=0x7fe18fc40500, r=0x7fe18d85f600, next_len=0, no_more_rows=1) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:2234
      #3  0x00007fe1a02f4c72 in __rec_split_finish (session=0x7fe18fc40500, r=0x7fe18d85f600) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:2635
      #4  0x00007fe1a02f330d in __rec_col_var (session=0x7fe18fc40500, r=<optimized out>, page=<optimized out>, salvage=<optimized out>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:4009
      #5  __wt_reconcile (session=0x7fe18fc40500, ref=0x7fe19a7d1b80, salvage=<optimized out>, flags=0) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:406
      #6  0x00007fe1a029ab47 in __sync_file (session=<optimized out>, syncop=<optimized out>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/btree/bt_sync.c:77
      #7  __wt_cache_op (session=0x7fe18fc40500, ckptbase=<optimized out>, op=16) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/btree/bt_sync.c:257
      #8  0x00007fe1a03117c3 in __checkpoint_apply (session=0x7fe18fc40500, cfg=0x7fe188bf5950, op=0x7fe1a0311890 <__checkpoint_write_leaves>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/txn/txn_ckpt.c:184
      #9  0x00007fe1a0310ad6 in __wt_txn_checkpoint (session=0x7fe18fc40500, cfg=0x7fe188bf5950) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/txn/txn_ckpt.c:405
      #10 0x00007fe1a0308bc6 in __session_checkpoint (wt_session=0x7fe18fc40500, config=<optimized out>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/session/session_api.c:997
      #11 0x00007fe1a02aefe0 in __ckpt_server (arg=0x7fe18fc40500) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/conn/conn_ckpt.c:95
      #12 0x00007fe19ecd49d1 in start_thread () from /lib64/libpthread.so.0
      #13 0x00007fe19ea218fd in clone () from /lib64/libc.so.6
      (gdb) f 2
      #2  0x00007fe1a02f75eb in __rec_split_raw_worker (session=0x7fe18fc40500, r=0x7fe18d85f600, next_len=0, no_more_rows=1) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:2234
      2234                            r->raw_recnos[slots] = recno;
      (gdb) p r->raw_recnos
      $1 = (uint64_t *) 0x0
      (gdb) l
      2229                            r->raw_offsets[++slots] =
      2230                                WT_STORE_SIZE(len - WT_BLOCK_COMPRESS_SKIP);
      2231
      2232                    if (dsk->type == WT_PAGE_COL_INT ||
      2233                        dsk->type == WT_PAGE_COL_VAR)
      2234                            r->raw_recnos[slots] = recno;
      2235                    r->raw_entries[slots] = entry;
      2236            }
      2237
      2238            /*
      (gdb) p dsk->type
      $2 = 4 '\004'
      

      Moreover it also crashes while trying to recover this DB with similar backtrace:

      #1  <signal handler called>
      #2  0x00007f3740d625eb in __rec_split_raw_worker (session=0x7f372f45b900, r=0x7f37395aaa00, next_len=0, no_more_rows=1) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:2234
      #3  0x00007f3740d5fc72 in __rec_split_finish (session=0x7f372f45b900, r=0x7f37395aaa00) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:2635
      #4  0x00007f3740d5e30d in __rec_col_var (session=0x7f372f45b900, r=<optimized out>, page=<optimized out>, salvage=<optimized out>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:4009
      #5  __wt_reconcile (session=0x7f372f45b900, ref=0x7f37395f9cd0, salvage=<optimized out>, flags=0) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:406
      #6  0x00007f3740d05b47 in __sync_file (session=<optimized out>, syncop=<optimized out>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/btree/bt_sync.c:77
      #7  __wt_cache_op (session=0x7f372f45b900, ckptbase=<optimized out>, op=16) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/btree/bt_sync.c:257
      #8  0x00007f3740d7c7c3 in __checkpoint_apply (session=0x7f372f45b900, cfg=0x7fffaf15ee80, op=0x7f3740d7c890 <__checkpoint_write_leaves>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/txn/txn_ckpt.c:184
      #9  0x00007f3740d7bad6 in __wt_txn_checkpoint (session=0x7f372f45b900, cfg=0x7fffaf15ee80) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/txn/txn_ckpt.c:405
      #10 0x00007f3740d73bc6 in __session_checkpoint (wt_session=0x7f372f45b900, config=<optimized out>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/session/session_api.c:997
      #11 0x00007f3740d802f2 in __wt_txn_recover (session=<optimized out>) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/txn/txn_recover.c:523
      #12 0x00007f3740d1d4e9 in __wt_connection_workers (session=0x7f372f45b000, cfg=0x7fffaf15efb0) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/conn/conn_open.c:237
      #13 0x00007f3740d1641f in wiredtiger_open (home=<optimized out>, event_handler=<optimized out>, config=<optimized out>, wt_connp=0x7f37397c2400) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/conn/conn_api.c:2024
      ...
      (gdb) f 2
      #2  0x00007f3740d625eb in __rec_split_raw_worker (session=0x7f372f45b900, r=0x7f37395aaa00, next_len=0, no_more_rows=1) at /tb/builds/thd/sbn/2.6/src/thirdparty/wiredtiger/2.6.1/src/src/reconcile/rec_write.c:2234
      2234                            r->raw_recnos[slots] = recno;
      (gdb) p r->raw_recnos
      $1 = (uint64_t *) 0x0
      (gdb) p dsk->type
      $2 = 4 '\004'
      

      I'll attach broken DB and small C program that simply tries to open it (with recovery) and crashes.

      Thanks!

      1. compare.c
        2 kB
        Alexander Gorrod
      2. hd_alex.c
        0.7 kB
        Alexander Gorrod
      3. hd.c
        2 kB
        Dmitri Shubin
      4. hd.db.tar.gz
        8 kB
        Dmitri Shubin

        Issue Links

          Activity

          Hide
          Dmitri Shubin Dmitri Shubin added a comment -

          BTW it doesn't crash if I use lz4-noraw or snappy compressor, so probably the problem is in ext/compressors/lz4/lz4_compress.c:lz4_compress_raw() ?

          Show
          Dmitri Shubin Dmitri Shubin added a comment - BTW it doesn't crash if I use lz4-noraw or snappy compressor, so probably the problem is in ext/compressors/lz4/lz4_compress.c:lz4_compress_raw() ?
          Hide
          alexander.gorrod Alexander Gorrod added a comment -

          Thanks for the bug report Dmitri Shubin. I have reproduced the failure. It took a while before I could get your test case running. I needed to split hd.c into two separate files to get the collators to load (I'm sure it was how I was linking things).

          For the record, my steps were:

          $ ../configure CFLAGS="-ggdb -O0" --enable-lz4 --enable-diagnostic && make -j 12
          $ cd wt2210
          $ tar zxf ../hd.tar.gz
          $ ls
          compare.c hd_alex.c  hd.db
          $ gcc -I.. -L ../.libs -L../ -lwiredtiger -ldl -shared -fPIC compare.c -o compare.so
          $ gcc -I.. -L ../.libs -L../ -lwiredtiger -ldl hd_alex.c
          $ LD_LIBRARY_PATH=../.libs:.:../ext/compressors/lz4/.libs/ gdb ./a.out
          

          The other advantage to splitting the comparators out is that I can run the WiredTiger command line utility against the database, for example:

          WIREDTIGER_CONFIG="log=(enabled=true,recover=on)" LD_LIBRARY_PATH=../.libs:.:../ext/compressors/lz4/.libs/ ../wt -C "extensions=(compare.so={entry=init_comparators})" -h hd.db/ list
          

          Show
          alexander.gorrod Alexander Gorrod added a comment - Thanks for the bug report Dmitri Shubin . I have reproduced the failure. It took a while before I could get your test case running. I needed to split hd.c into two separate files to get the collators to load (I'm sure it was how I was linking things). For the record, my steps were: $ ../configure CFLAGS="-ggdb -O0" --enable-lz4 --enable-diagnostic && make -j 12 $ cd wt2210 $ tar zxf ../hd.tar.gz $ ls compare.c hd_alex.c hd.db $ gcc -I.. -L ../.libs -L../ -lwiredtiger -ldl -shared -fPIC compare.c -o compare.so $ gcc -I.. -L ../.libs -L../ -lwiredtiger -ldl hd_alex.c $ LD_LIBRARY_PATH=../.libs:.:../ext/compressors/lz4/.libs/ gdb ./a.out The other advantage to splitting the comparators out is that I can run the WiredTiger command line utility against the database, for example: WIREDTIGER_CONFIG="log=(enabled=true,recover=on)" LD_LIBRARY_PATH=../.libs:.:../ext/compressors/lz4/.libs/ ../wt -C "extensions=(compare.so={entry=init_comparators})" -h hd.db/ list
          Hide
          Dmitri Shubin Dmitri Shubin added a comment -

          Alexander Gorrod, thank you!

          Sorry, I forgot to say that I'm used -rdynamic when building the test (to make it find init function that registers collators) and set LD_LIBRARY_PATH when running.

          Interesting, when collators are not registered it complains but only once and works w/o any problem the next time:

          $ LD_LIBRARY_PATH=/home/sbn/tb/tb/build.x86_64-unknown-linux/platform/lib64/:../../ext/compressors/lz4/.libs/ ./hd
          [1447314064:335899][1940:0x7f3ee0b21720], file:main.wt: unknown collator 'main_coll': Invalid argument
          [1447314064:335955][1940:0x7f3ee0b21720], file:main.wt: Operation failed during recovery: Invalid argument
          wiredtiger_open(WT_HOME, NULL, "create,extensions=(local={entry=wt_init}),log=(enabled,recover=on)", &conn) at (hd.c:87) failed: Invalid argument [22]
          $ LD_LIBRARY_PATH=/home/sbn/tb/tb/build.x86_64-unknown-linux/platform/lib64/:../../ext/compressors/lz4/.libs/ ./hd
          $ echo $?
          0
          

          Show
          Dmitri Shubin Dmitri Shubin added a comment - Alexander Gorrod , thank you! Sorry, I forgot to say that I'm used -rdynamic when building the test (to make it find init function that registers collators) and set LD_LIBRARY_PATH when running. Interesting, when collators are not registered it complains but only once and works w/o any problem the next time: $ LD_LIBRARY_PATH=/home/sbn/tb/tb/build.x86_64-unknown-linux/platform/lib64/:../../ext/compressors/lz4/.libs/ ./hd [1447314064:335899][1940:0x7f3ee0b21720], file:main.wt: unknown collator 'main_coll': Invalid argument [1447314064:335955][1940:0x7f3ee0b21720], file:main.wt: Operation failed during recovery: Invalid argument wiredtiger_open(WT_HOME, NULL, "create,extensions=(local={entry=wt_init}),log=(enabled,recover=on)", &conn) at (hd.c:87) failed: Invalid argument [22] $ LD_LIBRARY_PATH=/home/sbn/tb/tb/build.x86_64-unknown-linux/platform/lib64/:../../ext/compressors/lz4/.libs/ ./hd $ echo $? 0
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith@wiredtiger.com'}

          Message: WT-2210: if we allocate raw structures for a row-store and then re-use
          them for a column-store, we might not have allocated the record-number
          array, r->raw_recnos. Allocate the record-number array regardless of the
          type of object we're reconciling.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/3fac9615c5ef80e219637a0293d160f274146a7f

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith@wiredtiger.com'} Message: WT-2210 : if we allocate raw structures for a row-store and then re-use them for a column-store, we might not have allocated the record-number array, r->raw_recnos. Allocate the record-number array regardless of the type of object we're reconciling. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/3fac9615c5ef80e219637a0293d160f274146a7f
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith@wiredtiger.com'}

          Message: WT-2210: enhance a comment, explain why we allocate raw compression structures
          for both row- and column- store objects regardless of the current page type.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/957a8db79f743d0fe6db523b4f62b0f85f836814

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith@wiredtiger.com'} Message: WT-2210 : enhance a comment, explain why we allocate raw compression structures for both row- and column- store objects regardless of the current page type. Branch: develop https://github.com/wiredtiger/wiredtiger/commit/957a8db79f743d0fe6db523b4f62b0f85f836814
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'}

          Message: Merge pull request #2305 from wiredtiger/wt-2210

          WT-2210: raw compression fails if row-store recovery precedes column-store recovery
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/78bd4ac9a2dd63179b5ee8da65b514d287b9562c

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'} Message: Merge pull request #2305 from wiredtiger/wt-2210 WT-2210 : raw compression fails if row-store recovery precedes column-store recovery Branch: develop https://github.com/wiredtiger/wiredtiger/commit/78bd4ac9a2dd63179b5ee8da65b514d287b9562c
          Hide
          Dmitri Shubin Dmitri Shubin added a comment -
          Show
          Dmitri Shubin Dmitri Shubin added a comment - Keith Bostic , Alexander Gorrod , thank you!
          Hide
          keith.bostic Keith Bostic added a comment -

          Sorry, Dmitri Shubin, I meant to reach out to you on this one after the fix was merged.

          It should be trivial to backport this change to any previous release you like, and please let us know if you see any issues with the change. Thank you for the great test case!

          Show
          keith.bostic Keith Bostic added a comment - Sorry, Dmitri Shubin , I meant to reach out to you on this one after the fix was merged. It should be trivial to backport this change to any previous release you like, and please let us know if you see any issues with the change. Thank you for the great test case!
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'name': u'Ramon Fernandez', u'email': u'rfmnyc@gmail.com'}

          Message: Import wiredtiger-wiredtiger-mongodb-3.2.0-rc3-177-g9d375e3.tar.gz from wiredtiger branch mongodb-3.2

          ref: d9ec1ff..9d375e3

          16c0a1a WT-1315 Fix some leaks with join cursors.
          59857f9 WT-2222 Add statistics for named snapshots.
          4368d39 WT-1315 Cursor join implementation
          a72ddb7 WT-2218 Add truncate stats
          fb9cebe WT-2224 Track which deleted refs are discarded by a split.
          e2f1130 WT-2220 Split WT_TIMEDIFF macro into unit specific macros.
          be412b5 WT-2182 when internal pages grow large enough, split them into their parents
          ce8c091 WT-2219 Enhancements to in-memory testing
          347d922 WT-2220 time_t cleanup.
          08c0fcd WT-2217 change WT_CURSOR.insert to clear "set" key/value on return
          d1b5e7f WT-2135 Fix log_only setting for backup cursor. Fix initialization.
          78bd4ac WT-2210 raw compression fails if row-store recovery precedes column-store recovery
          c1b2634 WT-2182 fixes for splitting up the tree.
          0a1ee34 WT-2199 Fix transaction sync inconsistency.
          ee31bb2 WT-2182 Simplify the split deepen logic.
          c360d53 WT-2212 Add a "use_environment" config to "wiredtiger_open"
          3f132a4 WT-2182 detect internal page split races.
          Branch: master
          https://github.com/mongodb/mongo/commit/a0771ea5ec1b44537d3c409e3d712db24fd8e6bb

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'name': u'Ramon Fernandez', u'email': u'rfmnyc@gmail.com'} Message: Import wiredtiger-wiredtiger-mongodb-3.2.0-rc3-177-g9d375e3.tar.gz from wiredtiger branch mongodb-3.2 ref: d9ec1ff..9d375e3 16c0a1a WT-1315 Fix some leaks with join cursors. 59857f9 WT-2222 Add statistics for named snapshots. 4368d39 WT-1315 Cursor join implementation a72ddb7 WT-2218 Add truncate stats fb9cebe WT-2224 Track which deleted refs are discarded by a split. e2f1130 WT-2220 Split WT_TIMEDIFF macro into unit specific macros. be412b5 WT-2182 when internal pages grow large enough, split them into their parents ce8c091 WT-2219 Enhancements to in-memory testing 347d922 WT-2220 time_t cleanup. 08c0fcd WT-2217 change WT_CURSOR.insert to clear "set" key/value on return d1b5e7f WT-2135 Fix log_only setting for backup cursor. Fix initialization. 78bd4ac WT-2210 raw compression fails if row-store recovery precedes column-store recovery c1b2634 WT-2182 fixes for splitting up the tree. 0a1ee34 WT-2199 Fix transaction sync inconsistency. ee31bb2 WT-2182 Simplify the split deepen logic. c360d53 WT-2212 Add a "use_environment" config to "wiredtiger_open" 3f132a4 WT-2182 detect internal page split races. Branch: master https://github.com/mongodb/mongo/commit/a0771ea5ec1b44537d3c409e3d712db24fd8e6bb

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                1 year, 17 weeks, 6 days ago
                Date of 1st Reply: