  1. WiredTiger
  2. WT-11354

Re-enable test_bug010 on tiered storage

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT11.3.0, 7.3.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None

      When running with the tiered hook and a fix for WT-11047 (where each call to checkpoint uses a flush_tier arg), test_bug010.py fails with EMFILE - too many open files.

      Here's the stack trace caught in gdb:

      [Switching to Thread 0xffffe7fff120 (LWP 346014)]
      0x0000fffff7d6f200 in ?? () from /lib/aarch64-linux-gnu/libc.so.6
      (gdb) bt
      #0  0x0000fffff7d6f200 in ?? () from /lib/aarch64-linux-gnu/libc.so.6
      #1  0x0000fffff7d2a67c in raise () from /lib/aarch64-linux-gnu/libc.so.6
      #2  0x0000fffff7d17130 in abort () from /lib/aarch64-linux-gnu/libc.so.6
      #3  0x0000fffff7414788 in __wt_abort (session=0xaaaaab408320) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/os_common/os_abort.c:30
      #4  0x0000fffff74ce42c in __wt_panic_func (session=0xaaaaab408320, error=24, 
          func=0xfffff75588d8 <__PRETTY_FUNCTION__.16> "__posix_directory_sync", line=151, category=WT_VERB_DEFAULT, 
          fmt=0xfffff75580d0 "%s: directory-sync") at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/support/err.c:570
      #5  0x0000fffff741eb58 in __posix_directory_sync (session=0xaaaaab408320, path=0xffffc0de5c80 "./test_bug010130-0000000011.wtobj")
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/os_posix/os_fs.c:151
      #6  0x0000fffff7420a04 in __posix_open_file (file_system=0xaaaaab0f2680, wt_session=0xaaaaab408320, 
          name=0xffffc0de5c80 "./test_bug010130-0000000011.wtobj", file_type=WT_FS_OPEN_FILE_TYPE_DATA, flags=52, file_handlep=0xffffc161f468)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/os_posix/os_fs.c:827
      #7  0x0000fffff7417ed8 in __wt_open (session=0xaaaaab408320, name=0xffffc19b70b5 "test_bug010130-0000000011.wtobj", 
          file_type=WT_FS_OPEN_FILE_TYPE_DATA, flags=52, fhp=0xffffe7ffaa40)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/os_common/os_fhandle.c:252
      #8  0x0000fffff71f8624 in __wt_block_manager_create (session=0xaaaaab408320, filename=0xffffc19b70b5 "test_bug010130-0000000011.wtobj", 
          allocsize=4096) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/block/block_open.c:78
      #9  0x0000fffff7471354 in __create_file_block_manager (session=0xaaaaab408320, uri=0xffffc19b70b0 "file:test_bug010130-0000000011.wtobj", 
          filename=0xffffc19b70b5 "test_bug010130-0000000011.wtobj", allocsize=4096)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/schema/schema_create.c:123
      #10 0x0000fffff74719b0 in __create_file (session=0xaaaaab408320, uri=0xffffc19b70b0 "file:test_bug010130-0000000011.wtobj", exclusive=false, 
          config=0xffffc211b270 "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/schema/schema_create.c:261
      #11 0x0000fffff7476214 in __schema_create (session=0xaaaaab408320, uri=0xffffc19b70b0 "file:test_bug010130-0000000011.wtobj", 
          config=0xffffc211b270 "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/schema/schema_create.c:1429
      #12 0x0000fffff74766f0 in __wt_schema_create (session=0xaaaaab407e60, uri=0xffffc19b70b0 "file:test_bug010130-0000000011.wtobj", 
          config=0xffffc211b270 "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/schema/schema_create.c:1489
      #13 0x0000fffff74f1744 in __tiered_create_local (session=0xaaaaab407e60, tiered=0xaaaaac3ff960)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/tiered/tiered_handle.c:245
      #14 0x0000fffff74f2d7c in __tiered_switch (session=0xaaaaab407e60, 
          config=0xaaaab1385cff "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/tiered/tiered_handle.c:599
      #15 0x0000fffff74f2eb8 in __wt_tiered_switch (session=0xaaaaab407e60, 
          config=0xaaaab1385cff "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/tiered/tiered_handle.c:626
      #16 0x0000fffff7505790 in __checkpoint_flush_tier (session=0xaaaaab407e60, force=true)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:141
      #17 0x0000fffff7507b6c in __checkpoint_prepare (session=0xaaaaab407e60, trackingp=0xffffe7ffca53, cfg=0xffffe7ffd550)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:803
      #18 0x0000fffff7508950 in __txn_checkpoint (session=0xaaaaab407e60, cfg=0xffffe7ffd550)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:1114
      #19 0x0000fffff75099c0 in __txn_checkpoint_wrapper (session=0xaaaaab407e60, cfg=0xffffe7ffd550)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:1423
      #20 0x0000fffff7509c40 in __wt_txn_checkpoint (session=0xaaaaab407e60, cfg=0xffffe7ffd550, waiting=true)
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:1500
      #21 0x0000fffff74aa538 in __session_checkpoint (wt_session=0xaaaaab407e60, config=0xffffc06f8680 ",flush_tier=(enabled,force=true)")
          at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/session/session_api.c:2369
      #22 0x0000fffff7608164 in _wrap_Session_checkpoint (self=0xfffff7765da0, args=0xfffff6bc6200)
       

      Note that this may be the leak, or may simply be the "straw that broke the camel's back".
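
      As a quick sanity check on the trace, the `error=24` passed to `__wt_panic_func` in frame #4 can be decoded and compared with the per-process descriptor limit. This is only a sketch using the Python standard library; it assumes a Linux or other POSIX host (the `resource` module is not available elsewhere), and the limit it prints is for whatever process runs it, not necessarily the one that failed.

```python
import errno
import os
import resource

# The `error=24` argument seen in frame #4 of the backtrace.
code = 24

# Map the numeric errno to its symbolic name and message.
name = errno.errorcode[code]
message = os.strerror(code)

# Soft and hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

print(f"errno {code} = {name} ({message})")
print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")
```

      If the number of open object files approaches the soft limit, the failing open in the trace may simply be the first call to hit it.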

      While paused at the abort in the debugger, I ran

      lsof | grep ython

      and see thousands of lines like this:

      python3   345994 346014 python3               dda 2011u      REG              259,3     4096   80033922 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug0100-0000000011.wtobj
      python3   345994 346014 python3               dda 2012u      REG              259,3     4096   80033925 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug0101-0000000011.wtobj
      python3   345994 346014 python3               dda 2013u      REG              259,3     4096   80033923 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug01010-0000000011.wtobj
      python3   345994 346014 python3               dda 2014u      REG              259,3     4096   80033926 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010100-0000000011.wtobj
      python3   345994 346014 python3               dda 2015u      REG              259,3     4096   80033927 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010101-0000000011.wtobj
      python3   345994 346014 python3               dda 2016u      REG              259,3     4096   80033931 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010102-0000000011.wtobj
      python3   345994 346014 python3               dda 2017u      REG              259,3     4096   80033928 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010103-0000000011.wtobj
      python3   345994 346014 python3               dda 2018u      REG              259,3     4096   80033930 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010104-0000000011.wtobj 

      So it seems like we're opening each object file many times - many more times than the number of checkpoints. 
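
      To put a number on "many times", the lsof output can be tallied per object file. The sketch below is an illustration, not part of the test suite: `count_open_objects` is a hypothetical helper, and it assumes the 11-column layout shown in the listing above (command, PID, TID, task command, user, FD, type, device, size, inode, path) with no spaces in paths. Because lsof may print the same descriptor once per thread, it counts each (PID, FD) pair only once.

```python
from collections import Counter


def count_open_objects(lsof_text, suffix=".wtobj"):
    """Count distinct open descriptors per path, given lsof-style text.

    Hypothetical helper: assumes the column layout of the listing in
    this ticket and de-duplicates rows that repeat a (PID, FD) pair.
    """
    seen = set()
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip short lines and anything that is not an object file.
        if len(fields) < 11 or not fields[-1].endswith(suffix):
            continue
        pid, fd, path = fields[1], fields[5], fields[-1]
        if (pid, fd) in seen:
            continue  # same descriptor reported again for another thread
        seen.add((pid, fd))
        counts[path] += 1
    return counts


# Made-up sample rows in the same shape as the listing (paths shortened).
sample = """\
python3 345994 346014 python3 dda 2011u REG 259,3 4096 80033922 /tmp/t/test_bug0100-0000000011.wtobj
python3 345994 346015 python3 dda 2011u REG 259,3 4096 80033922 /tmp/t/test_bug0100-0000000011.wtobj
python3 345994 346014 python3 dda 2500u REG 259,3 4096 80033922 /tmp/t/test_bug0100-0000000011.wtobj
python3 345994 346014 python3 dda 2012u REG 259,3 4096 80033925 /tmp/t/test_bug0101-0000000011.wtobj
"""

counts = count_open_objects(sample)
for path, n in counts.most_common():
    print(n, path)
```

      On real data, something like `lsof -p <pid>` piped into this helper would show whether a few object files account for most of the descriptors or whether every table is affected equally.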

      $ lsof | grep ython | wc
      26965  292461 5705729

      26K files open, yet the test uses only 200 tables.

            Assignee:
            keith.smith@mongodb.com Keith Smith
            Reporter:
            donald.anderson@mongodb.com Donald Anderson