- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- None
- Storage Engines
When running with the tiered hook and a fix for WT-11047 (where each call to checkpoint uses a flush_tier arg), test_bug010.py fails with EMFILE - too many open files.
Here's the stack trace caught in gdb:
[Switching to Thread 0xffffe7fff120 (LWP 346014)]
0x0000fffff7d6f200 in ?? () from /lib/aarch64-linux-gnu/libc.so.6
(gdb) bt
#0 0x0000fffff7d6f200 in ?? () from /lib/aarch64-linux-gnu/libc.so.6
#1 0x0000fffff7d2a67c in raise () from /lib/aarch64-linux-gnu/libc.so.6
#2 0x0000fffff7d17130 in abort () from /lib/aarch64-linux-gnu/libc.so.6
#3 0x0000fffff7414788 in __wt_abort (session=0xaaaaab408320) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/os_common/os_abort.c:30
#4 0x0000fffff74ce42c in __wt_panic_func (session=0xaaaaab408320, error=24, func=0xfffff75588d8 <__PRETTY_FUNCTION__.16> "__posix_directory_sync", line=151, category=WT_VERB_DEFAULT, fmt=0xfffff75580d0 "%s: directory-sync") at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/support/err.c:570
#5 0x0000fffff741eb58 in __posix_directory_sync (session=0xaaaaab408320, path=0xffffc0de5c80 "./test_bug010130-0000000011.wtobj") at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/os_posix/os_fs.c:151
#6 0x0000fffff7420a04 in __posix_open_file (file_system=0xaaaaab0f2680, wt_session=0xaaaaab408320, name=0xffffc0de5c80 "./test_bug010130-0000000011.wtobj", file_type=WT_FS_OPEN_FILE_TYPE_DATA, flags=52, file_handlep=0xffffc161f468) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/os_posix/os_fs.c:827
#7 0x0000fffff7417ed8 in __wt_open (session=0xaaaaab408320, name=0xffffc19b70b5 "test_bug010130-0000000011.wtobj", file_type=WT_FS_OPEN_FILE_TYPE_DATA, flags=52, fhp=0xffffe7ffaa40) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/os_common/os_fhandle.c:252
#8 0x0000fffff71f8624 in __wt_block_manager_create (session=0xaaaaab408320, filename=0xffffc19b70b5 "test_bug010130-0000000011.wtobj", allocsize=4096) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/block/block_open.c:78
#9 0x0000fffff7471354 in __create_file_block_manager (session=0xaaaaab408320, uri=0xffffc19b70b0 "file:test_bug010130-0000000011.wtobj", filename=0xffffc19b70b5 "test_bug010130-0000000011.wtobj", allocsize=4096) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/schema/schema_create.c:123
#10 0x0000fffff74719b0 in __create_file (session=0xaaaaab408320, uri=0xffffc19b70b0 "file:test_bug010130-0000000011.wtobj", exclusive=false, config=0xffffc211b270 "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/schema/schema_create.c:261
#11 0x0000fffff7476214 in __schema_create (session=0xaaaaab408320, uri=0xffffc19b70b0 "file:test_bug010130-0000000011.wtobj", config=0xffffc211b270 "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/schema/schema_create.c:1429
#12 0x0000fffff74766f0 in __wt_schema_create (session=0xaaaaab407e60, uri=0xffffc19b70b0 "file:test_bug010130-0000000011.wtobj", config=0xffffc211b270 "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/schema/schema_create.c:1489
#13 0x0000fffff74f1744 in __tiered_create_local (session=0xaaaaab407e60, tiered=0xaaaaac3ff960) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/tiered/tiered_handle.c:245
#14 0x0000fffff74f2d7c in __tiered_switch (session=0xaaaaab407e60, config=0xaaaab1385cff "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/tiered/tiered_handle.c:599
#15 0x0000fffff74f2eb8 in __wt_tiered_switch (session=0xaaaaab407e60, config=0xaaaab1385cff "access_pattern_hint=none,allocation_size=4KB,app_metadata=,assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=,cache_r"...) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/tiered/tiered_handle.c:626
#16 0x0000fffff7505790 in __checkpoint_flush_tier (session=0xaaaaab407e60, force=true) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:141
#17 0x0000fffff7507b6c in __checkpoint_prepare (session=0xaaaaab407e60, trackingp=0xffffe7ffca53, cfg=0xffffe7ffd550) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:803
#18 0x0000fffff7508950 in __txn_checkpoint (session=0xaaaaab407e60, cfg=0xffffe7ffd550) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:1114
#19 0x0000fffff75099c0 in __txn_checkpoint_wrapper (session=0xaaaaab407e60, cfg=0xffffe7ffd550) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:1423
#20 0x0000fffff7509c40 in __wt_txn_checkpoint (session=0xaaaaab407e60, cfg=0xffffe7ffd550, waiting=true) at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/txn/txn_ckpt.c:1500
#21 0x0000fffff74aa538 in __session_checkpoint (wt_session=0xaaaaab407e60, config=0xffffc06f8680 ",flush_tier=(enabled,force=true)") at /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/src/session/session_api.c:2369
#22 0x0000fffff7608164 in _wrap_Session_checkpoint (self=0xfffff7765da0, args=0xfffff6bc6200)
Note that this call site may be the leak itself, or may simply be the "straw that broke the camel's back".
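One way to tell a leak from a last straw is to sample the process's descriptor count between checkpoints and watch whether it climbs steadily. Here is a minimal sketch of such a probe. It assumes Linux (it reads /proc/self/fd), and the helper names `open_fd_count` and `fd_headroom` are hypothetical, not part of the test suite.

```python
import os
import resource


def open_fd_count():
    """Return the number of descriptors open in this process (Linux only)."""
    # Each entry in /proc/self/fd is one open descriptor. The listing itself
    # briefly opens one extra descriptor, so treat the value as relative.
    return len(os.listdir("/proc/self/fd"))


def fd_headroom():
    """Return how many more descriptors can be opened before EMFILE.

    Returns None when the soft limit is unlimited.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return None
    return soft_limit - open_fd_count()
```

Printing `open_fd_count()` after each checkpoint call in the test would show whether the count grows per checkpoint or jumps at one particular call.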
While paused at the abort in the debugger, I ran
lsof | grep ython
and saw thousands of lines like this:
python3 345994 346014 python3 dda 2011u REG 259,3 4096 80033922 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug0100-0000000011.wtobj
python3 345994 346014 python3 dda 2012u REG 259,3 4096 80033925 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug0101-0000000011.wtobj
python3 345994 346014 python3 dda 2013u REG 259,3 4096 80033923 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug01010-0000000011.wtobj
python3 345994 346014 python3 dda 2014u REG 259,3 4096 80033926 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010100-0000000011.wtobj
python3 345994 346014 python3 dda 2015u REG 259,3 4096 80033927 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010101-0000000011.wtobj
python3 345994 346014 python3 dda 2016u REG 259,3 4096 80033931 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010102-0000000011.wtobj
python3 345994 346014 python3 dda 2017u REG 259,3 4096 80033928 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010103-0000000011.wtobj
python3 345994 346014 python3 dda 2018u REG 259,3 4096 80033930 /home/dda/wt/git/wt-11047-tiered-hook-checkpoint/build/WT_TEST/test_bug010.0/test_bug010104-0000000011.wtobj
So it seems like we're opening each object file many times - many more times than the number of checkpoints.
$ lsof | grep ython | wc
26965 292461 5705729
26K files open. The test uses 200 tables.
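To check whether individual object files really are open many times over, the lsof output can be tallied per path. The sketch below does that. It assumes the 11-column layout seen in the lines above (COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME), and the function name `descriptors_per_file` is hypothetical. Since lsof can list the same descriptor once per thread when the TID column is populated, it counts distinct (PID, FD) pairs rather than raw lines.

```python
from collections import defaultdict


def descriptors_per_file(lsof_text, suffix=".wtobj"):
    """Map each path ending in `suffix` to its count of distinct (PID, FD) pairs.

    Assumes 11 whitespace-separated columns per row, as in the report:
    COMMAND PID TID TASKCMD USER FD TYPE DEVICE SIZE/OFF NODE NAME
    """
    seen = defaultdict(set)
    for line in lsof_text.splitlines():
        fields = line.split()
        # Skip rows with a different layout or a file we are not tracking.
        if len(fields) != 11 or not fields[10].endswith(suffix):
            continue
        pid, fd, path = fields[1], fields[5], fields[10]
        seen[path].add((pid, fd))
    return {path: len(pairs) for path, pairs in seen.items()}
```

A path that maps to a count well above 1 would support the suspicion that each object file is being opened repeatedly. Comparing the largest counts against the number of checkpoints the test runs would show how far out of proportion they are.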