[SERVER-19774] WT_NOTFOUND: item not found during DB recovery Created: 05/Aug/15  Updated: 01/Mar/16  Resolved: 01/Mar/16

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 3.1.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jonathan Abrahams Assignee: Jonathan Abrahams
Resolution: Done Votes: 0
Labels: 32powercycle, 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongod-wiredTiger-recovery.log     Text File mongod-wiredTiger.log     File mongod-wt-notfound.tgz     File powertest.sh     File wiredTiger-afterrecovery.tar-aa     File wiredTiger-afterrecovery.tar-ab     File wiredTiger-afterrecovery.tar-ac     File wiredTiger-afterrecovery.tar-ad     File wiredTiger-afterrecovery.tar-ae     File wiredTiger-afterrecovery.tar-af     File wiredTiger-afterrecovery.tar-ag     File wiredTiger-afterrecovery.tar-ah     File wiredTiger-beforerecovery.tar-aa     File wiredTiger-beforerecovery.tar-ab     File wiredTiger-beforerecovery.tar-ac     File wiredTiger-beforerecovery.tar-ad     File wiredTiger-beforerecovery.tar-ae     File wiredTiger-beforerecovery.tar-af     File wiredTiger-beforerecovery.tar-ag     File wiredTiger-beforerecovery.tar-ah     File wiredTiger-firstrun.tar-aa     File wiredTiger.tar-aa     File wiredTiger.tar-ab     File wiredTiger.tar-ac     File wiredTiger.tar-ad     File wiredTiger.tar-ae     File wiredTiger.tar-af     File wiredTiger.tar-ag     File wiredTiger.tar-ah    
Issue Links:
Depends
Related
is related to SERVER-19274 DB fails to recover after system cras... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Using the powercycle test script, it failed on 3 different platforms within 60 loops:

powertest.sh -s <ip addr> -R -N 20 -F <mongo root> -l 100

Sprint: TIG 11 (03/11/16)
Participants:

 Description   

The recovery of the DB failed, with WiredTiger storage, after a power cycle event:

2015-08-04T18:05:39.663-0400 I CONTROL  [main] ***** SERVER RESTARTED *****
2015-08-04T18:05:39.703-0400 W -        [initandlisten] Detected unclean shutdown - /home/jonathan/pt-317-wt/data/wiredTiger/mongod.lock is not empty.
2015-08-04T18:05:39.703-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2015-08-04T18:05:39.703-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-08-04T18:05:39.826-0400 E STORAGE  [initandlisten] WiredTiger (-31803) [1438725939:826651][2610:0x7f3e83636cc0], file:collection-215-1555756433786542724.wt: Operation failed during recovery: WT_NOTFOUND: item not found
2015-08-04T18:05:39.862-0400 I -        [initandlisten] Assertion: 28595:-31803: WT_NOTFOUND: item not found
2015-08-04T18:05:39.869-0400 I STORAGE  [initandlisten] exception in initAndListen: 28595 -31803: WT_NOTFOUND: item not found, terminating
2015-08-04T18:05:39.869-0400 I CONTROL  [initandlisten] dbexit:  rc: 100



 Comments   
Comment by Jonathan Abrahams [ 01/Mar/16 ]

Let's close this, since this issue is not as the original.

PS - the device has write-cache enabled.

Comment by Susan LoVerso [ 29/Feb/16 ]

That error sounds a lot like SERVER-19274. Does the device have write-cache turned off?

Comment by Jonathan Abrahams [ 29/Feb/16 ]

michael.cahill I apologize that I wasn't clear on the failure observed on a specific hardware. The failure is not due to too many open files, it seems to be related to the hardware and it's settings, where the collection file is not flushed to disk. As mentioned previously, the recovery after a power failure fails with the following error:

2016-02-25T10:08:37.037-0500 I CONTROL  [main] ***** SERVER RESTARTED *****
2016-02-25T10:08:37.048-0500 I CONTROL  [initandlisten] MongoDB starting : pid=2577 port=37017 dbpath=/home/jonathan/wt332/data/wiredTiger 64-bit host=capja2
2016-02-25T10:08:37.048-0500 I CONTROL  [initandlisten] db version v3.3.2-18-g4c1a7d5
2016-02-25T10:08:37.048-0500 I CONTROL  [initandlisten] git version: 4c1a7d5446ffc299f053b2e97ca5a80ce93829a2
2016-02-25T10:08:37.049-0500 I CONTROL  [initandlisten] allocator: tcmalloc
2016-02-25T10:08:37.049-0500 I CONTROL  [initandlisten] modules: none
2016-02-25T10:08:37.049-0500 I CONTROL  [initandlisten] build environment:
2016-02-25T10:08:37.049-0500 I CONTROL  [initandlisten]     distarch: x86_64
2016-02-25T10:08:37.049-0500 I CONTROL  [initandlisten]     target_arch: x86_64
2016-02-25T10:08:37.049-0500 I CONTROL  [initandlisten] options: { net: { port: 37017 }, processManagement: { fork: true }, replication: { replSet: "pc" }, setParameter: { enableTestCommands: "1" }, storage: { dbPath: "wt332/data/wiredTiger", engine: "wiredTiger" }, systemLog: { destination: "file", logAppend: true, path: "wt332/logs/mongod-wiredTiger-recovery.log" } }
2016-02-25T10:08:37.049-0500 W -        [initandlisten] Detected unclean shutdown - /home/jonathan/wt332/data/wiredTiger/mongod.lock is not empty.
2016-02-25T10:08:37.093-0500 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2016-02-25T10:08:37.093-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=8G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2016-02-25T10:08:42.905-0500 E STORAGE  [initandlisten] WiredTiger (2) [1456412922:905321][2577:0x7f1740e41cc0], file:collection-3643-3972959707230803588.wt, txn-recover: /home/jonathan/wt332/data/wiredTiger/collection-3643-3972959707230803588.wt: No such file or directory
2016-02-25T10:08:42.905-0500 E STORAGE  [initandlisten] WiredTiger (2) [1456412922:905382][2577:0x7f1740e41cc0], file:collection-3643-3972959707230803588.wt, txn-recover: Operation failed during recovery: No such file or directory
2016-02-25T10:08:45.328-0500 E STORAGE  [initandlisten] WiredTiger (2) [1456412925:328468][2577:0x7f1740e41cc0], file:collection-3643-3972959707230803588.wt, WT_SESSION.open_cursor: /home/jonathan/wt332/data/wiredTiger/collection-3643-3972959707230803588.wt: No such file or directory
2016-02-25T10:08:45.328-0500 E STORAGE  [initandlisten] no cursor for uri: table:collection-3643-3972959707230803588
2016-02-25T10:08:45.328-0500 F -        [initandlisten] Invalid access at address: 0x58
2016-02-25T10:08:45.352-0500 F -        [initandlisten] Got signal: 11 (Segmentation fault).
 
 0x12d7152 0x12d6079 0x12d63f8 0x7f173fe03130 0x105930e 0x1057c59 0x1049719 0xfaabf4 0xfaff4a 0x1048049 0xf6f28e 0x93e2e8 0x94293d 0x7f173fa54af5 0x93bc79
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"ED7152","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"ED6079"},{"b":"400000","o":"ED63F8"},{"b":"7F173FDF4000","o":"F130"},{"b":"400000","o":"C5930E","s":"_ZN5mongo21WiredTigerRecordStore6Cursor4nextEv"},{"b":"400000","o":"C57C59","s":"_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_SsbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE"},{"b":"400000","o":"C49719","s":"_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE"},{"b":"400000","o":"BAABF4","s":"_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb"},{"b":"400000","o":"BAFF4A","s":"_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE"},{"b":"400000","o":"C48049"},{"b":"400000","o":"B6F28E","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"400000","o":"53E2E8","s":"_ZN5mongo13initAndListenEi"},{"b":"400000","o":"54293D","s":"main"},{"b":"7F173FA33000","o":"21AF5","s":"__libc_start_main"},{"b":"400000","o":"53BC79"}],"processInfo":{ "mongodbVersion" : "3.3.2-18-g4c1a7d5", "gitVersion" : "4c1a7d5446ffc299f053b2e97ca5a80ce93829a2", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-229.4.2.el7.x86_64", "version" : "#1 SMP Wed May 13 10:06:09 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF7A32F000", "elfType" : 3 }, { "b" : "7F1740A33000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "7F174082F000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "7F1740528000", "path" : "/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F1740226000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "7F1740010000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F173FDF4000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "7F173FA33000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "7F1740C3B000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12d7152]
 mongod(+0xED6079) [0x12d6079]
 mongod(+0xED63F8) [0x12d63f8]
 libpthread.so.0(+0xF130) [0x7f173fe03130]
 mongod(_ZN5mongo21WiredTigerRecordStore6Cursor4nextEv+0x10E) [0x105930e]
 mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_SsbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE+0x339) [0x1057c59]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE+0xE9) [0x1049719]
 mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb+0x204) [0xfaabf4]
 mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x69A) [0xfaff4a]
 mongod(+0xC48049) [0x1048049]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x42E) [0xf6f28e]
 mongod(_ZN5mongo13initAndListenEi+0x388) [0x93e2e8]
 mongod(main+0x15D) [0x94293d]
 libc.so.6(__libc_start_main+0xF5) [0x7f173fa54af5]
 mongod(+0x53BC79) [0x93bc79]
-----  END BACKTRACE  -----
2016-02-25T10:08:45.352-0500 F -        [initandlisten] /proc/self/maps:
00400000-01fe1000 r-xp 00000000 fd:02 555243216                          /home/jonathan/wt332/mongodb-linux-x86_64-3.3.2-18-g4c1a7d5/bin/mongod
2016-02-25T10:08:45.352-0500 F -        [initandlisten] 021e1000-022b9000 rwxp 01be1000 fd:02 555243216                          /home/jonathan/wt332/mongodb-linux-x86_64-3.3.2-18-g4c1a7d5/bin/mongod
2016-02-25T10:08:45.352-0500 F -        [initandlisten] 022b9000-02317000 rwxp 00000000 00:00 0
2016-02-25T10:08:45.352-0500 F -        [initandlisten] 040e7000-04ce8000 rwxp 00000000 00:00 0                                  [heap]
2016-02-25T10:08:45.352-0500 F -        [initandlisten] 04ce8000-380ee000 rwxp 00000000 00:00 0                                  [heap]
2016-02-25T10:08:45.352-0500 F -        [initandlisten] 7f1735504000-7f1735505000 ---p 00000000 00:00 0
2016-02-25T10:08:45.352-0500 F -        [initandlisten] 7f1735505000-7f1735d05000 rwxp 00000000 00:00 0                          [stack:2589]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1735d05000-7f1735d06000 ---p 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1735d06000-7f1736506000 rwxp 00000000 00:00 0                          [stack:2588]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1736506000-7f1736507000 ---p 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1736507000-7f1736d07000 rwxp 00000000 00:00 0                          [stack:2587]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1736d07000-7f1736d08000 ---p 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1736d08000-7f1737508000 rwxp 00000000 00:00 0                          [stack:2583]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1737508000-7f1737509000 ---p 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1737509000-7f1737d09000 rwxp 00000000 00:00 0                          [stack:2584]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1737d09000-7f1737d0a000 ---p 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1737d0a000-7f173850a000 rwxp 00000000 00:00 0                          [stack:2585]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173850a000-7f173850b000 ---p 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173850b000-7f1738d0b000 rwxp 00000000 00:00 0                          [stack:2586]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1738d0b000-7f1738d0c000 ---p 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1738d0c000-7f173950c000 rwxp 00000000 00:00 0                          [stack:2578]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173950c000-7f173fa33000 r-xp 00000000 fd:01 11251                      /usr/lib/locale/locale-archive
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173fa33000-7f173fbe9000 r-xp 00000000 fd:01 205023206                  /usr/lib64/libc-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173fbe9000-7f173fde9000 ---p 001b6000 fd:01 205023206                  /usr/lib64/libc-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173fde9000-7f173fded000 r-xp 001b6000 fd:01 205023206                  /usr/lib64/libc-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173fded000-7f173fdef000 rwxp 001ba000 fd:01 205023206                  /usr/lib64/libc-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173fdef000-7f173fdf4000 rwxp 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173fdf4000-7f173fe0a000 r-xp 00000000 fd:01 201334002                  /usr/lib64/libpthread-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f173fe0a000-7f174000a000 ---p 00016000 fd:01 201334002                  /usr/lib64/libpthread-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f174000a000-7f174000b000 r-xp 00016000 fd:01 201334002                  /usr/lib64/libpthread-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f174000b000-7f174000c000 rwxp 00017000 fd:01 201334002                  /usr/lib64/libpthread-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f174000c000-7f1740010000 rwxp 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740010000-7f1740025000 r-xp 00000000 fd:01 205023195                  /usr/lib64/libgcc_s-4.8.3-20140911.so.1
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740025000-7f1740224000 ---p 00015000 fd:01 205023195                  /usr/lib64/libgcc_s-4.8.3-20140911.so.1
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740224000-7f1740225000 r-xp 00014000 fd:01 205023195                  /usr/lib64/libgcc_s-4.8.3-20140911.so.1
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740225000-7f1740226000 rwxp 00015000 fd:01 205023195                  /usr/lib64/libgcc_s-4.8.3-20140911.so.1
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740226000-7f1740327000 r-xp 00000000 fd:01 201333945                  /usr/lib64/libm-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740327000-7f1740526000 ---p 00101000 fd:01 201333945                  /usr/lib64/libm-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740526000-7f1740527000 r-xp 00100000 fd:01 201333945                  /usr/lib64/libm-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740527000-7f1740528000 rwxp 00101000 fd:01 201333945                  /usr/lib64/libm-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740528000-7f1740611000 r-xp 00000000 fd:01 201334037                  /usr/lib64/libstdc++.so.6.0.19
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740611000-7f1740810000 ---p 000e9000 fd:01 201334037                  /usr/lib64/libstdc++.so.6.0.19
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740810000-7f1740818000 r-xp 000e8000 fd:01 201334037                  /usr/lib64/libstdc++.so.6.0.19
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740818000-7f174081a000 rwxp 000f0000 fd:01 201334037                  /usr/lib64/libstdc++.so.6.0.19
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f174081a000-7f174082f000 rwxp 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f174082f000-7f1740832000 r-xp 00000000 fd:01 201333943                  /usr/lib64/libdl-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740832000-7f1740a31000 ---p 00003000 fd:01 201333943                  /usr/lib64/libdl-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740a31000-7f1740a32000 r-xp 00002000 fd:01 201333943                  /usr/lib64/libdl-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740a32000-7f1740a33000 rwxp 00003000 fd:01 201333943                  /usr/lib64/libdl-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740a33000-7f1740a3a000 r-xp 00000000 fd:01 201334007                  /usr/lib64/librt-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740a3a000-7f1740c39000 ---p 00007000 fd:01 201334007                  /usr/lib64/librt-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740c39000-7f1740c3a000 r-xp 00006000 fd:01 201334007                  /usr/lib64/librt-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740c3a000-7f1740c3b000 rwxp 00007000 fd:01 201334007                  /usr/lib64/librt-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740c3b000-7f1740c5c000 r-xp 00000000 fd:01 205023199                  /usr/lib64/ld-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740e3f000-7f1740e46000 rwxp 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740e54000-7f1740e5b000 r-xs 00000000 fd:01 68823729                   /usr/lib64/gconv/gconv-modules.cache
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740e5b000-7f1740e5c000 rwxp 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740e5c000-7f1740e5d000 r-xp 00021000 fd:01 205023199                  /usr/lib64/ld-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740e5d000-7f1740e5e000 rwxp 00022000 fd:01 205023199                  /usr/lib64/ld-2.17.so
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7f1740e5e000-7f1740e5f000 rwxp 00000000 00:00 0
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7fff799f6000-7fff79a17000 rwxp 00000000 00:00 0                          [stack]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] 7fff79a2f000-7fff79a31000 r-xp 00000000 00:00 0                          [vdso]
2016-02-25T10:08:45.353-0500 F -        [initandlisten] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Do you want to pursue this further?

Comment by Michael Cahill (Inactive) [ 26/Feb/16 ]

FWIW, if I deliberately induce recovery to run with insufficient file handles, I get error messages like this:

[1456461024:605708][122228:0x7f8137d4f740], test_txn16.test_txn16.test_many_tables: , file:test_txn16-092.wt, txn-recover: ./test_txn16-092.wt: Too many open files
[1456461024:605855][122228:0x7f8137d4f740], test_txn16.test_txn16.test_many_tables: , file:test_txn16-092.wt, txn-recover: Operation failed during recovery: Too many open files
[1456461024:606111][122228:0x7f8137d4f740], test_txn16.test_txn16.test_many_tables: , file:test_txn16-092.wt, txn-recover: Recovery failed: Too many open files
lt-wt: Too many open files

Comment by Michael Cahill (Inactive) [ 26/Feb/16 ]

jonathan.abrahams, I'm unclear on the status.

When you reproduce this failure, do you see mongod fail with "Too many open files"? If so, are we treating that as unexpected (i.e., do we expect to see that error given the workload and the ulimit setting on server)?

Or is the issue that after mongod fails in that way, recovery is failing with WT_NOTFOUND, but only on the same server? I gather from sue.loverso's comment above that recovery of the database was successful when the files were copied elsewhere.

I could believe that recovery is failing because it tries to open all of the same files that were open before the crash, so if mongod is hitting the open file limit, then recovery could also hit that limit. But if that is the case, I would expect that (a) WiredTiger should generate a clearer error message, and (b) it should be reproducible anywhere with the same open file limit.

TL;DR If we can't reproduce this elsewhere, I'm happy to close it.

Comment by Jonathan Abrahams [ 25/Feb/16 ]

This seems to be reproducible only one 1 piece of hardware. The same scenario does not fail in a Ubuntu VM or a Windows host.

michael.cahill Given that it reproduces rather easily on this 1 host, is it worthwhile to continue an investigation? Do we get any worthwhile results from this particular failure?

Comment by Susan LoVerso [ 14/Jan/16 ]

This is different than the other FSM related tickets. However, I cannot reproduce the recovery error when I run recovery on the database attached. Recovery is successful.

jonathan.abrahams if you can try to reproduce this on the current 3.2 release, while capturing the "before" copy of the database that would help (as in before mongod restarts and runs recovery where it errored with "Item not found". You were running the FSM powercycle test, but the original error was too many open files. You could force that error with ulimit adjustments on the open files.

Since there have been quite a number of changes/fixes in the last 2 months, if we cannot reproduce it after some amount of time, we can close this and open a new ticket if it ever turns up again.

Comment by Ian Whalen (Inactive) [ 13/Jan/16 ]

sue.loverso do you know whether this is still an issue? or if it was fixed for 3.2.0?

Comment by Jonathan Abrahams [ 20/Nov/15 ]

dan@10gen.com I do have the logs and db files, and I will attach to this ticket.

Comment by Daniel Pasette (Inactive) [ 20/Nov/15 ]

Do you have the files in the dbpath?

Comment by Jonathan Abrahams [ 20/Nov/15 ]

I saw this problem during an FSM stress run. The mongod process failed because it ran out of open files.

2015-11-20T11:31:15.371-0500 E STORAGE  [conn4000] WiredTiger (24) [1448037075:371921][7081:0x7f566f4fd700], WT_SESSION.create: /data/db/index-45844--5131263760495339475.wt: Too many open files
2015-11-20T11:31:16.375-0500 E STORAGE  [thread2] WiredTiger (24) [1448037076:375083][7081:0x7f568073b700], file:WiredTiger.wt, WT_SESSION.checkpoint: WiredTiger.turtle: fopen: Too many open files
2015-11-20T11:31:16.379-0500 E STORAGE  [thread2] WiredTiger (24) [1448037076:379217][7081:0x7f568073b700], checkpoint-server: checkpoint server error: Too many open files
2015-11-20T11:31:16.379-0500 E STORAGE  [thread2] WiredTiger (-31804) [1448037076:379257][7081:0x7f568073b700], checkpoint-server: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-11-20T11:31:16.379-0500 I -        [conn4000] Fatal Assertion 28559
2015-11-20T11:31:16.379-0500 I -        [thread2] Fatal Assertion 28558
2015-11-20T11:31:16.379-0500 I -        [thread2]
 
***aborting after fassert() failure
 
 
2015-11-20T11:31:16.379-0500 I -        [conn4000]
 
***aborting after fassert() failure
 
 
2015-11-20T11:31:16.385-0500 F -        [thread2] Got signal: 6 (Aborted).
 
 0x12e1172 0x12e02d9 0x12e0ae2 0x7f568b88e100 0x7f568b4f35f7 0x7f568b4f4ce8 0x126bb32 0x106df03 0x1a172ac 0x1a1744d 0x1a17834 0x19a740b 0x7f568b886dc5 0x7f568b5b421d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"EE1172"},{"b":"400000","o":"EE02D9"},{"b":"400000","o":"EE0AE2"},{"b":"7F568B87F000","o":"F100"},{"b":"7F568B4BE000","o":"355F7"},{"b":"7F568B4BE000","o":"36CE8"},{"b":"400000","o":"E6BB32"},{"b":"400000","o":"C6DF03"},{"b":"400000","o":"16172AC"},{"b":"400000","o":"161744D"},{"b":"400000","o":"1617834"},{"b":"400000","o":"15A740B"},{"b":"7F568B87F000","o":"7DC5"},{"b":"7F568B4BE000","o":"F621D"}],"processInfo":{ "mongodbVersion" : "3.2.0-rc3", "gitVersion" : "bd58ea2ba5d17b960981990bb97cab133d7e90ed", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-229.el7.x86_64", "version" : "#1 SMP Thu Jan 29 18:37:38 EST 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "41B4B2C5C9E4469F918F360FC0E62785B44FA9A8" }, { "b" : "7FFFA365D000", "elfType" : 3, "buildId" : "0B377E1B214C21598532E65CA1F8780D05ADBEBB" }, { "b" : "7F568CAAF000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "8DB4545998776514159031B754BB67F7F396F83A" }, { "b" : "7F568C6C8000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "038F79F7C3F6E60C29184B8E70D0B1E62525D64D" }, { "b" : "7F568C4C0000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "CCC119FE8F4D8D262AFC67DDC36C2F266F30586F" }, { "b" : "7F568C2BC000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "DECA756A631284EEE020E3AE1CADCE9BDFB9914D" }, { "b" : "7F568BFB3000", "path" : "/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "74AAD82C77F6D211BD9E4DA27BF4F62A0D938C4B" }, { "b" : "7F568BCB1000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "F129A8EBE4CD7D042019BFACD9A58677C18F2AE3" }, { "b" : "7F568BA9B000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "97D5E2F5739B715C3A0EC9F95F7336E232346CA8" }, { "b" : "7F568B87F000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "0D3AFB2828490FDE7B2BE3BEEBACBDB4D94A999F" }, { "b" : "7F568B4BE000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "508F7FF5F9802FE2FE52FD9F5D19D0D732D55A56" }, { "b" : "7F568CD1C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "09E1BB4D034C7263810A41100647068858A7ECB6" }, { "b" : "7F568B272000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "8AB5682155DE13D0916B984306B4E044E216B2EB" }, { "b" : "7F568AF8D000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "BE8968836D439581B2816CE3827642FCF4B8BF4A" }, { "b" : "7F568AD89000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "B25574847B066A26CD593C8101DF6779898FF2C2" }, { "b" : "7F568AB57000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "F5784ED7E64118BAFE898DBF178DC9E37CBDA4AA" }, { "b" : "7F568A941000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "FC37913FB197B822BCDBF3697D061E248698CEC1" }, { "b" : "7F568A732000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "4BBED12CFDC9647C8771A4B897E0D5A4F217ED7C" }, { "b" : "7F568A52E000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8CA73C16CFEB9A8B5660015B9223B09F87041CAD" }, { "b" : "7F568A314000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "262D22F4D416B270C2AA70EF5FB2100E12796A3F" }, { "b" : "7F568A0EF000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "5062031216B995004A297D555D834C0109F7598C" }, { "b" : "7F5689E8E000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "8E3819A80BF876382A6F0CB2A08F82F1742EE8DB" }, { "b" : "7F5689C69000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "61D7D46225E85F144221E1424B87FBF3CB2B9D3F" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x12e1172]
 mongod(+0xEE02D9) [0x12e02d9]
 mongod(+0xEE0AE2) [0x12e0ae2]
 libpthread.so.0(+0xF100) [0x7f568b88e100]
 libc.so.6(gsignal+0x37) [0x7f568b4f35f7]
 libc.so.6(abort+0x148) [0x7f568b4f4ce8]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x126bb32]
 mongod(+0xC6DF03) [0x106df03]
 mongod(__wt_eventv+0x40C) [0x1a172ac]
 mongod(__wt_err+0x8D) [0x1a1744d]
 mongod(__wt_panic+0x24) [0x1a17834]
 mongod(+0x15A740B) [0x19a740b]
 libpthread.so.0(+0x7DC5) [0x7f568b886dc5]
 libc.so.6(clone+0x6D) [0x7f568b5b421d]
-----  END BACKTRACE  -----

After restarting the process the following error was noted:

2015-11-20T12:25:28.396-0500 E STORAGE  [initandlisten] WiredTiger (-31803) [1448040328:396336][29606:0x7f1b8e262dc0], file:collection-45193--5131263760495339475.wt, txn-recover: Operation failed during recovery: WT_NOTFOUND: item not found
2015-11-20T12:25:28.396-0500 E STORAGE  [initandlisten] WiredTiger (-31803) [1448040328:396533][29606:0x7f1b8e262dc0], file:collection-45193--5131263760495339475.wt, txn-recover: Recovery failed: WT_NOTFOUND: item not found
2015-11-20T12:25:28.476-0500 I -        [initandlisten] Assertion: 28595:-31803: WT_NOTFOUND: item not found

Comment by Michael Cahill (Inactive) [ 16/Nov/15 ]

jonathan.abrahams, I think all of the activity happened in SERVER-19274. Can we close this one?

Comment by Jonathan Abrahams [ 05/Aug/15 ]

To reassemble:

cat wiredTiger-afterrecovery.tar-* > wiredTiger-afterrecovery.tar

Comment by Jonathan Abrahams [ 05/Aug/15 ]

Similar problem as seen in SERVER-19024.

Generated at Thu Feb 08 03:52:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.