[SERVER-27894] WiredTiger on Btrfs - failed to read 4096 bytes at offset 0: Input/output error Created: 02/Feb/17  Updated: 13/Aug/18  Resolved: 22/Mar/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.6, 3.2.8, 3.2.11, 3.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Michael Göhler Assignee: Mark Agarunov
Resolution: Incomplete Votes: 0
Labels: envm, rns, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongodb-diagnostic-data-20170424.tar.gz     File mongodb-diagnostic-data.tar.gz    
Operating System: Linux
Participants:

 Description   

I'm running MongoDB 3.4.1 on our CI systems (CentOS 7.3) in a Btrfs subvolume. Every now and then (2-4 weeks) the following error leaves the database in a non-repairable state. There are no messages in syslog/dmesg showing any fs errors during this time.

2017-01-31T16:38:01.118+0000 E STORAGE  [conn9] WiredTiger error (5) [1485880681:118369][63127:0x7ff17bbe1700], file:index-5-8173061028522095950.wt, WT_SESSION.open_cursor: /opt/mongo/db/index-5-81730610285220
95950.wt: handle-read: pread: failed to read 4096 bytes at offset 0: Input/output error
2017-01-31T16:38:01.118+0000 I -        [conn9] Invariant failure: ret resulted in status UnknownError: 5: Input/output error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 95
2017-01-31T16:38:01.118+0000 I -        [conn9]
 
***aborting after invariant() failure
 
 
2017-01-31T16:38:01.137+0000 F -        [conn9] Got signal: 6 (Aborted).
 
 0x7ff188124351 0x7ff188123449 0x7ff18812392d 0x7ff1857fa370 0x7ff18545f1d7 0x7ff1854608c8 0x7ff1873b887f 0x7ff187e37766 0x7ff187e3547c 0x7ff187e0fd57 0x7ff1878112c8 0x7ff187744048 0x7ff18774456f 0x7ff187754cb
3 0x7ff18773305e 0x7ff187754cb3 0x7ff187748d26 0x7ff187754cb3 0x7ff187a580ca 0x7ff187a589eb 0x7ff18763face 0x7ff187616847 0x7ff187617be0 0x7ff187c2e54d 0x7ff1878348f2 0x7ff1878368f6 0x7ff187426f1d 0x7ff1874278
5d 0x7ff188089822 0x7ff1857f2dc5 0x7ff18552173d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"7FF186BA5000","o":"157F351","s":"_ZN5mongo15printStackTraceERSo"},{"b":"7FF186BA5000","o":"157E449"},{"b":"7FF186BA5000","o":"157E92D"},{"b":"7FF1857EB000","o":"F370"},{"b":"7FF18542A000","
o":"351D7","s":"gsignal"},{"b":"7FF18542A000","o":"368C8","s":"abort"},{"b":"7FF186BA5000","o":"81387F","s":"_ZN5mongo25fassertFailedWithLocationEiPKcj"},{"b":"7FF186BA5000","o":"1292766","s":"_ZN5mongo17Wired
TigerSession9getCursorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmb"},{"b":"7FF186BA5000","o":"129047C","s":"_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPN
S_16OperationContextE"},{"b":"7FF186BA5000","o":"126AD57","s":"_ZNK5mongo21WiredTigerIndexUnique9newCursorEPNS_16OperationContextEb"},{"b":"7FF186BA5000","o":"C6C2C8","s":"_ZNK5mongo17IndexAccessMethod9newCurs
orEPNS_16OperationContextEb"},{"b":"7FF186BA5000","o":"B9F048","s":"_ZN5mongo9IndexScan13initIndexScanEv"},{"b":"7FF186BA5000","o":"B9F56F","s":"_ZN5mongo9IndexScan6doWorkEPm"},{"b":"7FF186BA5000","o":"BAFCB3"
,"s":"_ZN5mongo9PlanStage4workEPm"},{"b":"7FF186BA5000","o":"B8E05E","s":"_ZN5mongo10FetchStage6doWorkEPm"},{"b":"7FF186BA5000","o":"BAFCB3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"7FF186BA5000","o":"BA3D26",
"s":"_ZN5mongo10LimitStage6doWorkEPm"},{"b":"7FF186BA5000","o":"BAFCB3","s":"_ZN5mongo9PlanStage4workEPm"},{"b":"7FF186BA5000","o":"EB30CA","s":"_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONO
bjEEEPNS_8RecordIdE"},{"b":"7FF186BA5000","o":"EB39EB","s":"_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE"},{"b":"7FF186BA5000","o":"A9AACE","s":"_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNS
t7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE"},{"b":"7FF186BA5000","o":"A71847","s":"_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_2
1ReplyBuilderInterfaceE"},{"b":"7FF186BA5000","o":"A72BE0","s":"_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE"},{"b":"7FF186BA5000","o":"10
8954D","s":"_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE"},{"b":"7FF186BA5000","o":"C8F8F2"},{"b":"7FF186BA5000","o":"C918F6","s":"_ZN5mongo16assemble
ResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE"},{"b":"7FF186BA5000","o":"881F1D","s":"_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7Session
EE"},{"b":"7FF186BA5000","o":"88285D"},{"b":"7FF186BA5000","o":"14E4822"},{"b":"7FF1857EB000","o":"7DC5"},{"b":"7FF18542A000","o":"F773D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.1", "gitVersion"
: "5e103c4f5583e2566a45d740225dc250baacfbd7", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-327.18.2.el7.x86_64", "version" : "#1 SMP Thu May 12 11:03:55 UTC 2016", "machine" : "
x86_64" }, "somap" : [ { "b" : "7FF186BA5000", "elfType" : 3, "buildId" : "1CFAD184A19D237BE2A54C39E15942B7F1E1F618" }, { "b" : "7FFCB2666000", "elfType" : 3, "buildId" : "627B075D566CF4BFF68497DAB7DF9B024F8E5
A83" }, { "b" : "7FF186715000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "D0018CA5E24522ED0DC1844556FA8DBC4B39D5C3" }, { "b" : "7FF18632B000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3
, "buildId" : "8756D2315BF50F8610875B1AFF128198FB9D202D" }, { "b" : "7FF186123000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "7FF185F1F000"
, "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "7FF185C1D000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48A
F713AB27DBE48EF3E" }, { "b" : "7FF185A07000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7FF1857EB000", "path" : "/lib64/libpthread.so.0"
, "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "7FF18542A000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" :
"7FF186983000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "7FF1851DC000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "7FF184EF5000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "7FF184CF1000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "7FF184ABF000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "7FF1848A9000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7FF18469A000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "7FF184496000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7FF18427C000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "7FF184055000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "7FF183DF4000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7ff188124351]
 mongod(+0x157E449) [0x7ff188123449]
 mongod(+0x157E92D) [0x7ff18812392d]
 libpthread.so.0(+0xF370) [0x7ff1857fa370]
 libc.so.6(gsignal+0x37) [0x7ff18545f1d7]
 libc.so.6(abort+0x148) [0x7ff1854608c8]
 mongod(_ZN5mongo25fassertFailedWithLocationEiPKcj+0x0) [0x7ff1873b887f]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmb+0x106) [0x7ff187e37766]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE+0x4C) [0x7ff187e3547c]
 mongod(_ZNK5mongo21WiredTigerIndexUnique9newCursorEPNS_16OperationContextEb+0x157) [0x7ff187e0fd57]
 mongod(_ZNK5mongo17IndexAccessMethod9newCursorEPNS_16OperationContextEb+0x28) [0x7ff1878112c8]
 mongod(_ZN5mongo9IndexScan13initIndexScanEv+0x58) [0x7ff187744048]
 mongod(_ZN5mongo9IndexScan6doWorkEPm+0x14F) [0x7ff18774456f]
 mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x7ff187754cb3]
 mongod(_ZN5mongo10FetchStage6doWorkEPm+0x29E) [0x7ff18773305e]
 mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x7ff187754cb3]
 mongod(_ZN5mongo10LimitStage6doWorkEPm+0x76) [0x7ff187748d26]
 mongod(_ZN5mongo9PlanStage4workEPm+0x63) [0x7ff187754cb3]
 mongod(_ZN5mongo12PlanExecutor11getNextImplEPNS_11SnapshottedINS_7BSONObjEEEPNS_8RecordIdE+0x19A) [0x7ff187a580ca]
 mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_8RecordIdE+0x4B) [0x7ff187a589eb]
 mongod(_ZN5mongo7FindCmd3runEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERNS_7BSONObjEiRS8_RNS_14BSONObjBuilderE+0x104E) [0x7ff18763face]
 mongod(_ZN5mongo7Command3runEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS3_21ReplyBuilderInterfaceE+0x527) [0x7ff187616847]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_RKNS_3rpc16RequestInterfaceEPNS4_21ReplyBuilderInterfaceE+0x9E0) [0x7ff187617be0]
 mongod(_ZN5mongo11runCommandsEPNS_16OperationContextERKNS_3rpc16RequestInterfaceEPNS2_21ReplyBuilderInterfaceE+0x26D) [0x7ff187c2e54d]
 mongod(+0xC8F8F2) [0x7ff1878348f2]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x766) [0x7ff1878368f6]
 mongod(_ZN5mongo23ServiceEntryPointMongod12_sessionLoopERKSt10shared_ptrINS_9transport7SessionEE+0x1ED) [0x7ff187426f1d]
 mongod(+0x88285D) [0x7ff18742785d]
 mongod(+0x14E4822) [0x7ff188089822]
 libpthread.so.0(+0x7DC5) [0x7ff1857f2dc5]
 libc.so.6(clone+0x6D) [0x7ff18552173d]

When we try to restart it affects other files (too?).

2017-02-01T17:31:51.215+0000 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten] MongoDB starting : pid=21524 port=27017 dbpath=/opt/mongo/db 64-bit host=outmatch-dev
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten] db version v3.4.1
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten] git version: 5e103c4f5583e2566a45d740225dc250baacfbd7
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten] modules: none
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten] build environment:
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten]     distmod: rhel70
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-02-01T17:31:51.222+0000 I CONTROL  [initandlisten] options: { config: "/opt/mongo/conf/mongod.conf", net: { bindIp: "127.0.0.1", port: 27017 }, processManagement: { fork: false, pidFilePath: "/var/run/mongod/mongod.pid" }, security: { authorization: "enabled" }, storage: { dbPath: "/opt/mongo/db", engine: "wiredTiger", wiredTiger: { collectionConfig: { blockCompressor: "zlib" } } }, systemLog: { destination: "file", logAppend: true, path: "/opt/mongo/log/mongod.log" } }
2017-02-01T17:31:51.222+0000 W -        [initandlisten] Detected unclean shutdown - /opt/mongo/db/mongod.lock is not empty.
2017-02-01T17:31:51.271+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-02-01T17:31:51.271+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=3476M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-02-01T17:31:51.305+0000 E STORAGE  [initandlisten] WiredTiger error (5) [1485970311:305130][21524:0x7fc41fde8dc0], file:WiredTiger.wt, connection: /opt/mongo/db/WiredTiger.wt: handle-read: pread: failed to read 4096 bytes at offset 0: Input/output error
2017-02-01T17:31:51.306+0000 I -        [initandlisten] Assertion: 28595:5: Input/output error src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 267
2017-02-01T17:31:51.311+0000 I STORAGE  [initandlisten] exception in initAndListen: 28595 5: Input/output error, terminating
2017-02-01T17:31:51.311+0000 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2017-02-01T17:31:51.311+0000 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2017-02-01T17:31:51.311+0000 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2017-02-01T17:31:51.311+0000 I CONTROL  [initandlisten] now exiting
2017-02-01T17:31:51.311+0000 I CONTROL  [initandlisten] shutting down with code:100



 Comments   
Comment by Satish Anupindi [ 13/Sep/17 ]

Not sure if creating a new ticket is appropriate, but we're seeing the same exact issue in one of our deployments in a staging (pre-prod) environment. The MongoDB in question is 3.2.4 and the data mounted on a XFS filesystem
When I do a directory listing on the diagnostic data, I get the below :

Dir listing of diagnostic data

-bash-4.2$ ls -rlt
ls: cannot access metrics.2017-06-07T12-08-57Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T12-10-44Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T12-34-41Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T13-14-51Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T13-00-46Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T15-09-14Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T12-07-34Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T12-09-38Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T12-21-40Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T13-15-34Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T13-29-45Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T13-49-39Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T14-04-44Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T14-19-17Z-00000: Structure needs cleaning
ls: cannot access metrics.2017-06-07T14-34-15Z-00000: Structure needs cleaning
total 86276
?????????? ? ? ? ? ? metrics.2017-06-07T15-09-14Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T14-34-15Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T14-19-17Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T14-04-44Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T13-49-39Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T13-29-45Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T13-15-34Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T13-14-51Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T13-00-46Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T12-34-41Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T12-21-40Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T12-10-44Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T12-09-38Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T12-08-57Z-00000
?????????? ? ? ? ? ? metrics.2017-06-07T12-07-34Z-00000
rw-rr- 1 mongod mongod 4076373 Jun 6 23:30 metrics.2017-06-07T05-44-41Z-00000
rw-rr- 1 mongod mongod 972424 Jun 6 23:38 metrics.2017-06-07T06-30-38Z-00000
rw-rr- 1 mongod mongod 446004 Jun 6 23:42 metrics.2017-06-07T06-38-44Z-00000
rw-rr- 1 mongod mongod 1611649 Jun 7 00:00 metrics.2017-06-07T06-43-02Z-00000
rw-rr- 1 mongod mongod 5405048 Jun 7 01:01 metrics.2017-06-07T07-00-35Z-00000
rw-rr- 1 mongod mongod 436294 Jun 7 01:06 metrics.2017-06-07T08-01-57Z-00000
rw-rr- 1 mongod mongod 901234 Jun 7 01:13 metrics.2017-06-07T08-07-02Z-00000
rw-rr- 1 mongod mongod 361253 Jun 7 01:17 metrics.2017-06-07T08-13-50Z-00000
rw-rr- 1 mongod mongod 1754846 Jun 7 01:36 metrics.2017-06-07T08-17-24Z-00000
rw-rr- 1 mongod mongod 3387969 Jun 7 02:11 metrics.2017-06-07T08-36-42Z-00000
rw-rr- 1 mongod mongod 1750738 Jun 7 02:29 metrics.2017-06-07T09-11-57Z-00000
rw-rr- 1 mongod mongod 305219 Jun 7 02:32 metrics.2017-06-07T09-30-00Z-00000
rw-rr- 1 mongod mongod 206123 Jun 7 02:35 metrics.2017-06-07T09-33-08Z-00000
rw-rr- 1 mongod mongod 1512191 Jun 7 02:53 metrics.2017-06-07T09-35-11Z-00000
rw-rr- 1 mongod mongod 1930268 Jun 7 03:18 metrics.2017-06-07T09-53-04Z-00000
rw-rr- 1 mongod mongod 723067 Jun 7 03:26 metrics.2017-06-07T10-18-23Z-00000
rw-rr- 1 mongod mongod 2200563 Jun 7 03:49 metrics.2017-06-07T10-26-58Z-00000
rw-rr- 1 mongod mongod 23361 Jun 7 03:50 metrics.2017-06-07T10-50-01Z-00000
rw-rr- 1 mongod mongod 219294 Jun 7 03:52 metrics.2017-06-07T10-50-14Z-00000
rw-rr- 1 mongod mongod 86601 Jun 7 03:53 metrics.2017-06-07T10-52-29Z-00000
rw-rr- 1 mongod mongod 1587751 Jun 7 04:08 metrics.2017-06-07T10-53-28Z-00000
rw-rr- 1 mongod mongod 59033 Jun 7 04:08 metrics.2017-06-07T11-08-09Z-00000
rw-rr- 1 mongod mongod 530465 Jun 7 04:16 metrics.2017-06-07T11-09-00Z-00000
rw-rr- 1 mongod mongod 1373056 Jun 7 04:27 metrics.2017-06-07T11-17-28Z-00000
rw-rr- 1 mongod mongod 1625990 Jun 7 04:44 metrics.2017-06-07T11-27-29Z-00000
rw-rr- 1 mongod mongod 1442684 Jun 7 04:59 metrics.2017-06-07T11-44-13Z-00000
rw-rr- 1 mongod mongod 592774 Jun 7 05:07 metrics.2017-06-07T11-59-43Z-00000
rw-rr- 1 mongod mongod 1574963 Jun 7 06:48 metrics.2017-06-07T13-33-44Z-00000
rw-rr- 1 mongod mongod 85711 Jun 7 06:49 metrics.2017-06-07T13-48-54Z-00000
rw-rr- 1 mongod mongod 840177 Jun 7 07:34 metrics.2017-06-07T14-26-08Z-00000
rw-rr- 1 mongod mongod 877237 Jun 7 07:55 metrics.2017-06-07T14-46-26Z-00000
rw-rr- 1 mongod mongod 1405769 Jun 7 08:09 metrics.2017-06-07T14-55-35Z-00000
rw-rr- 1 mongod mongod 10476096 Jun 9 21:37 metrics.2017-06-07T21-25-28Z-00000
rw-rr- 1 mongod mongod 10479516 Jun 12 05:37 metrics.2017-06-10T04-37-30Z-00000
rw-rr- 1 mongod mongod 10479577 Jun 14 13:34 metrics.2017-06-12T12-37-30Z-00000
rw-rr- 1 mongod mongod 10476489 Jun 16 20:49 metrics.2017-06-14T20-34-31Z-00000
rw-rr- 1 mongod mongod 6044214 Jun 18 03:46 metrics.2017-06-17T03-49-31Z-00000

2017-09-13T11:54:46.640-0700 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten] MongoDB starting : pid=12137 port=27017 dbpath=//data//mongodb//pqmcl136 64-bit host=lpqosp
db00335.phx.aexp.com
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten] db version v3.2.4
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten] git version: e2ee9ffcf9f5a94fad76802e28cc978718bb7a30
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten] allocator: tcmalloc
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten] modules: enterprise
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten] build environment:
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten]     distmod: rhel70
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten]     distarch: x86_64
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten]     target_arch: x86_64
2017-09-13T11:54:46.693-0700 I CONTROL  [initandlisten] options: { config: "/data/mongodb/pqmcl136/automation-mongod.conf", net: { port: 27017 }, p
rocessManagement: { fork: true }, replication: { oplogSizeMB: 12288, replSetName: "pqmcl136" }, security: { authorization: "enabled", keyFile: "/va
r/lib/mongodb-mms-automation/keyfile" }, storage: { dbPath: "//data//mongodb//pqmcl136", engine: "wiredTiger" }, systemLog: { destination: "file",
logAppend: true, path: "//logs//mongodb//pqmcl136//syslog//mongod.log" } }
2017-09-13T11:54:46.713-0700 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=3G,session_max=20000,eviction=(threads_max=4),con
fig_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(w
ait=60,log_size=2GB),statistics_log=(wait=0),
2017-09-13T11:54:47.449-0700 E STORAGE  [initandlisten] WiredTiger (5) [1505328887:449363][12137:0x7ff3c9f25dc0], file:collection-7-698599304664660
5445.wt, WT_SESSION.open_cursor: collection-7-6985993046646605445.wt read error: failed to read 4096 bytes at offset 0: Input/output error
2017-09-13T11:54:47.449-0700 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError: 5: Input/output error at src/mongo/
db/storage/wiredtiger/wiredtiger_session_cache.cpp 74
2017-09-13T11:54:47.480-0700 I CONTROL  [initandlisten]
 0x1362052 0x1300388 0x12ed26d 0x10e2214 0x10e0b80 0x10dce62 0x10dbd3e 0x10cdf1b 0x1032a24 0x1037ca2 0x10cc908 0xff9488 0x9a81fb 0x95fe0d 0x7ff3c6d
a0b35 0x9a5b5d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F62052","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F00388","s":"_ZN5mongo10logContextEPKc"},{"b":"40
0000","o":"EED26D","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"400000","o":"CE2214","s":"_ZN5mongo17WiredTigerSession9getCursorE
RKSsmb"},{"b":"400000","o":"CE0B80","s":"_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE"},{"b":"400000","o":"CDCE62","s":"_ZN5mongo21W
iredTigerRecordStore6CursorC1EPNS_16OperationContextERKS0_b"},{"b":"400000","o":"CDBD3E","s":"_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationCon
textENS_10StringDataES3_SsbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE"},{"b":"400000","o":"CCDF1B","s":"_ZN5mongo18WiredTigerKVEngine14get
RecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE"},{"b":"400000","o":"C32A24","s":"_ZN5mongo22KVDatabaseCatalogEntry
14initCollectionEPNS_16OperationContextERKSsb"},{"b":"400000","o":"C37CA2","s":"_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOp
tionsE"},{"b":"400000","o":"CCC908"},{"b":"400000","o":"BF9488","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"40000
0","o":"5A81FB","s":"_ZN5mongo13initAndListenEi"},{"b":"400000","o":"55FE0D","s":"main"},{"b":"7FF3C6D7F000","o":"21B35","s":"__libc_start_main"},{
"b":"400000","o":"5A5B5D"}],"processInfo":{ "mongodbVersion" : "3.2.4", "gitVersion" : "e2ee9ffcf9f5a94fad76802e28cc978718bb7a30", "compiledModules
" : [ "enterprise" ], "uname" : { "sysname" : "Linux", "release" : "3.10.0-514.26.2.el7.x86_64", "version" : "#1 SMP Fri Jun 30 05:26:04 UTC 2017",
 "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "A499C4D58ACDB756669678D8B234E5D47C7AD201" }, { "b" : "7FFDFCCFC0
00", "elfType" : 3, "buildId" : "10AFBB931D4285498F9E31C06BD3EF6191C30A24" }, { "b" : "7FF3C9B06000", "path" : "/lib64/libsasl2.so.3", "elfType" :
3, "buildId" : "39AD6F37E45CFD100A0EA21CB11EAFC1A8738CAE" }, { "b" : "7FF3C98B8000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId
" : "98F8B08BB984B5B3366F928C26585489625B829D" }, { "b" : "7FF3C93FF000", "path" : "/lib64/libnetsnmpmibs.so.31", "elfType" : 3, "buildId" : "16C0F
6686B750AD238BE7D2343832A9315CDBC9C" }, { "b" : "7FF3C91F0000", "path" : "/lib64/libsensors.so.4", "elfType" : 3, "buildId" : "A3E060E653BD689628E2
2908EC3D6B6A94C6492F" }, { "b" : "7FF3C8FEC000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "9939B83E89981591ACBD6F85AE2020349A169C52
" }, { "b" : "7FF3C8D85000", "path" : "/lib64/librpm.so.3", "elfType" : 3, "buildId" : "10A8FE4C8C97A9268614C335E62B3C61559A742F" }, { "b" : "7FF3C
8B58000", "path" : "/lib64/librpmio.so.3", "elfType" : 3, "buildId" : "43B2D54108317F8728A5E23C5DE560B55BA43A7F" }, { "b" : "7FF3C88E9000", "path" : "/lib64/libnetsnmpagent.so.31", "elfType" : 3, "buildId" : "1DAA4932AC416F5984B2186B44EDBBABF575E192" }, { "b" : "7FF3C86DE000", "path" : "/lib64/libwrap.so.0", "elfType" : 3, "buildId" : "F751FEF2271CC88C55BECDA85D9531C286BF5812" }, { "b" : "7FF3C83DC000", "path" : "/lib64/libnetsnmp.so.31", "elfType" : 3, "buildId" : "4FBB4BB00F3E6501470A84C04E76BC0E6B5EB84B" }, { "b" : "7FF3C816E000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "8B4A33094EA982F927F4D5F84059EB073A203DB5" }, { "b" : "7FF3C7D84000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7455CBD6F62579DA1598F1DC123F039F25466C90" }, { "b" : "7FF3C7A82000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "8EAEED1A5C217B2F9F66C3AD:EB53B1BCD526F65A" }, { "b" : "7FF3C787A000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "2AC501524AB01C3A36053233524A9B7BDF06D2E3" }, { "b" : "7FF3C7572000", "path" : "/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "348CA28355FB67351EA0CC37170BB83FA008CFEC" }, { "b" : "7FF3C735C000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "E7A44AE9AAA39B04F12503BB3B170860F0EB38E2" }, { "b" : "7FF3C7140000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "540DA7E8674CC3C696324B7D080703E9F71CFC9D" }, { "b" : "7FF3C6D7F000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8D494FF4CF494C73101E328124077F67D553B160" }, { "b" : "7FF3C9D23000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "0874508AA13D28E3F48637C1D5BF067BA8D9FD3A" }, { "b" : "7FF3C6B65000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "59754FDA02AEF392C878E5E714127F5E2C68A891" }, { "b" : "7FF3C692E000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3, "buildId" : "CC0CEF5BA844FB95E25459DC48D3444595C11445" }, { "b" : "7FF3C6647000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "2DC7DC7094E4E26951D7B63C1F433EFE5EA06006" }, { "b" : "7FF3C6415000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "424C66AD190AA9C519971E461C7BC3ABFBF17E51" }, { "b" : "7FF3C6211000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "8E6B94EAC98D4D32CA753B11E1C2CD9CE3DF3886" }, { "b" : "7FF3C6002000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "3D433D76A47E97AE253C57C20E9F86266F9595A1" }, { "b" : "7FF3C5DFE000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8CA73C16CFEB9A8B5660015B9223B09F87041CAD" }, { "b" : "7FF3C5A70000", "path" : "/usr/lib64/perl5/CORE/libperl.so", "elfType" : 3, "buildId" : "C4051029341E80142DC7342D8F483D9F916D457E" }, { "b" : "7FF3C5857000", "path" : "/lib64/libnsl.so.1", "elfType" : 3, "buildId" : "40B7B3C21AC8AEEB59C40FD28361EECDD2ECDA2E" }, { "b" : "7FF3C5654000", "path" : "/lib64/libutil.so.1", "elfType" : 3, "buildId" : "4F392DE64C4BC564E6E814A9C40834C019F54AB0" }, { "b" : "7FF3C532A000", "path" : "/lib64/libnss3.so", "elfType" : 3, "buildId" : "92EE446C81DAA9348C5A91C2556C947661E53761" }, { "b" : "7FF3C511A000", "path" : "/lib64/libbz2.so.1", "elfType" : 3, "buildId" : "E31FA8F847E049A7AB920BA0AAD2113C080B2CC8" }, { "b" : "7FF3C4F04000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "FE621E91052A9A77CC263E00A8A21C2BC0867E21" }, { "b" : "7FF3C4CEC000", "path" : "/lib64/libelf.so.1", "elfType" : 3, "buildId" : "4EC6A8B9037AF02C1A0D6A39F5DCD41CC3FF1FBB" }, { "b" : "7FF3C4AC6000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "66FEE65F44C6C46764C9DC4ADA60BC85AF7AE36F" }, { "b" : "7FF3C48BC000", "path" : "/lib64/libpopt.so.0", "elfType" : 3, "buildId" : "1A75F65C340B52F21E8CAF1D48C641818A008440" }, { "b" : "7FF3C4695000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "4D7CA6EFC2D57A25B1B71E3450A016AD5F220429" }, { "b" : "7FF3C4490000", "path" : "/lib64/libcap.so.2", "elfType" : 3, "buildId" : "5DD36916F2F8F5E72B700162B83F56C2A38E8D16" }, { "b" : "7FF3C4287000", "path" : "/lib64/libacl.so.1", "elfType" : 3, "buildId" : "37CE682664BAA3E12B128BC320087960D687333C" }, { "b" : "7FF3C4059000", "path" : "/lib64/liblua-5.1.so", "elfType" : 3, "buildId" : "473F207910C926F2D177400F81A51CD5AAE02173" }, { "b" : "7FF3C3C9B000", "path" : "/lib64/libdb-5.3.so", "elfType" : 3, "buildId" : "7F85997ADFDF4144EEA82F07F4A24E6AA9A2CA2F" }, { "b" : "7FF3C3A98000", "path" : "/lib64/libfreebl3.so", "elfType" : 3, "buildId" : "3A99D0FE92C9030D9C6171E5490BDD3CB0C13E22" }, { "b" : "7FF3C386B000", "path" : "/lib64/libnssutil3.so", "elfType" : 3, "buildId" : "7AC7E82D066D6619565A7CCC8EA8844A6DE647A3" }, { "b" : "7FF3C3666000", "path" : "/lib64/libplc4.so", "elfType" : 3, "buildId" : "5670E2AA1CC7E163D1E38F8AAE271CD97EEDD7B3" }, { "b" : "7FF3C3462000", "path" : "/lib64/libplds4.so", "elfType" : 3, "buildId" : "F64EA169C894416678F317E4024F07E5C5C152E7" }, { "b" : "7FF3C3224000", "path" : "/lib64/libnspr4.so", "elfType" : 3, "buildId" : "6AB0C0852005E296FAC8B881F62827127BE39BBE" }, { "b" : "7FF3C2FC3000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "1DEC80B82143A7960489C7B7AA8DDF182D6E2BE6" }, { "b" : "7FF3C2DBE000", "path" : "/lib64/libattr.so.1", "elfType" : 3, "buildId" : "41AC3CD2413353A5E5CAFA4FF2BB4C31D315A61E" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1362052]
 mongod(_ZN5mongo10logContextEPKc+0x138) [0x1300388]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xAD) [0x12ed26d]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKSsmb+0xE4) [0x10e2214]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE+0x50) [0x10e0b80]
 mongod(_ZN5mongo21WiredTigerRecordStore6CursorC1EPNS_16OperationContextERKS0_b+0x92) [0x10dce62]
 mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_SsbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE+0x32E) [0x10dbd3e]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE+0x1AB) [0x10cdf1b]
 mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb+0x204) [0x1032a24]
 mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x682) [0x1037ca2]
 mongod(+0xCCC908) [0x10cc908]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x598) [0xff9488]
 mongod(_ZN5mongo13initAndListenEi+0x37B) [0x9a81fb]
 mongod(main+0x15D) [0x95fe0d]
 libc.so.6(__libc_start_main+0xF5) [0x7ff3c6da0b35]
 mongod(+0x5A5B5D) [0x9a5b5d]
-----  END BACKTRACE  -----
2017-09-13T11:54:47.481-0700 I -        [initandlisten]
 
***aborting after invariant() failure
 

Comment by Michael Göhler [ 24/Apr/17 ]

Attached another occurrence. This time the affected file was readable with cat. Maybe readable again, after some time?

Comment by Michael Göhler [ 18/Apr/17 ]

Hi mark.agarunov,

since we reconfigured our CI system to build more often, the error has diapered.
My guess is, that the issue happens after a longer period (> 1 week) of database inactivity only.

The attached logs are from a system that is used less frequently. The error occured after a restart to upgrade from 3.4.1 to 3.4.3., but I think the version switch is unrelated.

I tried to save the corrupt file collection-0--3762849562811845828.wt too, but it is not readable via command-line utils either.
I've also checked the filesystem via btrfs scrub, which returned no errors.

Comment by Mark Agarunov [ 22/Mar/17 ]

Hello mgoehler,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Thanks,
Mark

Comment by Michael Göhler [ 07/Feb/17 ]

Hi mark.agarunov,

Thanks for helping me.

We mainly use btrfs for transparent compression to keep the whole VM as small as possible.
Performance is not a concern on these instances, later in production we use xfs.
Switching to xfs is also an option for us, but I thought analysing the issue might be more helpful to the MongoDB community.

What is the layout of the underlying storage? That is, is it btrfs directly on a physical drive, is lvm being used, is this a virtual system/drive?

It's btrfs directly on a VMware virtual SCSI drive (no LVM or such). The VM is hosted in a IaaS Cloud on VMware vSphere.
We don't use btrfs's snapshot feature, so no shared blocks between differnet subvolumes, if that matters.

If possible, please provide the full mongod logs for every instance of this issue.
If possible, please provide the dbpath/diagnostic.data folder

We're using single instances without replicaset/sharding here. I'll collect the complete logs and diagnostic.data on the next occurence.

Is the workload on the database write heavy?

Depends on your definition of write heavy. The instance is idling most of the day, except for some short test executions of approx. 3 minutes where the write performance is up to 15MBps / 600iops. Reading seems to be done from cache all the time, so almost no read I/O (<1MBps / <30iops).

What options is the filesystem mounted with?

rw,relatime,seclabel,compress=lzo,space_cache,subvol=rw/@opt-mongo-db

What options was the filesystem created with?

It was created by the CentOS installer during setup. No special options.

Comment by Mark Agarunov [ 02/Feb/17 ]

Hello mgoehler,

Thank you for the report. Looking over the output you've provided, this seems like it may be an issue on the filesystem level, more specifically how btrfs interacts with the underlying block storage in regards to CoW. To better diagnose this issue, I have a few questions:

  • What is the layout of the underlying storage? That is, is it btrfs directly on a physical drive, is lvm being used, is this a virtual system/drive?
  • If possible, please provide the full mongod logs for every instance of this issue.
  • If possible, please provide the dbpath/diagnostic.data folder
  • Is the workload on the database write heavy?
  • What options is the filesystem mounted with?
  • What options was the filesystem created with?

Thanks,
Mark

Generated at Thu Feb 08 04:16:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.