Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-20061

Entire data loss after connectivity issues with the storage

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Critical - P2 Critical - P2
    • None
    • Affects Version/s: 3.0.2
    • Component/s: Admin
    • None
    • ALL

      We've a MongoDB 3.0.2 replica running on Windows.
      We had 2 nodes in the replica with security enabled, but had to remove one for maintenance, so the issue happened while the replica had only one node.

      We than see the following Mongo log lines:

      2015-08-08T06:37:19.115-0400 E STORAGE  [conn37692] WiredTiger (-28936) [1439030239:114410][4392:2004759424], session.commit_transaction: journal/WiredTigerLog.0000000714 write error: failed to write 22528 bytes at offset 23841280: The specified network name is no longer available.
      2015-08-08T06:37:19.115-0400 I -        [conn37692] Invariant failure: s->commit_transaction(s, NULL) resulted in status UnknownError -28936: The specified network name is no longer available.
        at src\mongo\db\storage\wiredtiger\wiredtiger_recovery_unit.cpp 290
      2015-08-08T06:37:19.381-0400 E STORAGE  [conn37617] WiredTiger (-28936) [1439030239:380436][4392:2004759424], session.commit_transaction: journal/WiredTigerLog.0000000714 write error: failed to write 4352 bytes at offset 23863808: The specified network name is no longer available.
      2015-08-08T06:37:19.382-0400 I -        [conn37617] Invariant failure: s->commit_transaction(s, NULL) resulted in status UnknownError -28936: The specified network name is no longer available.
        at src\mongo\db\storage\wiredtiger\wiredtiger_recovery_unit.cpp 290
      2015-08-08T06:37:19.488-0400 E STORAGE  [WiredTigerRecordStoreThread for local.oplog.rs] WiredTiger (-28936) [1439030239:488447][4392:2004759424], session.commit_transaction: journal/WiredTigerLog.0000000714 write error: failed to write 128 bytes at offset 23868160: The specified network name is no longer available.
      2015-08-08T06:37:19.488-0400 I -        [WiredTigerRecordStoreThread for local.oplog.rs] Invariant failure: s->commit_transaction(s, NULL) resulted in status UnknownError -28936: The specified network name is no longer available.
        at src\mongo\db\storage\wiredtiger\wiredtiger_recovery_unit.cpp 290
      2015-08-08T06:37:20.951-0400 I CONTROL  [conn37692] mongod.exe      index_collator_extension+0x141cf3
      2015-08-08T06:37:20.951-0400 I CONTROL  [conn37692] mongod.exe      index_collator_extension+0xfaf17
      2015-08-08T06:37:20.951-0400 I CONTROL  [conn37692] mongod.exe      index_collator_extension+0xedd21
      2015-08-08T06:37:20.951-0400 I CONTROL  [conn37692] mongod.exe      index_collator_extension+0xb4ab
      2015-08-08T06:37:20.951-0400 I CONTROL  [conn37692] mongod.exe      index_collator_extension+0xb16b
      2015-08-08T06:37:20.951-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x1b5ffe
      2015-08-08T06:37:20.951-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x1b9ad3
      2015-08-08T06:37:20.951-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x2efe5d
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x2efa9b
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x14c05b
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x1495e2
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x148230
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x149c3f
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x14f251
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x15ef1a
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x1614b3
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x15f380
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x2c8ce3
      2015-08-08T06:37:20.952-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x2c9103
      2015-08-08T06:37:20.953-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x215e1b
      2015-08-08T06:37:20.953-0400 I CONTROL  [conn37692] mongod.exe      mongo::parseNumberFromStringWithBase<unsigned __int64>+0x2109f2
      2015-08-08T06:37:20.953-0400 I CONTROL  [conn37692] mongod.exe      index_collator_extension+0x43e568
      2015-08-08T06:37:20.953-0400 I CONTROL  [conn37692] mongod.exe      index_collator_extension+0x1071c3
      2015-08-08T06:37:20.953-0400 I CONTROL  [conn37692] mongod.exe      index_collator_extension+0x161a91
      2015-08-08T06:37:20.953-0400 I CONTROL  [conn37692] MSVCR120.dll    beginthreadex+0x107
      2015-08-08T06:37:20.953-0400 I CONTROL  [conn37692] MSVCR120.dll    endthreadex+0x192
      2015-08-08T06:37:20.953-0400 I CONTROL  [conn37692] kernel32.dll    BaseThreadInitThunk+0xd
      2015-08-08T06:37:20.957-0400 I CONTROL  [conn37692] 
      2015-08-08T06:37:20.957-0400 I -        [conn37692] 
      
      ***aborting after invariant() failure
      
      
      2015-08-08T06:38:47.685-0400 I CONTROL  ***** SERVER RESTARTED *****
      2015-08-08T06:38:47.795-0400 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
      2015-08-08T06:38:48.159-0400 I CONTROL  Trying to start Windows service 'MongoDB'
      2015-08-08T06:38:48.160-0400 I STORAGE  Service running
      2015-08-08T06:38:48.324-0400 W -        [initandlisten] Detected unclean shutdown - \\STORAGE\FOLDER\mongod.lock is not empty.
      2015-08-08T06:38:48.326-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2015-08-08T06:38:48.329-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=4G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2015-08-08T06:40:05.322-0400 E STORAGE  [initandlisten] WiredTiger (-28998) [1439030405:322774][4460:2004759424], connection: \\STORAGE\FOLDER\journal/WiredTigerLog.0000000715: The system cannot find the file specified.
        a\journal
      2015-08-08T06:40:05.331-0400 E STORAGE  [initandlisten] WiredTiger (-28994) [1439030405:331775][4460:2004759424], file:WiredTiger.wt, sweep-server: WiredTiger.wt SetEndOfFile error: The handle is invalid.
      
        he file specified.
      
        a\journal
      2015-08-08T06:40:05.332-0400 E STORAGE  [initandlisten] WiredTiger (-28994) [1439030405:332775][4460:2004759424], file:WiredTiger.wt, sweep-server: Final close of file:WiredTiger.wt failed: The handle is invalid.
      
        he file specified.
      
        a\journal
      2015-08-08T06:40:05.338-0400 I -        [initandlisten] Assertion: 28595:-28998: The system cannot find the file specified.
      2015-08-08T06:40:05.366-0400 I STORAGE  [initandlisten] exception in initAndListen: 28595 -28998: The system cannot find the file specified.
      
      , terminating
      2015-08-08T06:40:05.366-0400 I CONTROL  [initandlisten] dbexit:  rc: 49
      2015-08-08T06:40:23.877-0400 I CONTROL  ***** SERVER RESTARTED *****
      2015-08-08T06:40:23.877-0400 I CONTROL  Hotfix KB2731284 or later update is not installed, will zero-out data files
      2015-08-08T06:40:24.279-0400 I CONTROL  Trying to start Windows service 'MongoDB'
      2015-08-08T06:40:24.281-0400 I STORAGE  Service running
      2015-08-08T06:40:24.510-0400 W -        [initandlisten] Detected unclean shutdown - \\STORAGE\FOLDER\mongod.lock is not empty.
      2015-08-08T06:40:24.513-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2015-08-08T06:40:24.514-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=4G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2015-08-08T06:40:34.965-0400 I CONTROL  [initandlisten] MongoDB starting : pid=1728 port=27017 dbpath=\STORAGE\FOLDER 64-bit host=miragemgt
      2015-08-08T06:40:34.965-0400 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
      2015-08-08T06:40:34.965-0400 I CONTROL  [initandlisten] db version v3.0.2
      2015-08-08T06:40:34.966-0400 I CONTROL  [initandlisten] git version: 6201872043ecbbc0a4cc169b5482dcf385fc464f
      2015-08-08T06:40:34.966-0400 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1m-fips 19 Mar 2015
      2015-08-08T06:40:34.966-0400 I CONTROL  [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
      2015-08-08T06:40:34.966-0400 I CONTROL  [initandlisten] allocator: system
      2015-08-08T06:40:34.966-0400 I CONTROL  [initandlisten] options: { config: "C:\mongod1.cfg", net: { port: 27017 }, operationProfiling: { slowOpThresholdMs: 5000 }, replication: { oplogSizeMB: 2048, replSetName: "REPLICA-NAME" }, security: { authorization: "enabled", keyFile: "C:\securityFile.txt" }, service: true, storage: { dbPath: "\\STORAGE\FOLDER", engine: "wiredTiger", journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 4 } } }, systemLog: { destination: "file", logAppend: true, path: "C:\logs\mongod1.log", quiet: true } }
      2015-08-08T06:40:35.012-0400 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument Did not find replica set configuration document in local.system.replset
      

      After which we identify the replica configuration disappeared.
      We recall rs.initiate() which succeeds, but than find the entire database data is gone, we've only one database (other than the default admin) with only one collection, and we see it contains no data.

      Why did this happen? what should've we done differently?

            Assignee:
            ramon.fernandez@mongodb.com Ramon Fernandez Marina
            Reporter:
            sternr-windowslive Rami stern
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: