Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-6561

wt utility reports a data corruption error while recovering

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Needs Scheduling
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Recovering shard0/node1 from the BFG-642258 failed database files using wt utility failed with data corruption and whereas it worked with mongod.

      wt utility error on the same database:

      [root@ip-10-122-11-175 artifacts-383e44-linux-64-repeated-execution_concurrency_sharded_replication]# ../artifacts-383e44-linux-64-repeated-execution_compile/dist-test/bin/wt -h data/db/job0/resmoke/shard0/node1 -R salvage file:_mdb_catalog.wt > dump_catalog1.txt
      [1596156536:744923][115420:0x7f70feee0880], wt, file:collection-2--9072357644366475232.wt, txn rollback_to_stable: __wt_block_read_off, 283: collection-2--9072357644366475232.wt: read checksum error for 4096B block at offset 32768: block header checksum of 0xb68de82f doesn't match expected checksum of 0x4178f07c
      

      mongod logs on the same database:

      [root@ip-10-122-11-175 artifacts-383e44-linux-64-repeated-execution_concurrency_sharded_replication]# ../artifacts-383e44-linux-64-repeated-execution_compile/dist-test/bin/mongod --replSet shard-rs0 --dbpath=data/db/job0/resmoke/shard0/node1
      {"t":{"$date":"2020-07-31T01:26:26.637+00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
      {"t":{"$date":"2020-07-31T01:26:26.641+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
      {"t":{"$date":"2020-07-31T01:26:26.641+00:00"},"s":"I",  "c":"NETWORK",  "id":4648602, "ctx":"main","msg":"Implicit TCP FastOpen in use."}
      {"t":{"$date":"2020-07-31T01:26:26.641+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
      {"t":{"$date":"2020-07-31T01:26:26.641+00:00"},"s":"I",  "c":"STORAGE",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":47888,"port":27017,"dbPath":"data/db/job0/resmoke/shard0/node1","architecture":"64-bit","host":"ip-10-122-11-175"}}
      {"t":{"$date":"2020-07-31T01:26:26.641+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.0-rc13-1-g383e442","gitVersion":"383e442b881c089afebaea1d11aeec196f8efc56","openSSLVersion":"OpenSSL 1.0.1e-fips 11 Feb 2013","modules":[],"allocator":"tcmalloc","environment":{"distarch":"x86_64","target_arch":"x86_64"}}}}
      {"t":{"$date":"2020-07-31T01:26:26.641+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Red Hat Enterprise Linux Server release 6.2 (Santiago)","version":"Kernel 2.6.32-220.el6.x86_64"}}}
      {"t":{"$date":"2020-07-31T01:26:26.641+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"replication":{"replSet":"shard-rs0"},"storage":{"dbPath":"data/db/job0/resmoke/shard0/node1"}}}}
      {"t":{"$date":"2020-07-31T01:26:26.642+00:00"},"s":"W",  "c":"STORAGE",  "id":22271,   "ctx":"initandlisten","msg":"Detected unclean shutdown - Lock file is not empty","attr":{"lockFile":"data/db/job0/resmoke/shard0/node1/mongod.lock"}}
      {"t":{"$date":"2020-07-31T01:26:26.642+00:00"},"s":"I",  "c":"STORAGE",  "id":22270,   "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"data/db/job0/resmoke/shard0/node1","storageEngine":"wiredTiger"}}
      {"t":{"$date":"2020-07-31T01:26:26.642+00:00"},"s":"W",  "c":"STORAGE",  "id":22302,   "ctx":"initandlisten","msg":"Recovering data from the last clean checkpoint."}
      {"t":{"$date":"2020-07-31T01:26:26.642+00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=14590M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}}
      {"t":{"$date":"2020-07-31T01:26:28.757+00:00"},"s":"I",  "c":"STORAGE",  "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":2115}}
      {"t":{"$date":"2020-07-31T01:26:28.757+00:00"},"s":"I",  "c":"RECOVERY", "id":23987,   "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":1594557134,"i":219}}}}
      {"t":{"$date":"2020-07-31T01:26:28.760+00:00"},"s":"I",  "c":"STORAGE",  "id":22383,   "ctx":"initandlisten","msg":"The size storer reports that the oplog contains","attr":{"numRecords":5505395,"dataSize":1074051648}}
      {"t":{"$date":"2020-07-31T01:26:28.760+00:00"},"s":"I",  "c":"STORAGE",  "id":22386,   "ctx":"initandlisten","msg":"Sampling the oplog to determine where to place markers for truncation"}
      {"t":{"$date":"2020-07-31T01:26:28.760+00:00"},"s":"I",  "c":"STORAGE",  "id":22389,   "ctx":"initandlisten","msg":"Sampling from the oplog to determine where to place markers for truncation","attr":{"from":{"$timestamp":{"t":1594550038,"i":24410}},"to":{"$timestamp":{"t":1594557166,"i":34}}}}
      {"t":{"$date":"2020-07-31T01:26:28.760+00:00"},"s":"I",  "c":"STORAGE",  "id":22390,   "ctx":"initandlisten","msg":"Taking samples and assuming each oplog section contains","attr":{"numSamples":630,"containsNumRecords":87363,"containsNumBytes":17043713}}
      {"t":{"$date":"2020-07-31T01:26:28.811+00:00"},"s":"I",  "c":"STORAGE",  "id":22393,   "ctx":"initandlisten","msg":"Oplog sampling complete"}
      {"t":{"$date":"2020-07-31T01:26:28.811+00:00"},"s":"I",  "c":"STORAGE",  "id":22382,   "ctx":"initandlisten","msg":"WiredTiger record store oplog processing finished","attr":{"durationMillis":50}}
      {"t":{"$date":"2020-07-31T01:26:28.816+00:00"},"s":"I",  "c":"STORAGE",  "id":22262,   "ctx":"initandlisten","msg":"Timestamp monitor starting"}
      {"t":{"$date":"2020-07-31T01:26:28.818+00:00"},"s":"W",  "c":"CONTROL",  "id":22120,   "ctx":"initandlisten","msg":"Access control is not enabled for the database. Read and write access to data and configuration is unrestricted","tags":["startupWarnings"]}
      

      As the failed collection that is reported while recovering using the wt utility is mapped to mongod table - local.replset.oplogTruncateAfterPoint.

        Attachments

          Activity

            People

            Assignee:
            backlog-server-storage-engines Backlog - Storage Engines Team
            Reporter:
            haribabu.kommi Haribabu Kommi
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated: