Core Server / SERVER-32544

Corrupt DB after Hard Crash (repeated 3 times)

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical - P2
    • Fix Version/s: None
    • Affects Version/s: 3.6.1
    • Component/s: Stability, Storage
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      This won't be easy.

      CentOS 7 VM - VirtualBox 5.1.30 on OSX, 2 cores, 8GB RAM
      Use mlaunch to start a 2-shard MongoDB 3.6.1 cluster with a CSRS and 0.5GB of WiredTiger cache per node (see the command sketch below)
      mongorestore 5GB of data (generated with POCDriver)
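
      A minimal command sketch of the setup above, assuming mtools' mlaunch and a POCDriver dump already on disk; the exact flags, ports, paths, and credentials used in the original run may have differed (the log below shows keyfile auth and the config server on port 27024):

      # Sketch only - assumes mtools is installed and a ~5GB POCDriver dump is in ./pocdump.
      # mlaunch passes unrecognised options such as --wiredTigerCacheSizeGB through to mongod.
      mlaunch init --sharded 2 --replicaset --nodes 1 --csrs --config 1 \
              --auth --wiredTigerCacheSizeGB 0.5 --dir ./data

      # Restore the POCDriver data through the mongos started by mlaunch (default port 27017);
      # mlaunch's default --auth credentials (user/password) are assumed here.
      mongorestore --port 27017 -u user -p password --authenticationDatabase admin ./pocdump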

      VirtualBox crashes - sometimes leaving the DB corrupted.

      Database files are available.


      MongoDB 3.6 - CentOS 7

      Running in a VM (VirtualBox on OSX) - several times VirtualBox has crashed with a memory error. Note: this is NOT the bug.

      After such a crash, the WiredTiger databases that were being written to at the time are corrupted and the affected mongod cannot start:

      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] MongoDB starting : pid=1339 port=27024 dbpath=/root/data/configRepl/rs1/db 64-bit host=centos7.local
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] db version v3.6.1
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] git version: 025d4f4fe61efd1fb6f0005be20cb45a004093d1
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] allocator: tcmalloc
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] modules: none
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] build environment:
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten]     distmod: rhel70
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten]     distarch: x86_64
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten]     target_arch: x86_64
      2018-01-04T16:45:57.564+0000 I CONTROL  [initandlisten] options: { net: { port: 27024 }, processManagement: { fork: true }, replication: { replSet: "configRepl" }, security: { keyFile: "/root/data/keyfile" }, sharding: { clusterRole: "configsvr" }, storage: { dbPath: "/root/data/configRepl/rs1/db", wiredTiger: { engineConfig: { cacheSizeGB: 0.5 } } }, systemLog: { destination: "file", path: "/root/data/configRepl/rs1/mongod.log" } }
      2018-01-04T16:45:57.565+0000 W -        [initandlisten] Detected unclean shutdown - /root/data/configRepl/rs1/db/mongod.lock is not empty.
      2018-01-04T16:45:57.566+0000 I -        [initandlisten] Detected data files in /root/data/configRepl/rs1/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
      2018-01-04T16:45:57.566+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2018-01-04T16:45:57.566+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=512M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
      2018-01-04T16:45:57.950+0000 I STORAGE  [initandlisten] WiredTiger message [1515084357:950005][1339:0x7f560203bb00], txn-recover: Main recovery loop: starting at 2/42880
      2018-01-04T16:45:57.950+0000 I STORAGE  [initandlisten] WiredTiger message [1515084357:950414][1339:0x7f560203bb00], txn-recover: Recovering log 2 through 3
      2018-01-04T16:45:58.029+0000 I STORAGE  [initandlisten] WiredTiger message [1515084358:29048][1339:0x7f560203bb00], file:index-49--1899953198262174362.wt, txn-recover: Recovering log 3 through 3
      2018-01-04T16:45:58.186+0000 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
      2018-01-04T16:45:58.186+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 113 records totaling to 24685 bytes
      2018-01-04T16:45:58.186+0000 I STORAGE  [initandlisten] Scanning the oplog to determine where to place markers for truncation
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] ** WARNING: This server is bound to localhost.
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          Remote systems will be unable to connect to this server. 
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          Start the server with --bind_ip <address> to specify which IP 
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          addresses it should serve responses from, or with --bind_ip_all to
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          bind to all interfaces. If this behavior is desired, start the
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] **          server with --bind_ip 127.0.0.1 to disable this warning.
      2018-01-04T16:45:58.199+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
      2018-01-04T16:45:58.200+0000 I CONTROL  [initandlisten] 
      2018-01-04T16:45:58.245+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/root/data/configRepl/rs1/db/diagnostic.data'
      2018-01-04T16:45:58.246+0000 I SHARDING [initandlisten] first cluster operation detected, adding sharding hook to enable versioning and authentication to remote servers
      2018-01-04T16:45:58.246+0000 I SHARDING [thread1] creating distributed lock ping thread for process ConfigServer (sleeping for 30000ms)
      2018-01-04T16:45:58.248+0000 I REPL     [initandlisten] Starting recovery oplog application at the appliedThrough: { ts: Timestamp(1515080359, 2), t: 2 }
      2018-01-04T16:45:58.248+0000 I REPL     [initandlisten] Replaying stored operations from { : Timestamp(1515080359, 2) } (exclusive) to { : Timestamp(1515082949, 2) } (inclusive).
      2018-01-04T16:45:58.249+0000 F REPL     [initandlisten] Oplog entry at { : Timestamp(1515080359, 2) } is missing; actual entry found is { : Timestamp(1515082746, 1) }
      2018-01-04T16:45:58.249+0000 F -        [initandlisten] Fatal Assertion 40292 at src/mongo/db/repl/replication_recovery.cpp 217
      2018-01-04T16:45:58.249+0000 F -        [initandlisten]
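
      Fatal assertion 40292 is raised during replication startup recovery (replication_recovery.cpp): the node recorded an appliedThrough point of { ts: Timestamp(1515080359, 2), t: 2 } and expects to replay the oplog from there, but the first entry found at or after that timestamp is Timestamp(1515082746, 1), i.e. the expected entry is missing from the oplog. As a rough illustration (the corrupted node above cannot be queried), the two pieces of state this check compares can be inspected on a node that does start; the collection and field layout shown is what 3.6 uses and is an assumption for other versions:

      # Sketch: compare the recorded recovery point with the oplog contents at or after it.
      # Port and credentials are placeholders - use a healthy member of the affected replica set.
      mongo --port 27018 -u user -p password --authenticationDatabase admin --eval '
          var local = db.getSiblingDB("local");
          printjson(local.replset.minvalid.findOne());   // "begin" holds the appliedThrough point
          // first oplog entry at or after the recorded recovery point
          printjson(local.oplog.rs.find({ ts: { $gte: Timestamp(1515080359, 2) } })
                        .sort({ $natural: 1 }).limit(1).next());
      '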
      

            Assignee: Daniel Gottlieb (Inactive) <daniel.gottlieb@mongodb.com>
            Reporter: John Page <john.page@mongodb.com>
            Votes: 0
            Watchers: 8
