[SERVER-28168] Cannot start or repair mongodb after unexpected shutdown. Created: 02/Mar/17  Updated: 15/Aug/17  Resolved: 24/Apr/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 3.2.13, 3.4.4, 3.5.6

Type: Bug
Priority: Major - P3
Reporter: kurevo18
Assignee: Keith Bostic (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-27820 Improve storage engine logging at sta... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage 2017-04-17, Storage 2017-05-08
Participants:

 Description   

Hello, I'm running MongoDB on a VPS. Here is the version:

MongoDB shell version v3.4.1
git version: 5e103c4f5583e2566a45d740225dc250baacfbd7
OpenSSL version: OpenSSL 1.0.1 14 Mar 2012
allocator: tcmalloc
modules: none
build environment:
distmod: ubuntu1204
distarch: x86_64
target_arch: x86_64

I had an unexpected shutdown. I checked the log and see the following:

2017-03-02T01:44:58.205+0200 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "127.0.0.1", port: 27017 }, storage: { dbPath: "/var/lib/mongodb", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
2017-03-02T01:44:58.205+0200 W -        [initandlisten] Detected unclean shutdown - /var/lib/mongodb/mongod.lock is not empty.
2017-03-02T01:44:58.224+0200 I -        [initandlisten] Detected data files in /var/lib/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-03-02T01:44:58.224+0200 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-03-02T01:44:58.224+0200 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=2560M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-03-02T01:44:58.235+0200 I -        [initandlisten] Assertion: 28595:2: No such file or directory src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 267
2017-03-02T01:44:58.236+0200 I STORAGE  [initandlisten] exception in initAndListen: 28595 2: No such file or directory, terminating
2017-03-02T01:44:58.236+0200 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2017-03-02T01:44:58.236+0200 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2017-03-02T01:44:58.236+0200 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2017-03-02T01:44:58.236+0200 I CONTROL  [initandlisten] now exiting
2017-03-02T01:44:58.236+0200 I CONTROL  [initandlisten] shutting down with code:100

I get exactly the same message when I run mongod --repair --dbpath /var/lib/mongodb

Someone please help!!! I will be thrown out from work...



 Comments   
Comment by Githook User [ 13/Apr/17 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Import wiredtiger: f5c08e2b5f02805b062888d45c9eca19af175f7e from branch mongodb-3.2

ref: d48181f6f4..f5c08e2b5f
for: 3.2.13

SERVER-16796 Increase logging activity for journal recovery operations
SERVER-28168 Cannot start or repair mongodb after unexpected shutdown.
SERVER-28194 Missing WiredTiger.turtle file loses data
WT-2402 Misaligned structure accesses lead to undefined behavior
WT-2439 Enhance reconciliation page layout
WT-2771 Add a statistic to track per-btree dirty cache usage
WT-2790 Fix a text case false positive in test_sweep01
WT-2833 improvement: add projections to wt dump utility
WT-2898 Improve performance of eviction-heavy workloads by dynamically controlling the number of eviction threads
WT-2909 Create automatable test verifying checkpoint integrity after errors
WT-2978 Make WiredTiger python binding pip-compatible
WT-2990 checkpoint load live_open assertion failure
WT-2994 Create documentation describing page sizes and relationships
WT-3080 Python test suite: add timestamp or elapsed time for tests
WT-3082 Python test suite: shorten default run to avoid pull request timeouts.
WT-3083 Fix a bug in wtperf config dump
WT-3086 Add transaction state information to cache stuck diagnostic information
WT-3088 bug: Don't evict a page with refs visible to readers after a split
WT-3091 Add stats to test_perf0001
WT-3092 Quiet a warning from autogen.sh
WT-3093 Padding the WT_RWLOCK structure grew the WT_PAGE structure.
WT-3097 Race on reconfigure or shutdown can lead to waiting for statistics log server
WT-3099 lint: static function declarations, non-text characters in documentation
WT-3100 test bug: format is weighted to delete, insert, then write operations.
WT-3104 Fix wtperf configs for eviction tests
WT-3105 Fix a deadlock caused by allocating eviction thread sessions dynamically
WT-3106 Add truncate support to command line wt utility
WT-3108 Also dump disk page size as part of metadata information
WT-3109 wording fix in transaction doc
WT-3110 Add more test cases for the WT command line utility
WT-3111 util_create() doesnt free memory assigned to "uri"
WT-3112 Handle list lock statistic not incremented in eviction server
WT-3113 Add a verbose mode to dump the cache when eviction is stuck
WT-3114 Avoid archiving log files immediately after recovery
WT-3115 Change the dhandle lock to a read/write lock
WT-3116 Python style testing in s_all may not execute correctly
WT-3118 Protect random-abort test against unexpectedly slow child start
WT-3120 Fix ordering problem in connection_close for filesystem loaded in an extension
WT-3121 In test suite create standard way to load extensions
WT-3126 bug: dist/s_all script has misplaced quote causing bad error reporting
WT-3127 bug: CPU yield calls don't necessarily imply memory barriers
WT-3128 wt printlog returns operation-not-supported if it doesn't find any log files
WT-3130 Ensure extensions have access to database home directory
WT-3134 Coverity scan reports 1368529 and 1368528
WT-3135 search_near() for index with custom collator
WT-3136 bug fix: WiredTiger doesn't check sprintf calls for error return
WT-3137 Hang in _log_slot_join/_log_slot_switch_internal
WT-3139 Enhance wtperf to support periodic table scans
WT-3144 bug fix: random cursor returns not-found when descending to an empty page
WT-3148 Improve eviction efficiency with many small trees
WT-3149 Change eviction to start new walks from a random place in the tree
WT-3150 Reduce impact of checkpoints on eviction server
WT-3152 Convert table lock from a spinlock to a read write lock
WT-3155 Remove WT_CONN_SERVER_RUN flag
WT-3156 Assertion in log_write fires after write failure
WT-3157 checkpoint/transaction integrity issue when writes fail.
WT-3159 Incorrect key for index containing multiple variable sized entries
WT-3161 checkpoint hang after write failure injection.
WT-3164 Ensure all relevant btree fields are reset on checkpoint error
WT-3170 Clear the eviction walk point while populating from a tree
WT-3173 Add runtime detection for s390x CRC32 hardware support
WT-3174 Coverity/lint cleanup
WT-3175 New hang in internal page split
WT-3179 test bug: clang sanitizer failure in fail_fs
WT-3180 fault injection tests should only run as "long" tests and should not create core files
WT-3182 Switch make-check to run the short test suite by default
WT-3184 Problem duplicating index cursor with custom collator
WT-3186 Fix error path and panic detection in logging loops
WT-3187 Hang on shutdown with a busy cache pool
WT-3188 Fix error handling in logging where fatal errors could lead to a hang
WT-3189 Fix a segfault in the eviction server random positioning
WT-3190 Enhance eviction thread auto-tuning algorithm
WT-3191 lint
WT-3193 Close a race between verify opening a handle and eviction visiting it
WT-3196 Race with LSM and eviction when switching chunks
WT-3199 bug: eviction assertion failure
WT-3202 wtperf report an error on in_memory=true mode : No such file or directory
WT-3203 bulk-load state changes can race
WT-3204 eviction changes cost LSM performance
WT-3206 bug: core dump on NULL page index
WT-3207 Drops with checkpoint_wait=false should not wait for checkpoints
WT-3208 test format hung with 9mb cache
WT-3211 WT_CURSOR.remove cannot always retain its position.
WT-3212 'wt dump' crashes when given table with unknown collator
WT-3213 generated test/format CONFIG invalid on next run
WT-3216 add support for clang-tidy
WT-3218 unexpected checkpoint ordering failures
WT-3224 LSM assertion failure pindex->entries == 1
WT-3225 WiredTiger won't build with clang on CentOS 7.3.1611
WT-3227 Python test suite inserts unnecessary whitespace in error output.
WT-3228 Remove with overwrite shouldn't return WT_NOTFOUND
WT-3234 Update WiredTiger build for clang 4.0.
WT-3238 Java: Cursor.compare and Cursor.equals throw Exceptions for valid return values
WT-3240 Coverity reports
WT-3243 Reorder log slot release so joins don't wait on IO
WT-3244 metadata operations failing in in-memory configurations
WT-3249 Unit test test_readonly fails as it is unable to open WiredTiger.lock
WT-3250 Incorrect statistics incremented on Windows
WT-3254 test_reconfig02 uses incorrect configuration string
WT-3262 Schema operations shouldn't wait for cache
WT-3265 Verify hits assertion in eviction when transiting handle to exclusive mode
WT-3271 Eviction tuning stuck in a loop
WT-98 Update the current cursor value without a search
Branch: v3.2
https://github.com/mongodb/mongo/commit/e5de3702c1dd8257c6289869d2cbd8b014221808

Comment by Githook User [ 13/Apr/17 ]

Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

Message: SERVER-28168 Cannot start or repair mongodb after unexpected shutdown. (#3353)

Panic if there's an error in reading/writing from/to the turtle file,
there's no point in continuing. This change avoids user confusion when
the turtle file is corrupted or zero'd out by the filesystem.
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/a5b3166ab7bcdb365b60686246b8e5624efeca84

Comment by Githook User [ 12/Apr/17 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Import wiredtiger: cb16839cfbdf338af95bed43ca40979ae6e32f54 from branch mongodb-3.4

ref: cc2f15f595..cb16839cfb
for: 3.4.4

SERVER-28168 Cannot start or repair mongodb after unexpected shutdown.
SERVER-28194 Missing WiredTiger.turtle file loses data
WT-2439 Enhance reconciliation page layout
WT-2978 Make WiredTiger python binding pip-compatible
WT-2990 checkpoint load live_open assertion failure
WT-3136 bug fix: WiredTiger doesn't check sprintf calls for error return
WT-3155 Remove WT_CONN_SERVER_RUN flag
WT-3182 Switch make-check to run the short test suite by default
WT-3190 Enhance eviction thread auto-tuning algorithm
WT-3191 lint
WT-3193 Close a race between verify opening a handle and eviction visiting it
WT-3196 Race with LSM and eviction when switching chunks
WT-3199 bug: eviction assertion failure
WT-3202 wtperf report an error on in_memory=true mode : No such file or directory
WT-3203 bulk-load state changes can race
WT-3204 eviction changes cost LSM performance
WT-3206 bug: core dump on NULL page index
WT-3207 Drops with checkpoint_wait=false should not wait for checkpoints
WT-3208 test format hung with 9mb cache
WT-3211 WT_CURSOR.remove cannot always retain its position.
WT-3212 'wt dump' crashes when given table with unknown collator
WT-3213 generated test/format CONFIG invalid on next run
WT-3216 add support for clang-tidy
WT-3218 unexpected checkpoint ordering failures
WT-3224 LSM assertion failure pindex->entries == 1
WT-3225 WiredTiger won't build with clang on CentOS 7.3.1611
WT-3227 Python test suite inserts unnecessary whitespace in error output.
WT-3228 Remove with overwrite shouldn't return WT_NOTFOUND
WT-3234 Update WiredTiger build for clang 4.0.
WT-3238 Java: Cursor.compare and Cursor.equals throw Exceptions for valid return values
WT-3240 Coverity reports
WT-3243 Reorder log slot release so joins don't wait on IO
WT-3244 metadata operations failing in in-memory configurations
WT-3249 Unit test test_readonly fails as it is unable to open WiredTiger.lock
WT-3250 Incorrect statistics incremented on Windows
WT-3254 test_reconfig02 uses incorrect configuration string
WT-3262 Schema operations shouldn't wait for cache
WT-3265 Verify hits assertion in eviction when transiting handle to exclusive mode
WT-3271 Eviction tuning stuck in a loop
WT-98 Update the current cursor value without a search
Branch: v3.4
https://github.com/mongodb/mongo/commit/9c2e3c5396adb6bbaaf6a19e6c017b051f943ebf

Comment by Githook User [ 12/Apr/17 ]

Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

Message: SERVER-28168 Cannot start or repair mongodb after unexpected shutdown. (#3353)

Panic if there's an error in reading/writing from/to the turtle file,
there's no point in continuing. This change avoids user confusion when
the turtle file is corrupted or zero'd out by the filesystem.
Branch: mongodb-3.4
https://github.com/wiredtiger/wiredtiger/commit/a5b3166ab7bcdb365b60686246b8e5624efeca84

Comment by Githook User [ 31/Mar/17 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Import wiredtiger: af735d14a603a6ef6256a6685f09ec13755a5024 from branch mongodb-3.6

ref: cc2f15f595..af735d14a6
for: 3.5.6

SERVER-28168 Cannot start or repair mongodb after unexpected shutdown.
SERVER-28194 Missing WiredTiger.turtle file loses data
WT-2439 Enhance reconciliation page layout
WT-2978 Make WiredTiger python binding pip-compatible
WT-2990 Fix a new bug where checkpoint load live_open failed
WT-3136 bug fix: WiredTiger doesn't check sprintf calls for error return
WT-3155 Remove WT_CONN_SERVER_RUN flag
WT-3182 Switch make-check to run the short test suite by default
WT-3190 Enhance eviction thread auto-tuning algorithm
WT-3191 Fix lint complaints
WT-3193 Close a race between verify opening a handle and eviction visiting it
WT-3196 Race with LSM and eviction when switching chunks
WT-3199 bug: eviction assertion failure
WT-3202 wtperf report an error on in_memory=true mode : No such file or directory
WT-3203 bulk-load state changes can race
WT-3204 eviction changes cost LSM performance
WT-3206 bug: core dump on NULL page index
WT-3207 Drops with checkpoint_wait=false should not wait for checkpoints
WT-3208 test format hung with 9mb cache
WT-3211 WT_CURSOR.remove cannot always retain its position.
WT-3212 'wt dump' crashes when given table with unknown collator
WT-3213 generated test/format CONFIG invalid on next run
WT-3216 add support for clang-tidy
WT-3218 unexpected checkpoint ordering failures
WT-3224 LSM assertion failure pindex->entries == 1
WT-3225 WiredTiger won't build with clang on CentOS 7.3.1611
WT-3227 Python test suite inserts unnecessary whitespace in error output.
WT-3228 Remove with overwrite shouldn't return WT_NOTFOUND
WT-3234 Update WiredTiger build for clang 4.0.
WT-3238 Java: Cursor.compare and Cursor.equals throw Exceptions for valid return values
WT-3240 Coverity reports
WT-3243 Reorder log slot release so joins don't wait on IO
WT-3244 Metadata operations failing in in-memory configurations when the cache is full
WT-98 Update the current cursor value without a search
Branch: master
https://github.com/mongodb/mongo/commit/f6cbdfb8c5c52209f58562ccbe14013c72df3f40

Comment by Githook User [ 27/Mar/17 ]

Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

Message: SERVER-28168 Cannot start or repair mongodb after unexpected shutdown. (#3353)

Panic if there's an error in reading/writing from/to the turtle file,
there's no point in continuing. This change avoids user confusion when
the turtle file is corrupted or zero'd out by the filesystem.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/a5b3166ab7bcdb365b60686246b8e5624efeca84

Comment by Keith Bostic (Inactive) [ 07/Mar/17 ]

kurevo18, we've managed to reproduce this one locally. I believe the problem is that the database has a zero-length WiredTiger.turtle file. Can you please confirm that for us by running ls -la on the dbpath?

As previously suggested, that's likely due to filesystem corruption.
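
For anyone who prefers a scripted check over reading ls -la output, here is a minimal Python sketch of the same test. The function name check_turtle_file and its return strings are hypothetical, chosen only for illustration; the only facts taken from this ticket are the file name WiredTiger.turtle and the dbpath /var/lib/mongodb. It only inspects the file and does not attempt any repair.

```python
import os


def check_turtle_file(dbpath):
    """Report the state of WiredTiger.turtle inside a mongod dbpath.

    Hypothetical helper: returns "missing", "zero-length", or "ok".
    It reads file metadata only and never modifies anything.
    """
    path = os.path.join(dbpath, "WiredTiger.turtle")
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "zero-length"
    return "ok"


if __name__ == "__main__":
    # The dbpath reported in this ticket; adjust for other deployments.
    print(check_turtle_file("/var/lib/mongodb"))
```

A result of "zero-length" would match the condition reproduced above. A result of "ok" says only that the file is non-empty, not that its contents are valid.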

Comment by Keith Bostic (Inactive) [ 06/Mar/17 ]

Hi kurevo18, I thought of one approach that might help: can you run your mongod command under strace, and upload that output, so we can figure out what file isn't being found when mongod starts?

To be clear, I think it's likely anonymous.user is correct and there's file system corruption, but we can debug this a little further if you'd like.
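
Once the strace output has been saved to a file, the failed file lookups can be picked out with a short script. This is a rough Python sketch, not part of any MongoDB tooling. It assumes the usual strace line layout, where a failed call ends in "= -1 ENOENT (No such file or directory)" and the path is the first double-quoted argument. The function name missing_paths and the file name mongod.strace are hypothetical.

```python
import re
import sys

# First double-quoted argument on a line, assumed to be the path.
_QUOTED = re.compile(r'"([^"]*)"')


def missing_paths(strace_lines):
    """Return paths from syscalls that failed with ENOENT, in first-seen order."""
    missing = []
    for line in strace_lines:
        # Skip calls that succeeded or failed for another reason.
        if "= -1 ENOENT" not in line:
            continue
        match = _QUOTED.search(line)
        if match and match.group(1) not in missing:
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    # Usage (hypothetical file name): python missing_paths.py mongod.strace
    with open(sys.argv[1], errors="replace") as f:
        for path in missing_paths(f):
            print(path)
```

Many ENOENT results during startup are harmless probes for optional files, so the list is a starting point for finding the file mongod could not open rather than a diagnosis.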

Comment by Kelsey Schubert [ 02/Mar/17 ]

Hi kurevo18,

This error suggests that the filesystem was corrupted when the VPS shut down unexpectedly. Unfortunately, when MongoDB encounters corruption of this nature, there is little it can do to recover. In this case, I would recommend restoring from a backup if possible.

Kind regards,
Thomas
