[SERVER-27820] Improve storage engine logging at startup to include additional failure information Created: 26/Jan/17  Updated: 09/Jul/18  Resolved: 09/Jul/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.10
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Nick Judson Assignee: Xiangyu Yao (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-35630 Missing or corrupt data files should ... Closed
Related
is related to SERVER-28168 Cannot start or repair mongodb after ... Closed
Sprint: Storage NYC 2018-07-02, Storage NYC 2018-07-16
Participants:

 Description   

I've encountered the following in on a test system running 3.2.10 on Windows:

2017-01-26T11:07:37.724-0500 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-01-26T11:07:37.729-0500 I CONTROL  [main] Hotfix KB2731284 or later update is not installed, will zero-out data files
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten] MongoDB starting : pid=7856 port=27200 dbpath=C:\ProgramData\Connexion.Integrator\Mongo 64-bit host=D0077-IMP-UMY
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten] targetMinOS: Windows Vista/Windows Server 2008
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten] db version v3.2.10
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten] git version: 79d9b3ab5ce20f51c272b4411202710a082d0317
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten] allocator: tcmalloc
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten] modules: none
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten] build environment:
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten]     distarch: x86_64
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten]     target_arch: x86_64
2017-01-26T11:07:37.729-0500 I CONTROL  [initandlisten] options: { config: "C:\ProgramData\Connexion.Integrator\Conf\mongo.config", net: { bindIp: "127.0.0.1", port: 27200 }, storage: { dbPath: "C:\ProgramData\Connexion.Integrator\Mongo", engine: "wiredTiger", journal: { enabled: true }, wiredTiger: { collectionConfig: { blockCompressor: "zlib" }, engineConfig: { cacheSizeGB: 1, journalCompressor: "zlib" } } }, systemLog: { destination: "file", logAppend: true, path: "C:\ProgramData\Connexion.Integrator\Logs\mongo.log", quiet: true, timeStampFormat: "iso8601-local" } }
2017-01-26T11:07:37.729-0500 W -        [initandlisten] Detected unclean shutdown - C:\ProgramData\Connexion.Integrator\Mongo\mongod.lock is not empty.
2017-01-26T11:07:37.730-0500 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-01-26T11:07:37.730-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=zlib),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-01-26T11:07:37.731-0500 I -        [initandlisten] Assertion: 28595:2: No such file or directory
2017-01-26T11:07:37.731-0500 I STORAGE  [initandlisten] exception in initAndListen: 28595 2: No such file or directory, terminating
2017-01-26T11:07:37.731-0500 I CONTROL  [initandlisten] dbexit:  rc: 100

All the folders listed exist and are writeable, and none are locked. Clearly there has been an "unclean shutdown" which has caused some corruption - but it's difficult to know what the proper action would be in this case:

Restore from backup, clear mongo.lock etc.

If possible, additional logging information here would be appreciated.



 Comments   
Comment by Xiangyu Yao (Inactive) [ 09/Jul/18 ]

I am closing this ticket as a dup of SERVER-35630.

Comment by Alexander Gorrod [ 28/Jun/18 ]

I believe what is being asked for here is a log message indicating a way forward from the error - which needs to be owned by MongoDB. i.e: Should the user then attempt to start with --repair, or remove the data and re-sync from another node in a replica set, or something else. Maybe ideally there could be a documentation page describing what to do in case of on-disk corruption which we could link to in a message before exiting? Maybe a generalized version of this page https://docs.mongodb.com/manual/tutorial/recover-data-following-unexpected-shutdown/

Comment by Eric Milkie [ 18/Apr/18 ]

I believe the work for this will involve enhancing the WiredTiger code for situations where wiredtiger_open() can return ENOENT, as it did in the example in the Description. We should log an informational message with context at those points prior to returning ENOENT.

Comment by Alexander Gorrod [ 07/Mar/17 ]

Thanks Nick - happily it looks like we will be able to improve the error message, since we now understand better where it originates from.

Comment by Nick Judson [ 07/Mar/17 ]

@Alexander - Sorry - it's long gone.

Comment by Alexander Gorrod [ 07/Mar/17 ]

nick@innsenroute.com Another user has encountered a similar symptom - see SERVER-28168, the reason for the error message was an empty WiredTiger.turtle file. It would be useful if you can upload the contents of the database directory where you saw the failure - if you still have it available - so that we can confirm that the failures are due to the same cause.

Comment by Nick Judson [ 21/Feb/17 ]

It just took me a while to digest the error message and I figured that if I found it confusing, other people probably will too. I think the big thing was the filename of '2'. Maybe a fully qualified path or putting it in quotes? Generally I think verbosity can be a good thing for major errors (especially those that stop the application from running)...

Comment by Alexander Gorrod [ 21/Feb/17 ]

nickj Thanks for the report. The error message reported is generally more useful than is shown here. The database is attempting to open a file called 2 that doesn't exist, which makes the error messages confusing. It isn't a failure mode I've seen before, and MongoDB doesn't generally create files with numerical names - is that something we need to investigate, or are you only asking about the verbosity of the error message?

Comment by Nick Judson [ 26/Jan/17 ]

I'm of mixed opinion here actually. On the one hand it does state that mongo.lock isn't empty, and perhaps that is enough information. What might be nice to see is something like this:

...
recovering from...
recovery failed, X bytes/documents of information unrecoverable. Recover from backup or clear lock file.
exception: file 'X' not found.
...

Generated at Thu Feb 08 04:16:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.