[SERVER-30836] The storage.bson file may become empty after startup Created: 25/Aug/17 Updated: 30/Oct/23 Resolved: 15/Sep/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 3.5.12 |
| Fix Version/s: | 3.6.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jonathan Abrahams | Assignee: | Daniel Gottlieb (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Storage 2017-10-02 | ||||||||
| Participants: | |||||||||
| Description |
|
During the powercycle testing the following fatal assertion occurred:
|
| Comments |
| Comment by Ramon Fernandez Marina [ 15/Sep/17 ] |
|
Author: Daniel Gottlieb (username: dgottlieb, email: daniel.gottlieb@mongodb.com) Message: |
| Comment by Jonathan Abrahams [ 13/Sep/17 ] |
|
The provided patch does protect the storage.bson file from becoming empty during system crashes. |
| Comment by Jonathan Abrahams [ 12/Sep/17 ] |
|
It seems this ticket has morphed, as the original issue of asserting when the storage.bson is empty is "works as designed". The secondary issue of the file becoming empty is the issue of interest and the ticket is being retitled. |
| Comment by Daniel Gottlieb (Inactive) [ 11/Sep/17 ] |
|
Thanks for the logs. It seems there are two issues going on that I want to discuss separately. 2) The issue I'm more concerned with is that a mongod can come up and, after a process/node crash, the storage.bson file is not intact. From our conversation about when the crash occurs, the crash is only issued after the server starts up and a client can connect. It's my expectation that the storage.bson file is safely persisted on disk by that time. I've found that our creation of the storage.bson file, which is done via a rename, doesn't do all of the "tricks" POSIX renames do[1][2]. I've attached a patch that better conforms to what I've seen other implementations do. Can you try running the patched version by hand and see if the problem becomes less frequent (or hopefully just goes away)? [1] https://github.com/wiredtiger/wiredtiger/blob/95d911ab246e444192f34dc395652dba2653dd3c/src/os_posix/os_fs.c#L214-L223 |
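For readers unfamiliar with the rename "tricks" mentioned above, the following is a minimal sketch of the general durable-rename pattern on POSIX systems. It is not the attached patch and not mongod's code; the function name `durable_atomic_write` and the temporary-file prefix are hypothetical. The usual idea is to fsync the new file's contents before the rename and then fsync the parent directory so that the rename itself survives a crash.

```python
import os
import tempfile


def durable_atomic_write(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` with `data` so that a crash
    leaves either the old file or the complete new file (POSIX assumed)."""
    directory = os.path.dirname(os.path.abspath(path))

    # Create the temporary file in the same directory so the rename
    # does not cross file systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the file contents to disk before the rename.
            os.fsync(tmp.fileno())
        # Atomically move the fully written file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # fsync the parent directory so the new directory entry is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping either fsync is a common way to end up with a zero-length file after a power loss, since the rename may reach disk before the data does.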
| Comment by Jonathan Abrahams [ 08/Sep/17 ] |
|
Note, if the storage.bson file does not exist then mongod will create one. |
| Comment by Jonathan Abrahams [ 08/Sep/17 ] |
|
Steps to reproduce:
|
| Comment by Jonathan Abrahams [ 07/Sep/17 ] |
|
Attached tar file has data and logs. |
| Comment by Eric Milkie [ 07/Sep/17 ] |
|
jonathan.abrahams I'm tabling work on this until it reproduces again and we can get log and data files to help diagnose further. |
| Comment by Daniel Gottlieb (Inactive) [ 01/Sep/17 ] |
|
jonathan.abrahams Can you attach the entire dbpath after one of these failures? Please also attach the mongod log files, as those seem to be in a separate path. |
| Comment by Ian Whalen (Inactive) [ 01/Sep/17 ] |
|
Assigning to Dan to investigate a bit further. |