[SERVER-228] make sure when running out of space in 32-bit mode, corruption doesn't occur
Created: 11/Aug/09  Updated: 14/Apr/16  Resolved: 17/Nov/15

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Minor - P4
Reporter: Eliot Horowitz (Inactive)
Assignee: Unassigned
Resolution: Won't Fix
Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-3625 Data can be lost if there is no free ... Closed
Backwards Compatibility: Fully Compatible
Participants:

 Comments   
Comment by Eliot Horowitz (Inactive) [ 17/Nov/15 ]

we do not support 32-bit anymore, so this is not relevant

Comment by auto [ 04/Feb/10 ]

Author: Aaron (astaple) <aaron@10gen.com>

Message: SERVER-228 update test t now that we don't exit when disk is full
http://github.com/mongodb/mongo/commit/e5a49e306fd7bf199493ae469d3bb3636ab79cf3

Comment by Aaron Staple [ 11/Jan/10 ]

I went ahead and pushed the cleaner response to running out of disk space (assert rather than simply exit) since that seems like an improvement regardless.

Comment by auto [ 11/Jan/10 ]

Author: Aaron <aaron@10gen.com>

Message: SERVER-228 assert rather than exit when run out of disk space
http://github.com/mongodb/mongo/commit/d377e7ad992b5fe515525da68e4ef44349818c7b

Comment by Eliot Horowitz (Inactive) [ 11/Jan/10 ]

OK. We definitely don't want a transaction log right now, so deferring.

Comment by Aaron Staple [ 11/Jan/10 ]

Not trying to beat a dead horse, but here's what I was thinking with the transaction log:

If the transaction log is supposed to prevent inconsistent data when the db stops running suddenly, then it should prevent partial execution of commands. One way to do this would be to write only to the transaction log while running a command, then apply writes from the log in order to commit the command's changes. We would allocate and mmap every disk pointer in the transaction log before attempting to write any of the log in order to prevent out of memory / out of space problems. (Any such error and we'd simply not write the transaction log operations to disk.) Might be some issues with the size of the transaction log - I don't know if that is going to be limited somehow.
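The staging idea in the paragraph above can be sketched roughly as follows. This is a minimal illustration only, not the server's implementation: `Store`, `OutOfSpace`, `reserve`, and `run_command` are hypothetical names, and an in-memory dict with a byte budget stands in for the mmap'd data files. The point it shows is that all space is reserved before any logged write is applied, so a failure leaves the store untouched.

```python
class OutOfSpace(Exception):
    """Raised when the (simulated) store cannot reserve enough space."""


class Store:
    """Hypothetical fixed-capacity store standing in for the data files."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}

    def used(self):
        # Total bytes currently held by all values.
        return sum(len(value) for value in self.data.values())

    def reserve(self, nbytes):
        # Fail up front, before any write has been applied.
        free = self.capacity - self.used()
        if nbytes > free:
            raise OutOfSpace(f"need {nbytes} bytes, only {free} free")


def run_command(store, writes):
    """Stage every write in a log, reserve space, then apply in order."""
    log = list(writes)  # the per-command "transaction log"
    # Conservative estimate: does not credit space freed by overwrites.
    needed = sum(len(value) for _, value in log)
    store.reserve(needed)  # an error here means nothing was written
    for key, value in log:
        store.data[key] = value
```

In this sketch a command whose writes do not fit raises `OutOfSpace` and none of its writes become visible, which is the "no partial execution" property the comment is after. It does not address the log-size question raised above.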

It seems like the only alternative to this approach for dealing with partial commands resulting when the db stops suddenly is to implement command rollback / completion for a fresh db instance – the next time the db starts, it will clean up or complete an interrupted command. Since the new db instance (currently) lacks information on the context in which the command was originally run, this might require a fair amount of work to implement. If we do plan to go this route though, then implementing rollback on mmap / disk allocation failure would be a place to start.

The first step in any case should be to make running out of disk space no worse than an mmap failure - should just assert through the same call hierarchy. I'll start by implementing that.
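As a rough sketch of that first step (not the actual change in the linked commits), the idea is that a disk-full condition during preallocation is turned into an error that propagates up to the caller, the same way an mmap failure would, instead of terminating the process. All names here (`DiskFullError`, `preallocate`, `handle_command`, the `allocate` callback, and the `ok`/`errmsg` result shape) are assumptions made for illustration.

```python
import errno


class DiskFullError(Exception):
    """Hypothetical error type raised instead of exiting the process."""


def preallocate(path, size, allocate):
    """Call the supplied allocator; translate ENOSPC into DiskFullError."""
    try:
        allocate(path, size)
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # Propagate through the normal call hierarchy rather than exit.
            raise DiskFullError(f"no space left while allocating {path}") from exc
        raise  # other OS errors are not handled in this sketch


def handle_command(path, size, allocate):
    """Hypothetical command handler: report the failure, keep running."""
    try:
        preallocate(path, size, allocate)
    except DiskFullError as exc:
        return {"ok": 0, "errmsg": str(exc)}
    return {"ok": 1}
```

The process stays up and the caller sees an ordinary command failure; whether the data is left consistent after such a failure is the separate question discussed in the rest of this thread.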

Comment by Eliot Horowitz (Inactive) [ 05/Jan/10 ]

Maybe - but I don't think that's the right solution to this problem.
I think we should handle it for real.
Also, even with a transaction log, it could be bad. You'll come back up OK, but then just keep crashing over and over.

Comment by Aaron Staple [ 05/Jan/10 ]

So right now running out of disk space is even worse - preallocation threads just exit the process once they run out of space, and this can happen at any time regardless of what the db is doing, just like an unfriendly process kill.

I remember at one point we were considering implementing a transaction log in order to fix up corruption caused by a dying db. Is that still planned?

Comment by Eliot Horowitz (Inactive) [ 05/Jan/10 ]

How do we handle running out of disk space?
Should be fairly similar, no?
If we have problems with that, then we have problems with this, unless we cheat via pre-alloc, but I think it ends up at the same place.

If so - we should try and figure out all the cases, and at the very least enumerate them all and figure out how hard it is to fix.

Comment by Aaron Staple [ 05/Jan/10 ]

So what is meant by corruption here? Theoretically, any command that saves some data to a non-capped collection (and there are plenty of these) could trigger an mmap exception and leave the db in an inconsistent state as a result of the command being partially completed.

It seems reasonable to handle certain cases - for example, rolling back creation of a namespace if file allocation fails while we're part way through allocating extents, but I don't know if we want to guard every programmatic write.
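The namespace example above could look something like the following sketch. It is illustrative only: `create_namespace`, `AllocationFailed`, the dict-based `catalog`, and the `allocate_extent` / `free_extent` callbacks are hypothetical stand-ins, not server functions. Extents allocated before the failure are released in reverse order, and the namespace is registered only once every extent exists.

```python
class AllocationFailed(Exception):
    """Hypothetical error for a failed extent or file allocation."""


def create_namespace(catalog, name, extent_sizes, allocate_extent, free_extent):
    """Allocate all extents for a namespace, or none of them."""
    allocated = []
    try:
        for size in extent_sizes:
            allocated.append(allocate_extent(size))
    except AllocationFailed:
        # Roll back: release what was allocated so far, newest first.
        for extent in reversed(allocated):
            free_extent(extent)
        raise
    # Only publish the namespace after every extent was allocated.
    catalog[name] = allocated
```

This covers one targeted case; it does not guard every programmatic write, which is the broader concern raised in the comment.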

Comment by Dwight Merriman [ 05/Oct/09 ]

A problem with create index failing has been taken care of.

However, there may be other issues (unclear) still pending.

Generated at Thu Feb 08 02:53:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.