[SERVER-228] make sure when running out of space in 32-bit mode, corruption doesn't occur Created: 11/Aug/09 Updated: 14/Apr/16 Resolved: 17/Nov/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Stability |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Eliot Horowitz (Inactive) | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 4 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Participants: | |||||||||
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 17/Nov/15 ] |
|
we do not support 32-bit anymore, so this is not relevant |
| Comment by auto [ 04/Feb/10 ] |
|
Author: {'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}Message: |
| Comment by Aaron Staple [ 11/Jan/10 ] |
|
I went ahead and pushed the cleaner response to running out of disk space (assert rather than simply exit) since that seems like an improvement regardless. |
| Comment by auto [ 11/Jan/10 ] |
|
Author: {'name': 'Aaron', 'email': 'aaron@10gen.com'}Message: |
| Comment by Eliot Horowitz (Inactive) [ 11/Jan/10 ] |
|
Ok. Def don't want transaction log right now so deferring |
| Comment by Aaron Staple [ 11/Jan/10 ] |
|
Not trying to beat a dead horse, but here's what I was thinking with the transaction log: If the transaction log is supposed to prevent inconsistent data when the db stops running suddenly, then it should prevent partial execution of commands. One way to do this would be to write only to the transaction log while running a command, then apply writes from the log in order to commit the command's changes. We would allocate and mmap every disk pointer in the transaction log before attempting to write any of the log in order to prevent out of memory / out of space problems. (Any such error and we'd simply not write the transaction log operations to disk.) Might be some issues with the size of the transaction log - I don't know if that is going to be limited somehow.

It seems like the only alternative to this approach for dealing with partial commands resulting when the db stops suddenly is to implement command rollback / completion for a fresh db instance – the next time the db starts, it will clean up or complete an interrupted command. Since the new db instance (currently) lacks information on the context in which the command was originally run, this might require a fair amount of work to implement. If we do plan to go this route though, then implementing rollback on mmap / disk allocation failure would be a place to start.

The first step in any case should be to make running out of disk space no worse than an mmap failure - should just assert through the same call hierarchy. I'll start by implementing that. |
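A minimal sketch of the staging idea described in the comment above, written in Python for brevity rather than in the server's own language. Everything here is hypothetical and for illustration only: `ToyDisk`, `OutOfSpace`, and `StagedCommand` are invented names, a dict stands in for the data files, and space accounting is simplified (overwrites are charged as new bytes). It is not how the server implements or would implement a transaction log.

```python
class OutOfSpace(Exception):
    """Raised when the toy disk cannot reserve the requested bytes."""


class ToyDisk:
    """In-memory stand-in for a disk with a fixed byte budget (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def reserve(self, nbytes):
        # Fail before changing any state if the request does not fit.
        if self.used + nbytes > self.capacity:
            raise OutOfSpace(f"need {nbytes}, have {self.capacity - self.used}")
        self.used += nbytes


class StagedCommand:
    """Stages a command's writes in a log, then applies all or none of them."""

    def __init__(self, disk, data):
        self.disk = disk
        self.data = data  # dict standing in for the data files
        self.log = []     # staged (key, value) writes, in order

    def write(self, key, value):
        # While the command runs, only the log is touched.
        self.log.append((key, value))

    def commit(self):
        # Reserve space for every staged write before applying any of them.
        needed = sum(len(value) for _, value in self.log)
        try:
            self.disk.reserve(needed)
        except OutOfSpace:
            # Nothing has been applied yet, so dropping the log is enough.
            self.log.clear()
            raise
        for key, value in self.log:
            self.data[key] = value
        self.log.clear()
```

With a 10-byte `ToyDisk`, a command that stages 8 bytes commits fully, and a following command that stages 3 more bytes raises `OutOfSpace` without changing `data`. The point of the sketch is only the ordering: reserve everything first, apply second.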
| Comment by Eliot Horowitz (Inactive) [ 05/Jan/10 ] |
|
Maybe - but I don't think that's the right solution to this problem. |
| Comment by Aaron Staple [ 05/Jan/10 ] |
|
So right now running out of disk space is even worse - preallocation threads just exit the process once they run out of space, and this can happen at any time regardless of what the db is doing, just like an unfriendly process kill. I remember at one point we were considering implementing a transaction log in order to fix up corruption caused by a dying db. Is that still planned? |
| Comment by Eliot Horowitz (Inactive) [ 05/Jan/10 ] |
|
How do we handle running out of disk space? If that can also cause corruption, we should try to figure out all the cases, or at the very least enumerate them all and figure out how hard each is to fix. |
| Comment by Aaron Staple [ 05/Jan/10 ] |
|
So what is meant by corruption here? Theoretically, any command that saves some data to a non-capped collection (and there are plenty of these) could trigger an mmap exception and leave the db in an inconsistent state as a result of the command being partially completed. It seems reasonable to handle certain cases - for example, rolling back creation of a namespace if file allocation fails while we're partway through allocating extents - but I don't know if we want to guard every programmatic write. |
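The namespace example in the comment above could look roughly like the following sketch, again in Python for brevity. All names are hypothetical (`create_namespace`, `AllocationError`, and the injected `allocate` / `free` callables); the catalog is a plain dict, and nothing here reflects the server's actual namespace or extent code.

```python
class AllocationError(Exception):
    """Stand-in for an mmap or file allocation failure (hypothetical)."""


def create_namespace(catalog, name, extent_sizes, allocate, free):
    """Allocate every extent for `name`, or leave no trace on failure.

    `allocate(size)` returns an extent handle or raises AllocationError;
    `free(extent)` releases a handle. Both are supplied by the caller.
    """
    if name in catalog:
        raise ValueError(f"namespace {name!r} already exists")

    allocated = []
    try:
        for size in extent_sizes:
            allocated.append(allocate(size))
    except AllocationError:
        # Roll back: release what was allocated, newest first, then re-raise
        # so the caller still sees the original failure.
        for extent in reversed(allocated):
            free(extent)
        raise

    # Publish the namespace only after every extent exists.
    catalog[name] = allocated
    return allocated
```

The design choice being illustrated is that the namespace becomes visible in the catalog only after all allocations succeed, so a failure partway through leaves nothing for a later reader to trip over.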
| Comment by Dwight Merriman [ 05/Oct/09 ] |
|
A problem with create index failing has been taken care of. However, there may be other issues (unclear) still pending. |