[SERVER-493] replica pairs keep growing Created: 17/Dec/09 Updated: 12/Jul/16 Resolved: 19/Jan/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 1.2.1, 1.3.0 |
| Fix Version/s: | 1.3.1 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Kristina Chodorow (Inactive) | Assignee: | Aaron Staple |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
I'm not inserting anything, yet I'm running out of space on my 32-bit machine. I'm testing replica pair failover with a driver. Every couple of seconds I kill the master db server (call it A), wait for the other db (B) to become master, and restart A. Then I kill B, wait a couple of seconds, restart B, and repeat. After ~20 minutes of this, when I try to restart one of them, I get:

mmap() failed for /home/k/dbs/data2/local.7 len:536870912 errno:12

data2 contains:

I'm not doing anything other than querying and restarting the server, so it seems weird that it's taking up 2GB so quickly, even if it is logging stuff. |
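For readers decoding the error line in the description, here is a minimal sketch that turns the reported `len` and `errno` into readable values. The two constants are copied from the log line; the helper name `describe_mmap_failure` is hypothetical. Errno 12 maps to `ENOMEM` on Linux, which would be consistent with a 32-bit process running out of address space, though the ticket itself does not state that interpretation.

```python
import errno
import os

# Values copied from the reported log line.
MMAP_LEN = 536870912
MMAP_ERRNO = 12


def describe_mmap_failure(length, err):
    """Return (size in MiB, symbolic errno name, OS error text)."""
    size_mib = length // (1024 * 1024)
    name = errno.errorcode.get(err, "UNKNOWN")
    return size_mib, name, os.strerror(err)


size_mib, name, text = describe_mmap_failure(MMAP_LEN, MMAP_ERRNO)
print(f"{size_mib} MiB, {name}: {text}")
```

So the failing call was an attempt to map a 512 MiB file, not a small allocation.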
| Comments |
| Comment by Aaron Staple [ 19/Jan/10 ] |
|
Should be fixed now |
| Comment by auto [ 19/Jan/10 ] |
|
Author: {'name': 'Aaron', 'email': 'aaron@10gen.com'} Message: |
| Comment by auto [ 19/Jan/10 ] |
|
Author: {'name': 'Aaron', 'email': 'aaron@10gen.com'} Message: |
| Comment by Aaron Staple [ 19/Jan/10 ] |
|
Sorry, I was running tests in an environment that didn't produce this issue. The problem results from a conflict between the new "openAllFiles" behavior and the preallocation scheme, which assumes that a preallocated file won't be opened until we need to allocate new data in it. I'll work on a fix. |
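To illustrate how such a conflict could make the file count grow without any writes, here is a toy model. It is not the server's code: the rule "opening the highest-numbered file triggers preallocation of the next one" is an assumption made for illustration, and `restart` and `open_all_files` are hypothetical names.

```python
def restart(files, open_all_files):
    """Simulate one server restart on a toy list of data-file indices.

    Assumed rule (hypothetical): whenever the highest-numbered file is
    opened, the server preallocates the next one.
    """
    files = list(files)
    if open_all_files:
        # Startup opens every file, including the preallocated spare,
        # so the preallocation rule fires although no data was written.
        files.append(files[-1] + 1)
    return files


files = [0, 1]  # toy state: local.0 in use, local.1 preallocated
for _ in range(5):
    files = restart(files, open_all_files=True)
print(len(files))
```

Under this assumed rule every restart adds one file, which would match the kill-and-restart pattern in the report; with `open_all_files=False` the list stays unchanged.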
| Comment by Kristina Chodorow (Inactive) [ 19/Jan/10 ] |
|
The db directories were empty when I started (except for the arbiter's). I didn't have a script; I ran:

./mongod --pairwith localhost:27018 --arbiter localhost:27019 --dbpath ~/dbs/data1

Then I killed off the master and restarted it, killed off the other and restarted it, and repeated for a while. |
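When repeating these steps, it may help to record how the `local.*` files grow between restarts. The following is a minimal sketch; the helper name `local_files` is hypothetical, and the demo uses a throwaway directory with made-up file sizes rather than a real dbpath.

```python
import os
import tempfile


def local_files(dbpath):
    """Return {filename: size_in_bytes} for local.* files under dbpath."""
    sizes = {}
    for entry in os.scandir(os.path.expanduser(dbpath)):
        if entry.is_file() and entry.name.startswith("local."):
            sizes[entry.name] = entry.stat().st_size
    return sizes


# Demo on a temporary directory so the sketch runs without a server.
with tempfile.TemporaryDirectory() as demo:
    for name, size in [("local.0", 10), ("local.1", 20), ("test.ns", 5)]:
        with open(os.path.join(demo, name), "wb") as f:
            f.write(b"\0" * size)
    demo_sizes = local_files(demo)
    print(sorted(demo_sizes.items()), sum(demo_sizes.values()))
```

Calling `local_files("~/dbs/data1")` after each restart and comparing the results would show whether a new file appears every time.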
| Comment by Aaron Staple [ 19/Jan/10 ] |
|
Pls go ahead and reassign to me once you send the info, thx. |
| Comment by Aaron Staple [ 19/Jan/10 ] |
|
I haven't been able to reproduce this behavior using the instructions. Kristina, could you provide the script you used? Also, from the logs it looks like the databases weren't started on fresh db paths. Do you know what was there when your test began? |
| Comment by Eliot Horowitz (Inactive) [ 16/Jan/10 ] |
|
We can use the process info stuff to validate vsize - mapped on Linux. |
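One way to do this kind of check from outside the server is sketched below, assuming the virtual size is read from the `VmSize` line of `/proc/<pid>/status` and the mapped size is supplied separately from whatever the server reports. The function names and the sample text are hypothetical.

```python
import re


def vm_size_kb(status_text):
    """Extract VmSize in kB from the text of /proc/<pid>/status."""
    match = re.search(r"^VmSize:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmSize line found")
    return int(match.group(1))


def vsize_minus_mapped_mb(status_text, mapped_mb):
    """Virtual size minus the server-reported mapped size, in MB."""
    return vm_size_kb(status_text) / 1024 - mapped_mb


# Made-up sample in the /proc status layout; mapped_mb is also made up.
sample = "Name:\tmongod\nVmPeak:\t 2100000 kB\nVmSize:\t 2097152 kB\nVmRSS:\t   51200 kB\n"
overhead = vsize_minus_mapped_mb(sample, mapped_mb=2000)
print(overhead)
```

A difference that keeps growing across restarts would suggest address space is being consumed by something other than the files the server believes it has mapped.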
| Comment by Eliot Horowitz (Inactive) [ 16/Jan/10 ] |
|
Have heard similarish things from 3 people - so we need to make sure we're ok. |
| Comment by Eliot Horowitz (Inactive) [ 13/Jan/10 ] |
|
Had various reports of this on the 1.2 branch. Aaron: can you do some testing and figure out if there is an issue? |
| Comment by Aaron Staple [ 05/Jan/10 ] |
|
Happy to look at it, but I don't know what recent changes you're referring to. I made some changes earlier today, but they wouldn't have caused a bug on Dec 17. Before today it looks like I haven't made any capped changes since September. |
| Comment by Mathias Stearn [ 05/Jan/10 ] |
|
@Aaron: I think this is related to your recent capped collection changes. It appears to keep trying to create new oplogs on the new master and not using old space. I can confirm that this is a regression from 1.2. Reassign this to me if it isn't related to your changes. |
| Comment by Kristina Chodorow (Inactive) [ 18/Dec/09 ] |
|
Full logs run with -v. 1.log was the one that eventually failed to start; you can see that part at the bottom. |
| Comment by Eliot Horowitz (Inactive) [ 18/Dec/09 ] |
|
Can you run with -v? |