[SERVER-3126] replset - common point problem Created: 20/May/11 Updated: 02/Aug/18 Resolved: 28/Oct/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Task | Priority: | Critical - P2 |
| Reporter: | MartinS | Assignee: | Gregory McKeon (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | as in |
| Participants: | |
| Description |
|
After changing the oplog size in the config and restarting mongod on each node (in order: node 1 - secondary, node 2 - secondary, node 3 - primary). Set name: testreplset. Recent replset log activity: Fri May 20 14:15:09 [startReplSets] couldn't connect to localhost:27017: couldn't connect to server localhost:27017 |
| Comments |
| Comment by Kristina Chodorow (Inactive) [ 28/Oct/11 ] |
|
You can prevent rollback with w:"majority" in v2.0. |
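For illustration, a minimal sketch of such a write in the mongo shell, assuming a 2.0+ server; the collection name, document, and wtimeout value are only examples:

```javascript
// Sketch only: run in the mongo shell connected to the primary of the set.
db.test.insert({x: 1});

// getLastError with w:"majority" waits until a majority of the set has the write,
// so a later failover cannot roll it back; wtimeout avoids blocking forever.
db.runCommand({getLastError: 1, w: "majority", wtimeout: 5000});
```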
| Comment by MartinS [ 26/May/11 ] |
|
I don't want to recover my current data, because it's only a test before putting mongo into production. But I don't want to have to use rollback; I'd like mongo to stop accepting writes if half of the nodes are stale (like when hosts are down). |
| Comment by Kristina Chodorow (Inactive) [ 24/May/11 ] |
|
I'll take a look at the logs. If you always use a write concern of w:<majority of servers>, you should never have rollbacks. |
| Comment by MartinS [ 24/May/11 ] |
|
The logs are in One important piece of information: I don't want to get the data back with rollback. That was only test data. And about primary host election with secondaries in an incomplete state: And one configuration directive could be very useful: forcing the oplog to be kept if changes were not committed on all nodes. |
| Comment by Kristina Chodorow (Inactive) [ 20/May/11 ] |
|
You cannot, at the moment, change the oplog size by just changing the config option. You have to drop the oplog and recreate it with the new size. The oplog size matters a great deal!

> Primary question is: why did a node that cannot sync become the new primary after restarting the primary host?

If you have logs from all three during the time in question, I can probably interpret them for you.

> And then: why does the host with the only up-to-date db try to roll back after restart?

Basically, if you have a set with a primary & two secondaries and the primary is ahead, when it goes down, one of the secondaries will be elected primary even if it doesn't have all of the data the primary had. Then this new primary will start taking writes, so when the old primary comes back up, it will have data that it never applied to the secondaries, but it can't apply it automatically because it might conflict with writes that have happened while it was down. So it rolls back its conflicting data. See http://www.snailinaturtleneck.com/blog/2011/01/19/how-to-use-replica-set-rollbacks/ for an example. |
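For reference, a minimal sketch of the drop-and-recreate procedure described above, assuming the member has been temporarily restarted as a standalone (no --replSet) and its data files are backed up first; the 2 GB size and the temp_oplog collection name are only examples:

```javascript
// Run in the mongo shell against the member started as a standalone.
var local = db.getSiblingDB("local");

// Save the newest oplog entry so the member can rejoin the set from the right point.
var last = local.oplog.rs.find().sort({$natural: -1}).limit(1).next();
local.temp_oplog.save(last);

// Drop the old oplog and recreate it as a capped collection with the new size (~2 GB here).
local.oplog.rs.drop();
local.runCommand({create: "oplog.rs", capped: true, size: 2 * 1024 * 1024 * 1024});

// Restore the saved entry, clean up, then restart the member with --replSet.
local.oplog.rs.save(local.temp_oplog.findOne());
local.temp_oplog.drop();
```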
| Comment by MartinS [ 20/May/11 ] |
|
It doesn't matter how big the logs are. Or: size doesn't matter :] The primary question is: why did a node that cannot sync become the new primary after restarting the primary host? |
| Comment by Kristina Chodorow (Inactive) [ 20/May/11 ] |
|
Hmm, how big are your oplogs? Can you send the output of db.printReplicationInfo() from the primary and recovering nodes? |
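For what it's worth, a sketch of how to gather that from the shell (helper names as in shells of that era):

```javascript
// On the primary and on the recovering member: prints the configured oplog size and
// the time span of operations it currently holds, which bounds how long a member can
// fall behind before it needs a full resync.
db.printReplicationInfo();

// On the primary (if present in your shell version): prints how far behind each
// secondary's last synced operation is.
db.printSlaveReplicationInfo();
```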
| Comment by MartinS [ 20/May/11 ] |
|
This time it was at most 15 minutes. How can I force the oplog not to be deleted until all nodes are in the same state? |
| Comment by Kristina Chodorow (Inactive) [ 20/May/11 ] |
|
This: 14:15:12 [replica set sync] replSet rollback error RS101 reached beginning of local oplog means you're going to have to resync this server. The servers have diverged for too long to reconcile automatically. |