[SERVER-1835] problem with replication Created: 23/Sep/10  Updated: 30/Mar/12  Resolved: 05/Oct/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.6.2, 1.7.0
Fix Version/s: None

Type: Bug Priority: Blocker - P1
Reporter: Boris Kashinski Assignee: Kristina Chodorow (Inactive)
Resolution: Won't Fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows XP OS


Attachments: Text File log.txt     GIF File schema.gif     GIF File turn-off.gif    
Operating System: Windows
Participants:

 Description   

We have a problem when using MongoDB replication. For example: we run a master and a slave, and when we turn off the network between
them, the slave stops listening to the master. When we restart the slave server, everything is OK again, but that is not a good solution.
The problem in more detail:
The slave normally starts replicating after a period of time. But after a
network outage between the master and the slave, the slave does not start
replicating after a period of time. And when we turn the network
between master and slave back on, the slave still does not start replicating. We
looked at the collection local->host and found the IP of the master there,
but replication does not start automatically.



 Comments   
Comment by Kristina Chodorow (Inactive) [ 31/Aug/11 ]

Huh?

Comment by Boris Kashinski [ 31/Aug/11 ]


you!

Comment by Kristina Chodorow (Inactive) [ 30/Aug/11 ]

Not sure if you're still around. If so, the first thing I'd recommend is upgrading. There was an issue with reconnection attempts getting into an infinite loop that was fixed in 1.8. If that doesn't help, can you send the logs from A, B, and C during the experiment (while using 1.8)?

Comment by Boris Kashinski [ 08/Oct/10 ]

Can somebody comment on this situation?

Comment by Boris Kashinski [ 04/Oct/10 ]

OK. So, given servers A, B and C, where A and B are masters and C is the slave: we pull the physical wire from B, and C stops listening to both A and B. Replication only starts again when we reconnect B.

Comment by Eliot Horowitz (Inactive) [ 01/Oct/10 ]

I didn't quite follow.

So, given servers A B C

A B to start
B slaves A

then C??

Comment by Boris Kashinski [ 01/Oct/10 ]

Big thanks. We tested this situation on two computers running Linux. We set the tcp_keepalive timeout, and replication started again. But there is one remaining problem: when we connect a third computer and pull the physical wire from the second, the slave stops listening to both the second and the third computers. Replication only starts again when we reconnect the second computer.

Comment by Eliot Horowitz (Inactive) [ 30/Sep/10 ]

On Unix you can set the tcp_keepalive timeout.
Can you configure that on Windows?
The default is usually 2 hours.
If you set it to 2 minutes, that will probably fix it, but I don't know how to do that on Windows.
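To illustrate what the keepalive timeout controls, here is a minimal Python sketch that enables keepalive on a single socket and shortens the idle time to 120 seconds (the "2 minutes" above). It is not MongoDB code; the 10-second probe interval and 5-probe count are arbitrary assumptions. On Linux the system-wide default is the `net.ipv4.tcp_keepalive_time` sysctl (7200 seconds, the 2 hours mentioned above); on Windows it is the `KeepAliveTime` registry value, in milliseconds, under `HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters`. Either system-wide value only affects sockets that have keepalive turned on.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 120,
                     interval_s: int = 10, probes: int = 5) -> None:
    """Enable TCP keepalive on one socket with a shorter idle time.

    idle_s=120 matches the "2 minutes" suggestion; interval_s and probes
    are illustrative assumptions, not MongoDB settings.
    """
    # Ask the OS to probe an idle connection instead of trusting it forever.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time, probe interval), both times in milliseconds.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_s * 1000, interval_s * 1000))
        return

    # Linux and other platforms exposing these options: values in seconds.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

On Linux, a silently dead peer is then reported after roughly `idle_s + interval_s * probes` seconds (about 170 s with these values) instead of after two hours.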

Comment by Boris Kashinski [ 30/Sep/10 ]

We pulled a physical wire and got the same result.

Comment by Eliot Horowitz (Inactive) [ 29/Sep/10 ]

Right.

What I think is happening is that there is a difference between software and hardware problems.
Can you try pulling a physical wire?

Comment by Boris Kashinski [ 29/Sep/10 ]

A drunken Russian electrician cuts the cable with an axe, or a tired cleaning woman damages the Ethernet connector.

Comment by Eliot Horowitz (Inactive) [ 29/Sep/10 ]

What do you mean by "if the connection crashes"?
If the master server crashed, you should see very different behavior.

Comment by Boris Kashinski [ 29/Sep/10 ]

Sorry, but in practice this problem really does exist. If the connection crashes, replication on the slave stops working, and we will not even know about it on the slave; we will keep using stale data. I think one of the try/catch blocks in the MongoDB code should restart the slave.
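Rather than restarting the whole slave from a try/catch block, a common pattern is to drop the broken connection and reconnect with exponential back-off. The Python sketch below shows that idea only; `connect` and `pull_batch` are hypothetical stand-ins for "open a connection to a master" and "fetch the next batch of operations", not MongoDB functions, and the 1 s base and 60 s cap are assumptions. Note that this only helps once an error is actually raised: a connection that hangs silently never raises, which is why the keepalive timeout discussed in this thread matters.

```python
import time
from typing import Callable, List, Optional


def sync_with_reconnect(connect: Callable[[], object],
                        pull_batch: Callable[[object], None],
                        batches: int,
                        base_delay: float = 1.0,
                        max_delay: float = 60.0,
                        sleep: Callable[[float], None] = time.sleep) -> List[float]:
    """Pull `batches` batches from a (hypothetical) master, reconnecting on
    network errors. Returns the back-off delays that were slept."""
    delays: List[float] = []
    done = 0
    conn: Optional[object] = None
    delay = base_delay
    while done < batches:
        try:
            if conn is None:
                conn = connect()      # may raise OSError while the link is down
            pull_batch(conn)          # may raise OSError mid-stream
            done += 1
            delay = base_delay        # healthy again: reset the back-off
        except OSError:               # includes ConnectionError and socket timeouts
            conn = None               # discard the dead connection, reconnect next time
            delays.append(delay)
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return delays
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.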

Comment by Eliot Horowitz (Inactive) [ 29/Sep/10 ]

In practice, this problem may not really exist.
If one of the remote servers crashes, things will work as you expect.
It's just that the way you are testing this messes with TCP.

Comment by Boris Kashinski [ 29/Sep/10 ]

If I understand correctly, you suggest running many master/slave processes, but to me that is not a good solution. We could do that with another database, but it is very impractical and not nice. We looked at the scheme http://www.mongodb.org/display/DOCS/One+Slave+Two+Masters and it is very good for us, because we need to save all changes from many masters in one central database. Maybe it makes sense to look at the MongoDB source code? If we could find the place where the slave polls the masters, we might be able to find the problem and fix it.

Comment by Eliot Horowitz (Inactive) [ 28/Sep/10 ]

No, not in the same way you can with master/slave.
You could run 1 process per master on the slave machine though.
Is that an option?
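The one-process-per-master layout could look roughly like the sketch below, which only builds command lines and does not start anything. The `--slave`, `--source`, `--port` and `--dbpath` options belong to mongod's master/slave mode of that era; the host names, ports and data directories are hypothetical values chosen for illustration.

```python
from typing import List


def slave_commands(masters: List[str], base_port: int = 27018,
                   data_root: str = "C:/data/slave") -> List[List[str]]:
    """Build one mongod command line per master (hypothetical layout)."""
    commands = []
    for i, master in enumerate(masters):
        commands.append([
            "mongod",
            "--slave",                      # master/slave mode: act as a slave
            "--source", master,             # the single master this process follows
            "--port", str(base_port + i),   # each process needs its own port...
            "--dbpath", f"{data_root}{i}",  # ...and its own data directory
        ])
    return commands


for cmd in slave_commands(["masterA:27017", "masterB:27017"]):
    print(" ".join(cmd))
```

Each process follows exactly one master, so the slave machine ends up with one data directory per master rather than a single combined database.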

Comment by Boris Kashinski [ 28/Sep/10 ]

Can we implement this scheme with replica sets?

Comment by Boris Kashinski [ 28/Sep/10 ]

The replication scheme we need is shown in schema.gif. We use a one-slave, many-masters scheme like this one: http://www.mongodb.org/display/DOCS/One+Slave+Two+Masters

Comment by Eliot Horowitz (Inactive) [ 27/Sep/10 ]

Is this master/slave or replica sets?
Replica sets might handle this a lot more cleanly, since they use a heartbeat.
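A heartbeat sidesteps the TCP problem: instead of waiting for the operating system to notice a dead connection, each member regularly hears from the others and treats prolonged silence as failure. The class below is a minimal Python sketch of that bookkeeping only; it is not the replica-set implementation, and the peer names and 10-second timeout in the test are assumptions.

```python
import time
from typing import Callable, Dict


class HeartbeatMonitor:
    """Track when each peer last answered a heartbeat and flag silent ones.

    A peer not heard from for longer than `timeout_s` is reported as down,
    regardless of whether its TCP connection still looks open.
    """

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def record(self, peer: str) -> None:
        # Call this whenever a heartbeat reply arrives from `peer`.
        self.last_seen[peer] = self.clock()

    def is_up(self, peer: str) -> bool:
        # A peer that never answered, or went quiet too long, counts as down.
        seen = self.last_seen.get(peer)
        return seen is not None and self.clock() - seen <= self.timeout_s
```

Injecting the clock makes the timeout logic easy to test without sleeping.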

Comment by Boris Kashinski [ 27/Sep/10 ]

I stop the network connection by clicking on Network Connections and selecting turn-off (see the file turn-off.gif).
I let it sit for no more than 1 minute.

Comment by Eliot Horowitz (Inactive) [ 24/Sep/10 ]

How do you stop the network connection?
How long did you let it sit?

Comment by Boris Kashinski [ 24/Sep/10 ]

log of master and slave

Comment by Eliot Horowitz (Inactive) [ 23/Sep/10 ]

I'm completely lost...

Can you send the slave's logs, annotated at the correct times with what happened at the system level?
