Type: Question
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: None
Assigned Teams: Server Triage

I have one mongo v2.4.10 server holding 1.7 TB of data, and I am trying to migrate and upgrade it to a mongo v3.0.15 server.

I set up a new mongo v3.0.15 instance and configured replication so that it runs as a secondary syncing from the v2.4.10 primary.

The secondary was in STARTUP2 and the sync was almost finished, judging by the growth in storage usage on the new machine running mongo v3.0.15.

However, some socket exceptions occurred and caused the sync between my two machines to restart from the beginning. Is there anything I can configure or set up to prevent this error from happening again? I don't want to spend another 7 days syncing 1.7 TB only for it to fail again.

Below are some logs from both servers:

Primary mongo (v2.4.10):

```
Wed Jul 3 10:03:59.196 [conn21] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [101.0.0.182:32829]
```

Secondary mongo (v3.0.15):
```

...
2019-07-03T09:54:29.169+0800 I NETWORK [ReplExecNetThread-0] Socket recv() timeout 192.168.168.122:27017
2019-07-03T09:54:29.169+0800 I NETWORK [ReplExecNetThread-0] SocketException: remote: 192.168.168.122:27017 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.168.122:27017]
2019-07-03T09:54:29.169+0800 I NETWORK [ReplExecNetThread-0] DBClientCursor::init call() failed
2019-07-03T09:54:29.169+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.168.122:27017; Location10276 DBClientBase::findN: transport error: 192.168.168.122:27017 ns: admin.$cmd query: { replSetHeartbeat: "ArchiverReplica", pv: 1, v: 1, from: "x.x.x.x:27017", fromId: 1, checkEmpty: false }

2019-07-03T09:54:29.170+0800 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.168.122:27017 after 1 milliseconds, giving up.
2019-07-03T09:54:29.170+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.168.122:27017; Location18915 Failed attempt to connect to 192.168.168.122:27017; couldn't connect to server 192.168.168.122:27017 (192.168.168.122), connection attempt failed
...
2019-07-03T10:07:41.452+0800 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.168.122:27017 after 4995 milliseconds, giving up.
2019-07-03T10:07:41.452+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.168.122:27017; Location18915 Failed attempt to connect to 192.168.168.122:27017; couldn't connect to server 192.168.168.122:27017 (192.168.168.122), connection attempt failed
2019-07-03T10:07:43.602+0800 I REPL [ReplicationExecutor] Member 192.168.168.122:27017 is now in state PRIMARY
...
2019-07-03T10:08:03.845+0800 I NETWORK [rsSync] Socket recv() errno:104 Connection reset by peer 192.168.168.122:27017
2019-07-03T10:08:03.845+0800 I NETWORK [rsSync] SocketException: remote: 192.168.168.122:27017 error: 9001 socket exception [RECV_ERROR] server [192.168.168.122:27017]
2019-07-03T10:08:03.853+0800 I NETWORK [rsSync] trying reconnect to 192.168.168.122:27017 (192.168.168.122) failed
2019-07-03T10:08:03.928+0800 I NETWORK [rsSync] reconnect 192.168.168.122:27017 (192.168.168.122) ok
2019-07-03T10:08:03.939+0800 E REPL [rsSync] 16465 recv failed while exhausting cursor
2019-07-03T10:08:03.939+0800 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining
2019-07-03T10:08:08.939+0800 I REPL [rsSync] initial sync pending
2019-07-03T10:08:08.958+0800 I REPL [ReplicationExecutor] syncing from: 192.168.168.122:27017
2019-07-03T10:08:09.204+0800 I REPL [rsSync] initial sync drop all databases
2019-07-03T10:08:09.205+0800 I STORAGE [rsSync] dropAllDatabasesExceptLocal 3
2019-07-03T10:08:09.221+0800 I JOURNAL [rsSync] journalCleanup...
2019-07-03T10:08:09.221+0800 I JOURNAL [rsSync] removeJournalFiles
2019-07-03T10:08:09.895+0800 I JOURNAL [rsSync] journalCleanup...
2019-07-03T10:08:09.895+0800 I JOURNAL [rsSync] removeJournalFiles
...
(resync restarts from the beginning)

```
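To make it easier to spot the moment a sync attempt is abandoned in a long secondary log, here is a minimal Python sketch that scans log lines in the v3.0.15 format shown above. It flags socket exceptions on the `rsSync` thread, the "initial sync attempt failed" line, and the "initial sync drop all databases" line that marks the restart. The names `find_sync_events` and `SyncEvent` are hypothetical, and the regular expression only assumes the line layout visible in this excerpt (timestamp, severity, component, thread, message). It is a diagnostic aid and does not address the cause of the socket errors.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed layout, taken from the excerpt above:
# <timestamp> <severity letter> <COMPONENT> [<thread>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<sev>[A-Z])\s+(?P<comp>[A-Z]+)\s+"
    r"\[(?P<thread>[^\]]+)\]\s+(?P<msg>.*)$"
)
ATTEMPT_FAILED = re.compile(r"initial sync attempt failed, (\d+) attempts remaining")


class SyncEvent(NamedTuple):
    timestamp: str
    kind: str    # "socket_error", "attempt_failed", or "restart"
    detail: str  # message text, or the remaining-attempt count


def find_sync_events(lines: Iterable[str]) -> List[SyncEvent]:
    """Return the initial-sync-related events found in the given log lines."""
    events: List[SyncEvent] = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip "..." markers and anything in another format
        ts, thread, msg = match.group("ts"), match.group("thread"), match.group("msg")
        failed = ATTEMPT_FAILED.search(msg)
        if thread == "rsSync" and "SocketException" in msg:
            # Only the sync thread is of interest here, not heartbeat threads.
            events.append(SyncEvent(ts, "socket_error", msg))
        elif failed:
            events.append(SyncEvent(ts, "attempt_failed", failed.group(1)))
        elif "initial sync drop all databases" in msg:
            events.append(SyncEvent(ts, "restart", msg))
    return events


# Three lines copied from the secondary log above.
SAMPLE = [
    "2019-07-03T10:08:03.845+0800 I NETWORK [rsSync] SocketException: remote: 192.168.168.122:27017 error: 9001 socket exception [RECV_ERROR] server [192.168.168.122:27017]",
    "2019-07-03T10:08:03.939+0800 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining",
    "2019-07-03T10:08:09.204+0800 I REPL [rsSync] initial sync drop all databases",
]

for event in find_sync_events(SAMPLE):
    print(event.timestamp, event.kind, event.detail)
```

Heartbeat errors from `ReplExecNetThread-0` are deliberately ignored, since in the excerpt it is the `rsSync` failure that precedes the databases being dropped.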

Assignee: [HELP ONLY] Backlog - Triage Team

Reporter: Aaron Tai Wei Han

Participants: [HELP ONLY] Backlog - Triage Team, Aaron Tai Wei Han, Eric Sedor

Votes: 0

Watchers: 3

Created: Jul 03 2019 06:35:58 AM UTC

Updated: Dec 06 2022 02:54:57 AM UTC

Resolved: Jul 03 2019 06:10:42 PM UTC
