Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-37468

Problem with initial sync when adding an member in a rs in 3.2.21

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Labels:
      None
    • ALL

      Hi,

      Since i migrate an repliset in mongodb 3.2.21 i can't add an member because the initial sync not works.

      It's seems to be an bug corrected in 3.2.21. But it seems not.

      In logs in the member i try to add i see this messages : 

      2018-10-03T16:28:43.553+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017]
      2018-10-03T16:28:43.553+0200 I NETWORK [rsSync] DBClientCursor::init call() failed
      2018-10-03T16:28:43.555+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-b.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538576857000|17 } }
      2018-10-03T16:28:43.555+0200 E REPL [rsSync] initial sync attempt failed, 9 attempts remaining
      2018-10-03T16:28:48.555+0200 I REPL [rsSync] initial sync pending
      2018-10-03T16:28:48.556+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017
      2018-10-03T16:28:48.559+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538576928000|18, t: -1 }
      2018-10-03T16:29:18.559+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.22:27017
      2018-10-03T16:29:18.559+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017]
      2018-10-03T16:29:18.559+0200 I NETWORK [rsSync] DBClientCursor::init call() failed
      2018-10-03T16:29:18.560+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-b.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538576928000|18 } }
      2018-10-03T16:29:18.560+0200 E REPL [rsSync] initial sync attempt failed, 8 attempts remaining
      2018-10-03T16:29:23.561+0200 I REPL [rsSync] initial sync pending
      2018-10-03T16:29:23.561+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017
      2018-10-03T16:29:23.566+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538576932000|7, t: -1 }
      2018-10-03T16:29:53.566+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.22:27017
      2018-10-03T16:29:53.566+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017]
      2018-10-03T16:29:53.566+0200 I NETWORK [rsSync] DBClientCursor::init call() failed
      2018-10-03T16:29:53.568+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-b.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538576932000|7 } }
      2018-10-03T16:29:53.568+0200 E REPL [rsSync] initial sync attempt failed, 7 attempts remaining
      2018-10-03T16:29:58.568+0200 I REPL [rsSync] initial sync pending
      2018-10-03T16:29:58.568+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017
      2018-10-03T16:29:58.572+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538576998000|18, t: -1 }
      2018-10-03T16:30:00.586+0200 E REPL [rsSync] 11602 operation was interrupted
      2018-10-03T16:30:00.587+0200 E REPL [rsSync] initial sync attempt failed, 6 attempts remaining
      2018-10-03T16:30:02.420+0200 I NETWORK [conn14] end connection 192.168.50.22:57160 (19 connections now open)
      2018-10-03T16:30:02.601+0200 I ASIO [ReplicationExecutor] dropping unhealthy pooled connection to sf06-b.essos.lan:27017
      2018-10-03T16:30:02.601+0200 I ASIO [ReplicationExecutor] after drop, pool was empty, going to spawn some connections
      2018-10-03T16:30:02.602+0200 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to sf06-b.essos.lan:27017
      2018-10-03T16:30:02.603+0200 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to sf06-b.essos.lan:27017, took 2ms (1 connections now open to sf06-b.essos.lan:27017)
      2018-10-03T16:30:02.604+0200 I REPL [ReplicationExecutor] Member sf06-b.essos.lan:27017 is now in state SECONDARY
      2018-10-03T16:30:04.424+0200 I REPL [ReplicationExecutor] replSetElect voting yea for sf06-a.essos.lan:27017 (4)
      2018-10-03T16:30:04.603+0200 I REPL [ReplicationExecutor] Member sf06-a.essos.lan:27017 is now in state PRIMARY
      2018-10-03T16:30:05.587+0200 I REPL [rsSync] initial sync pending
      2018-10-03T16:30:05.587+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017
      2018-10-03T16:30:05.592+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538576999000|19, t: -1 }
      2018-10-03T16:30:08.720+0200 I NETWORK [initandlisten] connection accepted from 192.168.50.22:57178 #21 (20 connections now open)
      2018-10-03T16:30:15.617+0200 I NETWORK [initandlisten] connection accepted from 192.168.50.23:47300 #22 (21 connections now open)
      ^C
      [root@sf06-c ~]# tail /var/log/mongodb/mongod.log -F
      2018-10-03T16:30:40.745+0200 I NETWORK [initandlisten] connection accepted from 192.168.50.185:45334 #24 (23 connections now open)
      2018-10-03T16:31:06.007+0200 I NETWORK [initandlisten] connection accepted from 127.0.0.1:43886 #25 (24 connections now open)
      2018-10-03T16:31:10.599+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.22:27017
      2018-10-03T16:31:10.599+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017]
      2018-10-03T16:31:10.599+0200 I NETWORK [rsSync] DBClientCursor::init call() failed
      2018-10-03T16:31:10.600+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-b.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538576999000|19 } }
      2018-10-03T16:31:10.600+0200 E REPL [rsSync] initial sync attempt failed, 4 attempts remaining
      2018-10-03T16:31:15.600+0200 I REPL [rsSync] initial sync pending
      2018-10-03T16:31:15.600+0200 I REPL [ReplicationExecutor] syncing from: sf06-a.essos.lan:27017 by request
      2018-10-03T16:31:15.604+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538577047000|79, t: -1 }
      2018-10-03T16:31:45.604+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.23:27017
      2018-10-03T16:31:45.604+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.23:27017]
      2018-10-03T16:31:45.604+0200 I NETWORK [rsSync] DBClientCursor::init call() failed
      2018-10-03T16:31:45.606+0200 E REPL [rsSync] 10276 DBClientBase::findN: transport error: sf06-a.essos.lan:27017 ns: local.oplog.rs query: \{ ts: { $gte: Timestamp 1538577047000|79 } }
      2018-10-03T16:31:45.606+0200 E REPL [rsSync] initial sync attempt failed, 3 attempts remaining
      2018-10-03T16:31:50.606+0200 I REPL [rsSync] initial sync pending
      2018-10-03T16:31:50.607+0200 I REPL [ReplicationExecutor] syncing from: sf06-b.essos.lan:27017
      2018-10-03T16:31:50.712+0200 I REPL [rsSync] initial sync ensuring correct oplog visibility. Starting at \{ ts: Timestamp 1538577110000|31, t: -1 }
      2018-10-03T16:32:20.712+0200 I NETWORK [rsSync] Socket recv() timeout 192.168.50.22:27017
      2018-10-03T16:32:20.712+0200 I NETWORK [rsSync] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.50.22:27017]
      

       

            Assignee:
            kelsey.schubert@mongodb.com Kelsey Schubert
            Reporter:
            glapouge Guillaume LAPOUGE
            Votes:
            2 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated:
              Resolved: