[SERVER-9199] Initial replication sync failed Created: 01/Apr/13  Updated: 10/Dec/14  Resolved: 12/Apr/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Marat Shakirov Assignee: J Rassi
Resolution: Done Votes: 0
Labels: azure, initialSync, replicaset
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Primary: Ubuntu 10.04 / mongo 2.4.1 (id0)
Slave 1: Ubuntu 10.04 / mongo 2.4.1 (id1)
Slave 2: Ubuntu 12.04 / mongo 2.4.1 (id2)


Attachments: azure.pcap, mongod.zip, mongodb.log
Operating System: Linux
Steps To Reproduce:

Start a new replica set member; after 5-6 hours the replica hits a fatal assertion failure.

Mon Apr  1 06:10:12.909 [rsSync]   Fatal Assertion 16233
0xdc7f71 0xd87cf3 0xbff47f 0xc1b101 0xc1bf1d 0xc1c1ec 0xe10879 0x7f569eeb1e9a 0x7f569e1c4cbd 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdc7f71]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd87cf3]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x6f) [0xbff47f]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x71) [0xc1b101]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x2d) [0xc1bf1d]
 /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0x6c) [0xc1c1ec]
 /usr/bin/mongod() [0xe10879]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f569eeb1e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f569e1c4cbd]
Mon Apr  1 06:10:12.919 [rsSync] 
 
***aborting after fassert() failure
 
 
Mon Apr  1 06:10:12.919 Got signal: 6 (Aborted).
 
Mon Apr  1 06:10:12.923 Backtrace:
0xdc7f71 0x6ce459 0x7f569e1074a0 0x7f569e107425 0x7f569e10ab8b 0xd87d2e 0xbff47f 0xc1b101 0xc1bf1d 0xc1c1ec 0xe10879 0x7f569eeb1e9a 0x7f569e1c4cbd 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdc7f71]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6ce459]
 /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f569e1074a0]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f569e107425]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f569e10ab8b]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd87d2e]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x6f) [0xbff47f]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x71) [0xc1b101]
 /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x2d) [0xc1bf1d]
 /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0x6c) [0xc1c1ec]
 /usr/bin/mongod() [0xe10879]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f569eeb1e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f569e1c4cbd]

Participants:

 Description   

rs.conf = {
    "_id" : "prod-10",
    "version" : 16,
    "members" : [
        { "_id" : 0, "host" : "prod-media-s02.10.com:27017" },
        { "_id" : 1, "host" : "prod-webapp-s03.10.com:27017" },
        { "_id" : 2, "host" : "repl-10:27017", "priority" : 0, "slaveDelay" : 1800 }
    ]
}

id0 and id1 are hosted at the same provider (Hetzner); the network between id0 and id1 is a 1 Gb link with 0.2 ms latency.

id2 is deployed on Azure; the Hetzner <-> Azure network is a ~100 MB link with ~30 ms ping.



 Comments   
Comment by Marat Shakirov [ 13/Apr/13 ]

Thank you!
Both machines use NTP and are in the same timezone.

Comment by J Rassi [ 12/Apr/13 ]

I believe that adjusting the keepalive values was what fixed your issue (I suggest you consult Azure support for recommendations on what values to finalize on); the evidence available is consistent with the theory that a device in Azure was severing your connections. As such, I will resolve this ticket.

Here are some observations I made from the log you provided:

  • A first initial sync attempt was made, which ran for 31 hours and failed on a socket timeout (which leads me to suspect you may continue to encounter transient network failures). The excessive length of time was on account of an error that occurred after the initial data copy, which triggered the conservative measure of applying a week's worth of oplog entries (when only a few hours' worth was necessary). Check that the system clocks for all three machines are in sync; consider using NTP for clock synchronization, if you aren't already (a quick way to verify this is sketched after this list).
  • The second initial sync attempt took 7 hours and was successful.
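
For reference, a minimal way to check the clock synchronization (a sketch, assuming the standard ntp package, which provides ntpq, is installed on each machine):

$ ntpq -p    # the peer line marked with '*' is the selected time source; the "offset" column is in milliseconds
$ date -u    # compare the UTC time reported by all three machines
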
Comment by Marat Shakirov [ 12/Apr/13 ]

from repl-10

Comment by Marat Shakirov [ 12/Apr/13 ]

My replica is now in SECONDARY state. It just replicated the whole database, slowly.
rs.conf()
{
    "_id" : "prod-10",
    "version" : 18,
    "members" : [
        { "_id" : 0, "host" : "prod-media-s02.10:27017" },
        { "_id" : 1, "host" : "prod-webapp-s03.10:27017" },
        { "_id" : 2, "host" : "repl-10:27017" }
    ]
}
{
    "set" : "prod-10",
    "date" : ISODate("2013-04-12T16:43:24Z"),
    "myState" : 2,
    "syncingTo" : "prod-media-s02.10:27017",
    "members" : [
        {
            "_id" : 0,
            "name" : "prod-media-s02.10:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 228526,
            "optime" : { "t" : 1365785002, "i" : 449 },
            "optimeDate" : ISODate("2013-04-12T16:43:22Z"),
            "lastHeartbeat" : ISODate("2013-04-12T16:43:22Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : 31
        },
        {
            "_id" : 1,
            "name" : "prod-webapp-s03.10:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 229177,
            "optime" : { "t" : 1365785003, "i" : 277 },
            "optimeDate" : ISODate("2013-04-12T16:43:23Z"),
            "lastHeartbeat" : ISODate("2013-04-12T16:43:23Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : 31
        },
        {
            "_id" : 2,
            "name" : "repl-10:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 298833,
            "optime" : { "t" : 1365785004, "i" : 117 },
            "optimeDate" : ISODate("2013-04-12T16:43:24Z"),
            "self" : true
        }
    ],
    "ok" : 1
}

Comment by Noah Davis [ 11/Apr/13 ]

Thank you Jason, I've added the issue here - https://jira.mongodb.org/browse/SERVER-9330

Comment by J Rassi [ 11/Apr/13 ]

Noah, you are not encountering the same issue. Could you create a new SERVER ticket with title "Error 10092 during initial sync" and attach to the ticket the full mongod log file from the node that failed?

Comment by Noah Davis [ 11/Apr/13 ]

Our replica also crashed last night 5-6 hours into its initial sync. We're on mongo 2.4.1. Here's the stacktrace from the replica:

https://gist.github.com/noahd1/64fded6a77a5334a3351

Could you tell me if this is likely the same issue? If so, can I provide any additional information to help with the debugging?

Thank you.

Comment by J Rassi [ 10/Apr/13 ]

That's not expected – if it's in STARTUP, it likely hasn't even started performing the initial sync.

For prod-media-s02 and repl-10, could you:

  • Paste the output of running rs.conf() and rs.status(), on both machines (one way to capture this non-interactively is sketched after this list)
  • Upload log files for mongod (that cover the time period since you last started repl-10), for both machines
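
One way to capture both outputs non-interactively on each machine (a sketch, assuming the mongo shell is on the PATH and mongod is listening on the default port; the output filename is just an example):

$ mongo --quiet --eval "printjson(rs.conf()); printjson(rs.status())" > rs_state.txt
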
Comment by Marat Shakirov [ 10/Apr/13 ]

Yes, I already decreased the values and started replication. But my replica has been in STARTUP state for 2+ days. Very slow.
No faults in the logs.

Comment by J Rassi [ 10/Apr/13 ]

Just checking in – were you able to decrease repl-10's keepalive values and reattempt the initial sync?

Comment by J Rassi [ 08/Apr/13 ]

17 minutes is fine. The packet capture did show all of the keepalive probes getting a response for the experiment. However, for the initial sync, I'm still suspicious that the connection was dropped between keepalive probes while it was idle. I just came across a few reports from Microsoft/MongoLab employees of, supposedly, stateful routing infrastructure within Azure that (during peak load) drops connections that have been idle for only 1-4 minutes.

Based on this, I have further steps for diagnosis:

  • Can you please retry the initial sync after adjusting the TCP keepalive system values on repl-10 as follows (a way to persist these values is sketched after this list):
    # echo 5 > /proc/sys/net/ipv4/tcp_keepalive_intvl
    # echo 5 > /proc/sys/net/ipv4/tcp_keepalive_time
  • And, have you tried recently to perform an initial sync of a new member from within Hetzner?
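
To have those values survive a reboot (a sketch, assuming the usual /etc/sysctl.conf mechanism; the numbers are simply the ones suggested above), the equivalent entries in /etc/sysctl.conf would be:

    net.ipv4.tcp_keepalive_intvl = 5
    net.ipv4.tcp_keepalive_time = 5

reloaded with:

    # sysctl -p
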
Comment by Marat Shakirov [ 07/Apr/13 ]

I waited slightly longer - 17 minutes; is that important?

db.currentOp();
{ "inprog" : [ ] }
 
db.currentOp()
{
	"inprog" : [
		{
			"opid" : 1911926339,
			"active" : true,
			"secs_running" : 0,
			"op" : "getmore",
			"ns" : "local.oplog.rs",
			"query" : {
				"ts" : {
					"$gte" : {
						"t" : 1365269165,
						"i" : 1990
					}
				}
			},
			"client" : "88.198.197.220:48737",
			"desc" : "conn56787",
			"threadId" : "0x7f82d3ea0700",
			"connectionId" : 56787,
			"waitingForLock" : false,
			"numYields" : 0,
			"lockStats" : {
				"timeLockedMicros" : {
					"r" : NumberLong(8),
					"w" : NumberLong(0)
				},
				"timeAcquiringMicros" : {
					"r" : NumberLong(1),
					"w" : NumberLong(0)
				}
			}
		}
	]
}
 
db.currentOp()
{
	"inprog" : [
		{
			"opid" : 1909673552,
			"active" : true,
			"secs_running" : 0,
			"op" : "getmore",
			"ns" : "local.oplog.rs",
			"query" : {
				"ts" : {
					"$gte" : {
						"t" : 1365269165,
						"i" : 1990
					}
				}
			},
			"client" : "88.198.197.220:48737",
			"desc" : "conn56787",
			"threadId" : "0x7f82d3ea0700",
			"connectionId" : 56787,
			"waitingForLock" : false,
			"numYields" : 0,
			"lockStats" : {
				"timeLockedMicros" : {
					"r" : NumberLong(5),
					"w" : NumberLong(0)
				},
				"timeAcquiringMicros" : {
					"r" : NumberLong(1),
					"w" : NumberLong(0)
				}
			}
		}
	]
}
 
 
db.currentOp()
{
	"inprog" : [
		{
			"opid" : 1909673585,
			"active" : true,
			"secs_running" : 0,
			"op" : "getmore",
			"ns" : "local.oplog.rs",
			"query" : {
				"ts" : {
					"$gte" : {
						"t" : 1365269165,
						"i" : 1990
					}
				}
			},
			"client" : "88.198.197.220:48737",
			"desc" : "conn56787",
			"threadId" : "0x7f82d3ea0700",
			"connectionId" : 56787,
			"waitingForLock" : false,
			"numYields" : 0,
			"lockStats" : {
				"timeLockedMicros" : {
					"r" : NumberLong(8),
					"w" : NumberLong(0)
				},
				"timeAcquiringMicros" : {
					"r" : NumberLong(2),
					"w" : NumberLong(0)
				}
			}
		}
	]
}

Comment by Marat Shakirov [ 07/Apr/13 ]

tcpdump on 27017

Comment by J Rassi [ 06/Apr/13 ]

In the repl-10 log provided, I see ~30 attempts to perform an initial sync. Almost all of them fail at the exact same spot: the initial sync thread performs a 10-12 minute index build for the "songs.api_log" {_id: 1} index, and then the next socket write to the initial sync connection fails with ECONNRESET. The next-longest index builds performed by this thread take less than 3 minutes, and all socket writes immediately after those builds succeed.

Note that no data is sent over the initial sync connection during the {_id: 1} index builds (however there should still be TCP traffic exchanged, since MongoDB enables SO_KEEPALIVE on both sides of the connection and lowers the idle and missed probe intervals to 5 minutes – and note this is currently Linux-only). There may be an issue with a stateful device on the Hetzner<=>Azure link, possibly a NAT device aggressively dropping idle connections from its translation table.
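
As a side check, one hedged way to confirm the keepalive timer is actually armed on the sync connection (assuming the iproute2 ss utility is available on repl-10; the filter simply matches traffic on port 27017):

$ ss -tno state established '( sport = :27017 or dport = :27017 )'
  # a "timer:(keepalive,<remaining>,0)" field on a connection's line shows the
  # time left until the next keepalive probe is sent on that connection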

Would you be able to take mongod on repl-10 offline (if it's not still offline) for long enough to perform the following experiment?

  • Run "tcpdump -w azure.pcap tcp port 27017" on repl-10
  • Run 4 separate instances of "mongo prod-media-s02.10.com" on repl-10
  • Wait 12 minutes
  • Run "db.currentOp()" in each of the four mongo sessions
  • Upload the pcap file to this ticket, and paste the output from tcpdump and from each mongo session in a comment

If it turns out there aren't responses to the keepalive probes, then I'd conclude it's a Hetzner or Azure issue; if there are, we'll diagnose further.
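
When reading the resulting capture back, a rough, approximate filter that shows only zero-length ACK segments (during an otherwise idle window these are essentially just the keepalive probes and their replies) might look like:

$ tcpdump -r azure.pcap -nn 'tcp port 27017 and tcp[tcpflags] == tcp-ack and (ip[2:2] - ((ip[0]&0xf)<<2) - ((tcp[12]&0xf0)>>2)) == 0'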

Comment by Marat Shakirov [ 06/Apr/13 ]

on repl-10

cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
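
(For scale: with these defaults, and setting aside the per-socket keepalive override mongod applies on Linux, a silently dropped idle connection would only be declared dead after tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl = 7200 + 9 * 75 = 7875 seconds, roughly 2 hours 11 minutes.)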

sysctl.conf:

net.ipv4.conf.all.rp_filter = 1
net.ipv4.icmp_echo_ignore_broadcasts = 1
vm.overcommit_memory = 1
vm.swappiness = 10
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.tcp_rmem = 8192 87380 8388608
net.ipv4.tcp_wmem = 8192 65536 8388608
net.ipv4.tcp_mem = 8388608 12582912 16777216
 

on prod-media-s02

cat /proc/sys/net/ipv4/tcp_keepalive_intvl
20
cat /proc/sys/net/ipv4/tcp_keepalive_probes
12
cat /proc/sys/net/ipv4/tcp_keepalive_time
7200

sysctl.conf:

net.ipv4.tcp_keepalive_probes = 12
net.ipv4.tcp_keepalive_intvl = 20
net.core.wmem_max = 8388608
net.core.rmem_max = 8388608
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.tcp_wmem = 4096	16384	4093504
net.ipv4.tcp_rmem = 4096	87380	4093504
net.ipv4.tcp_mem = 95940	127922	191880
net.ipv6.conf.all.disable_ipv6 = 1
vm.overcommit_memory = 1
vm.swappiness = 5

Comment by J Rassi [ 05/Apr/13 ]

Can you paste the output of the following commands, on both prod-media-s02 and repl-10?

$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
