[SERVER-6446] replication fails with empty host Created: 13/Jul/12  Updated: 15/Aug/12  Resolved: 14/Jul/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Azat Khuzhin Assignee: Eric Milkie
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

2.1.3-pre-


Issue Links:
Duplicate
duplicates SERVER-6422 replication initial sync retry may abort Closed
Related
related to SERVER-6303 shouldRetry needs a write lock Closed
Operating System: ALL
Participants:

 Description   

Fri Jul 13 12:14:10 [conn979] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:49546] 
Fri Jul 13 12:14:10 [conn967] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:43238] 
Fri Jul 13 12:14:10 [conn1273] command config.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:11 reslen:2624 1608263ms
Fri Jul 13 12:14:10 [conn1021] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:43267] 
Fri Jul 13 12:14:10 [conn1309] command config.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:162 reslen:2624 1248183ms
Fri Jul 13 12:14:10 [conn1411] command config.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:156 reslen:2624 228325ms
Fri Jul 13 12:14:10 [conn919] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:46333] 
Fri Jul 13 12:14:10 [conn1177] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:40361] 
Fri Jul 13 12:14:10 [conn1327] serverStatus was very slow: { after basic: 0, middle of mem: 0, after mem: 0, after connections: 0, after extra info: 0, after counters: 0, after repl: 0, after asserts: 0, after dur: 1056700, at end: 1056700 }
Fri Jul 13 12:14:10 [conn1333] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:37326] 
Fri Jul 13 12:14:10 [conn1327] command config.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:147 reslen:2624 1068358ms
Fri Jul 13 12:14:10 [conn1183] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:43519] 
Fri Jul 13 12:14:10 [conn1417] command config.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:13 reslen:2624 167837ms
Fri Jul 13 12:14:10 [conn1123] command config.$cmd command: { serverStatus: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:13 reslen:2644 3108421ms
Fri Jul 13 12:14:10 [conn1411] end connection 46.229.161.179:50059 (14 connections now open)
Fri Jul 13 12:14:10 [conn1303] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:49913] 
Fri Jul 13 12:14:10 [conn1309] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:53029] 
Fri Jul 13 12:14:10 [conn1291] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:43589] 
Fri Jul 13 12:14:10 [conn1273] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:34191] 
Fri Jul 13 12:14:10 [conn1327] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:34202] 
Fri Jul 13 12:14:10 [conn1123] SocketException handling request, closing client connection: 9001 socket exception [2] server [46.229.161.179:40303] 
Fri Jul 13 12:14:10 [conn1417] end connection 46.229.161.179:53207 (5 connections now open)
Fri Jul 13 12:14:11 [rsSync] replSet initialSyncOplogApplication applied 1045 operations, synced to Jul 13 08:14:13:2f
Fri Jul 13 12:14:11 [repl writer worker 1] replication update of non-mod failed: { ts: Timestamp 1342181671000|130, h: 3934037683424155789, op: "u", ns: "seoquake_ad_api.ad", o2: { _id: ObjectId('4fffc5b7597240259dfa8f8f') }, o: { $set: { count: 5 }, $set: { ts_end: new Date(1342181671562) } } }
Fri Jul 13 12:14:11 [repl writer worker 1] replication info adding missing object
Fri Jul 13 12:14:11 [repl writer worker 1] Assertion: 13110:HostAndPort: host is empty
0x757253 0xa56304 0x821c19 0x821cc1 0x6a485a 0x6a6465 0xb3e651 0xb3cd7f 0x5cd3be 0x87fefa 0x69d7ce 0x7f827dd358ca 0x7f827d0f592d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0x757253]
 /usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x94) [0xa56304]
 /usr/bin/mongod(_ZN5mongo11HostAndPort4initEPKc+0xe9) [0x821c19]
 /usr/bin/mongod(_ZN5mongo18DBClientConnection7connectEPKcRSs+0x51) [0x821cc1]
 /usr/bin/mongod(_ZN5mongo11OplogReader13commonConnectERKSs+0x24a) [0x6a485a]
 /usr/bin/mongod(_ZN5mongo11OplogReader7connectESs+0x35) [0x6a6465]
 /usr/bin/mongod(_ZN5mongo4Sync13getMissingDocERKNS_7BSONObjE+0x121) [0xb3e651]
 /usr/bin/mongod(_ZN5mongo4Sync11shouldRetryERKNS_7BSONObjE+0x10f) [0xb3cd7f]
 /usr/bin/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x5e) [0x5cd3be]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x2a) [0x87fefa]
 /usr/bin/mongod() [0x69d7ce]
 /lib/libpthread.so.0(+0x68ca) [0x7f827dd358ca]
 /lib/libc.so.6(clone+0x6d) [0x7f827d0f592d]
Fri Jul 13 12:14:11 [repl writer worker 1] ERROR: exception: HostAndPort: host is empty on: { ts: Timestamp 1342181671000|130, h: 3934037683424155789, op: "u", ns: "seoquake_ad_api.ad", o2: { _id: ObjectId('4fffc5b7597240259dfa8f8f') }, o: { $set: { count: 5 }, $set: { ts_end: new Date(1342181671562) } } }
Fri Jul 13 12:14:11 [repl writer worker 1]   Fatal Assertion 16361
0x757253 0xa56053 0x5cd4e3 0x87fefa 0x69d7ce 0x7f827dd358ca 0x7f827d0f592d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0x757253]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x63) [0xa56053]
 /usr/bin/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x183) [0x5cd4e3]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x2a) [0x87fefa]
 /usr/bin/mongod() [0x69d7ce]
 /lib/libpthread.so.0(+0x68ca) [0x7f827dd358ca]
 /lib/libc.so.6(clone+0x6d) [0x7f827d0f592d]
Fri Jul 13 12:14:11 [repl writer worker 1] 
 
***aborting after fassert() failure
 
 
Fri Jul 13 12:14:11 Got signal: 6 (Aborted).
 
Fri Jul 13 12:14:11 Backtrace:
0x757253 0x5750a4 0x7f827d058230 0x7f827d0581b5 0x7f827d05afc0 0xa5608e 0x5cd4e3 0x87fefa 0x69d7ce 0x7f827dd358ca 0x7f827d0f592d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0x757253]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x3e4) [0x5750a4]
 /lib/libc.so.6(+0x32230) [0x7f827d058230]
 /lib/libc.so.6(gsignal+0x35) [0x7f827d0581b5]
 /lib/libc.so.6(abort+0x180) [0x7f827d05afc0]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x9e) [0xa5608e]
 /usr/bin/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x183) [0x5cd4e3]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x2a) [0x87fefa]
 /usr/bin/mongod() [0x69d7ce]
 /lib/libpthread.so.0(+0x68ca) [0x7f827dd358ca]
 /lib/libc.so.6(clone+0x6d) [0x7f827d0f592d]



 Comments   
Comment by Azat Khuzhin [ 15/Jul/12 ]

Eric, thanks. It works for me.

Comment by Eric Milkie [ 14/Jul/12 ]

Yes, redo initial sync by deleting the data files for the secondary.
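A minimal resync procedure along those lines might look like this (a sketch only — the service name and data path are illustrative; substitute your actual init script and `--dbpath`):

```shell
# Run on the affected SECONDARY only -- never on the PRIMARY or the arbiter.
# Stopping mongod, removing its data files, and restarting forces the node
# back into initial sync, so it re-clones everything from the sync source.
sudo service mongodb stop            # or stop the mongod process cleanly
rm -rf /var/lib/mongodb/*            # illustrative dbpath; use your own --dbpath
sudo service mongodb start
```

The replica set configuration itself is untouched; the restarted member rejoins under the same `_id` and begins a fresh initial sync.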

Comment by Azat Khuzhin [ 14/Jul/12 ]

I can.
Do I need to wipe the SECONDARY and restart replication from an initial sync, or can I resume the existing replication where it left off?
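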

Comment by Eric Milkie [ 14/Jul/12 ]

Hi Azat, thanks for filing this. We fixed it very recently; this is a duplicate of SERVER-6422.
Can you try rebuilding from the latest master?
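At the time (mid-2012) mongod was built from source with SCons; a rough sketch of rebuilding from master, assuming the 2.1.x build prerequisites (g++, scons, boost headers) are already installed:

```shell
# Fetch the server source and build the mongod binary in-tree.
git clone git://github.com/mongodb/mongo.git
cd mongo
git checkout master
scons mongod      # produces ./mongod in the source tree
```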

Comment by Azat Khuzhin [ 13/Jul/12 ]

backend-data:PRIMARY> rs.conf()
{
        "_id" : "backend-data",
        "version" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "node1.mongo.backend:27018"
                },
                {
                        "_id" : 1,
                        "host" : "node4.mongo.backend:27019",
                        "arbiterOnly" : true
                },
                {
                        "_id" : 2,
                        "host" : "node4.mongo.backend:27018"
                }
        ]
}
backend-data:PRIMARY> rs.status()
{
        "set" : "backend-data",
        "date" : ISODate("2012-07-13T21:20:26Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "node1.mongo.backend:27018",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 32825,
                        "optime" : Timestamp(1342214426000, 152),
                        "optimeDate" : ISODate("2012-07-13T21:20:26Z"),
                        "self" : true
                },
                {
                        "_id" : 1,
                        "name" : "node4.mongo.backend:27019",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 32805,
                        "lastHeartbeat" : ISODate("2012-07-13T21:20:25Z"),
                        "pingMs" : 0
                },
                {
                        "_id" : 2,
                        "name" : "node4.mongo.backend:27018",
                        "health" : 0,
                        "state" : 8,
                        "stateStr" : "(not reachable/healthy)",
                        "uptime" : 0,
                        "optime" : Timestamp(1342181666000, 138),
                        "optimeDate" : ISODate("2012-07-13T12:14:26Z"),
                        "lastHeartbeat" : ISODate("2012-07-13T21:13:00Z"),
                        "pingMs" : 0,
                        "errmsg" : "socket exception"
                }
        ],
        "ok" : 1
}
 

Comment by Azat Khuzhin [ 13/Jul/12 ]

This issue is related to SERVER-6303.

Previously I wrote that everything was fine (https://jira.mongodb.org/browse/SERVER-6303?focusedCommentId=142386&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-142386).

But it seems this goes wrong when new operations have to be applied after the initial sync finishes, i.e. when the database was modified between initiating replication and the initial sync completing.
(In the previous case the database was not modified during that window.)
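The failure mode described above can be sketched as follows. This is not MongoDB source code, just a hypothetical Python model of the path visible in the stack trace: an update arriving after the clone references a document the clone missed, so the syncer must fetch it from the sync source, and that fetch asserts when the host string is empty (mirroring `HostAndPort::init`). All names here are illustrative.

```python
class EmptyHostError(Exception):
    """Stands in for assertion 13110: 'HostAndPort: host is empty'."""


def fetch_missing_doc(sync_source_host, doc_id):
    # Like DBClientConnection::connect via OplogReader: connecting with an
    # empty host string is an immediate assertion failure.
    if not sync_source_host:
        raise EmptyHostError("HostAndPort: host is empty")
    return {"_id": doc_id}  # pretend the document was fetched from the source


def apply_oplog_entry(local_docs, entry, sync_source_host):
    doc_id = entry["o2"]["_id"]
    if doc_id not in local_docs:
        # The "replication info adding missing object" path (Sync::getMissingDoc).
        local_docs[doc_id] = fetch_missing_doc(sync_source_host, doc_id)
    local_docs[doc_id].update(entry["o"]["$set"])


docs = {}  # the initial-sync clone missed this document
entry = {"op": "u", "o2": {"_id": "4fffc5b7"}, "o": {"$set": {"count": 5}}}

# With a valid sync source the missing document is fetched and updated.
apply_oplog_entry(docs, entry, "node1.mongo.backend:27018")

# With an empty host (as in the crash) the fetch asserts instead.
try:
    apply_oplog_entry(
        docs, {"op": "u", "o2": {"_id": "other"}, "o": {"$set": {}}}, ""
    )
except EmptyHostError as e:
    print(e)  # HostAndPort: host is empty
```

In the real server this assertion escalated to `fassert(16361)`, which aborts the process, which is why the secondary died rather than retrying.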

Comment by Scott Hernandez (Inactive) [ 13/Jul/12 ]

What does rs.status() and rs.config() look like?

Comment by Azat Khuzhin [ 13/Jul/12 ]

After restarting mongod on the SECONDARY that had terminated as described above, I got this:

Fri Jul 13 17:12:56 [initandlisten] connection accepted from 109.206.172.216:53427 #6 (3 connections now open)
Fri Jul 13 17:12:58 [conn5] end connection 109.206.185.182:37982 (2 connections now open)
Fri Jul 13 17:12:58 [initandlisten] connection accepted from 109.206.185.182:37983 #7 (3 connections now open)
Fri Jul 13 17:13:00 [rsSyncNotifier] replset setting oplog notifier to node1.mongo.sq-ads-backend:27018
Fri Jul 13 17:13:00 [rsSync] build index local.replset.minvalid { _id: 1 }
Fri Jul 13 17:13:00 [rsSync] build index done.  scanned 0 total records. 0 secs
Fri Jul 13 17:13:00 [repl writer worker 1] replication update of non-mod failed: { ts: Timestamp 1342181671000|130, h: 3934037683424155789, op: "u", ns: "seoquake_ad_api.ad", o2: { _id: ObjectId('4fffc5b7597240259dfa8f8f') }, o: { $set: { count: 5 }, $set: { ts_end: new Date(1342181671562) } } }
Fri Jul 13 17:13:00 [repl writer worker 1]   Fatal Assertion 16359
0x757253 0xa56053 0x5cd2a1 0x87fefa 0x69d7ce 0x7f379ee858ca 0x7f379e24592d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0x757253]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x63) [0xa56053]
 /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x51) [0x5cd2a1]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x2a) [0x87fefa]
 /usr/bin/mongod() [0x69d7ce]
 /lib/libpthread.so.0(+0x68ca) [0x7f379ee858ca]
 /lib/libc.so.6(clone+0x6d) [0x7f379e24592d]
Fri Jul 13 17:13:00 [repl writer worker 1] 
 
***aborting after fassert() failure
 
 
Fri Jul 13 17:13:00 Got signal: 6 (Aborted).
 
Fri Jul 13 17:13:00 Backtrace:
0x757253 0x5750a4 0x7f379e1a8230 0x7f379e1a81b5 0x7f379e1aafc0 0xa5608e 0x5cd2a1 0x87fefa 0x69d7ce 0x7f379ee858ca 0x7f379e24592d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x23) [0x757253]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x3e4) [0x5750a4]
 /lib/libc.so.6(+0x32230) [0x7f379e1a8230]
 /lib/libc.so.6(gsignal+0x35) [0x7f379e1a81b5]
 /lib/libc.so.6(abort+0x180) [0x7f379e1aafc0]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x9e) [0xa5608e]
 /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x51) [0x5cd2a1]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x2a) [0x87fefa]
 /usr/bin/mongod() [0x69d7ce]
 /lib/libpthread.so.0(+0x68ca) [0x7f379ee858ca]
 /lib/libc.so.6(clone+0x6d) [0x7f379e24592d]

Generated at Thu Feb 08 03:11:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.