[SERVER-38863] secondary initial sync error (Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback :: caused by :: timed out.) Created: 06/Jan/19  Updated: 27/Oct/23  Resolved: 08/Jan/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: 纪德鹏 Assignee: Dmitry Agranat
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File metrics.2019-01-06T15-23-14Z-00000     Text File moveShard5.log    
Participants:

 Description   

When adding a secondary, the initial sync process never finishes; the secondary mongod log prints the following messages:

2019-01-07T05:45:36.576+0800 I - [repl writer worker 9] libtorrent.fs.chunks collection clone progress: 686378/927304 74% (documents copied)
2019-01-07T05:47:20.420+0800 I - [repl writer worker 9] libtorrent.fs.chunks collection clone progress: 690780/927304 74% (documents copied)
2019-01-07T05:49:00.834+0800 I - [repl writer worker 9] libtorrent.fs.chunks collection clone progress: 695150/927304 74% (documents copied)
2019-01-07T05:49:44.268+0800 I ASIO [ShardRegistry] Dropping all pooled connections to moveShard2_configServer2:27019 due to NetworkInterfaceExceededTimeLimit: Connection pool has been idle for longer than the host timeout
2019-01-07T05:49:44.278+0800 I ASIO [ShardRegistry] Connecting to moveShard2_configServer2:27019
2019-01-07T05:50:09.168+0800 I REPL [replication-19] Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback :: caused by :: timed out. Last fetched optime (with hash): { ts: Timestamp(1546840107, 74), t: 7 }[8020115046077724190]. Restarts remaining: 10
2019-01-07T05:50:09.168+0800 I REPL [replication-19] Scheduled new oplog query Fetcher source: moveShard5:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1546840107, 74) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 7, readConcern: { afterClusterTime: Timestamp(1546840107, 74) } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 95373 -- target:moveShard5:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1546840107, 74) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 7, readConcern: { afterClusterTime: Timestamp(1546840107, 74) } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2019-01-07T05:50:09.168+0800 I ASIO [RS] Ending connection to host moveShard5:27017 due to bad connection status; 1 connections to that host remain open
2019-01-07T05:50:09.168+0800 I ASIO [RS] Connecting to moveShard5:27017
2019-01-07T05:50:16.168+0800 I REPL [replication-20] Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback :: caused by :: timed out. Last fetched optime (with hash): { ts: Timestamp(1546840107, 74), t: 7 }[8020115046077724190]. Restarts remaining: 9
2019-01-07T05:50:16.168+0800 I ASIO [RS] Ending connection to host moveShard5:27017 due to bad connection status; 1 connections to that host remain open
2019-01-07T05:50:16.168+0800 I REPL [replication-20] Scheduled new oplog query Fetcher source: moveShard5:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1546840107, 74) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 7, readConcern: { afterClusterTime: Timestamp(1546840107, 74) } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 95402 -- target:moveShard5:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1546840107, 74) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 7, readConcern: { afterClusterTime: Timestamp(1546840107, 74) } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2019-01-07T05:50:16.168+0800 I ASIO [RS] Connecting to moveShard5:27017
2019-01-07T05:50:23.168+0800 I REPL [replication-19] Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback :: caused by :: timed out. Last fetched optime (with hash): { ts: Timestamp(1546840107, 74), t: 7 }[8020115046077724190]. Restarts remaining: 8
2019-01-07T05:50:23.168+0800 I ASIO [RS] Ending connection to host moveShard5:27017 due to bad connection status; 1 connections to that host remain open
2019-01-07T05:50:23.168+0800 I REPL [replication-19] Scheduled new oplog query Fetcher source: moveShard5:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1546840107, 74) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 7, readConcern: { afterClusterTime: Timestamp(1546840107, 74) } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 95432 -- target:moveShard5:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1546840107, 74) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 7, readConcern: { afterClusterTime: Timestamp(1546840107, 74) } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms

Details are in the attachments.
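For anyone watching the same symptom, one way to see whether the initial sync is still making progress (rather than only logging fetcher restarts) is to poll replSetGetStatus on the syncing member. The snippet below is only a minimal sketch, assuming PyMongo 3.11 or newer and a 3.4 to 4.0 era server, where the initialSync: 1 option returns the initialSyncStatus document; the hostname is a placeholder, since the new secondary is not named in the log above, and the reported field names vary by server version.

# Illustrative sketch only: assumes PyMongo >= 3.11 and a 3.4-4.0 era server,
# where replSetGetStatus accepts the "initialSync: 1" option. The hostname is
# a placeholder for the new secondary (it is not named in the log above).
import time
from pymongo import MongoClient

client = MongoClient("new-secondary.example:27017", directConnection=True)

while True:
    # replSetGetStatus also works on a member still in STARTUP2 (initial sync).
    status = client.admin.command("replSetGetStatus", initialSync=1)
    init = status.get("initialSyncStatus", {})   # field names vary by version
    print("myState:", status.get("myState"),
          "failedInitialSyncAttempts:", init.get("failedInitialSyncAttempts"),
          "appliedOps:", init.get("appliedOps"))
    if status.get("myState") == 2:   # 2 == SECONDARY: initial sync has finished
        break
    time.sleep(30)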



 Comments   
Comment by Dmitry Agranat [ 07/Jan/19 ]

Hello,

Thanks for your report. Initial syncs can add extra load on the sync source, so the behavior you have encountered is not unexpected.

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this, which involves more discussion, would be best posted on the mongodb-user group.

Thank you,
Dima
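
If the sync source (moveShard5:27017 in the log) really is saturated, one option is to point the syncing member at a different, less loaded data-bearing member with the replSetSyncFrom command. The sketch below uses placeholder hostnames, and how an in-flight initial sync responds to the override depends on the server version, so treat it as an illustration rather than a prescribed remedy.

# Minimal sketch, assuming placeholder hostnames and PyMongo >= 3.11.
# replSetSyncFrom temporarily overrides sync-source selection on the member
# it is run against; whether it takes effect for an in-flight initial sync
# depends on the server version.
from pymongo import MongoClient

# Connect directly to the member performing the initial sync
# (hypothetical hostname: the new secondary is not named in the log).
syncing = MongoClient("new-secondary.example:27017", directConnection=True)

# Ask it to pull from a specific, less loaded replica-set member instead of
# moveShard5:27017. The argument is the "host:port" of an existing member.
syncing.admin.command("replSetSyncFrom", "another-member.example:27017")

# Confirm which sync source is now in use (field name differs by version).
status = syncing.admin.command("replSetGetStatus")
print("sync source:", status.get("syncSourceHost") or status.get("syncingTo"))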
