[SERVER-31780] getting Error from MongoS Created: 27/Oct/17  Updated: 07/Jan/18  Resolved: 01/Dec/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.2.15
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: HARE KRUSHNA MONTRY Assignee: Mark Agarunov
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hi,

Our environment is as follows (a quick way to confirm the topology is sketched after the list):

1) 2 shards (each a replica set: 1 primary, 1 secondary, 1 arbiter)
2) Config servers
3) 12 mongos routers
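
A hedged sketch for confirming this topology, assuming mongo shell access to any of the mongos routers (the shard names mtnic-rs1/mtnic-rs2 appear in the logs below):

    // Run in the mongo shell connected to a mongos router.
    // Lists each shard's replica set members and recent chunk
    // migration activity for the sharded collections.
    sh.status()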

We are facing application issues; the relevant logs from the mongod and mongos nodes are below.

---------------------------------------------------------------------------
The mongos routers are logging the following errors:
---------------------------------------------------------------------------

2017-10-27T12:22:12.932+0000 I NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-7-0] Marking host 10.18.62.194:37018 as failed :: caused by :: ExceededTimeLimit: operation exceeded time limit
2017-10-27T12:22:15.013+0000 I NETWORK  [NetworkInterfaceASIO-TaskExecutorPool-7-0] Marking host 10.18.62.194:37018 as failed :: caused by :: ExceededTimeLimit: operation exceeded time limit
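
These ExceededTimeLimit messages indicate that the mongos-side connection pool gave up waiting on an operation to that shard host and marked it failed. As a hedged first check (assuming mongo shell access; the host and port are taken from the log lines above), the outbound pools and the shard member's health can be inspected:

    // On the affected mongos: show outbound connection pool statistics,
    // including the pool targeting 10.18.62.194:37018.
    db.adminCommand({ connPoolStats: 1 })

    // Connected directly to 10.18.62.194:37018: confirm the member is
    // healthy from the replica set's point of view.
    rs.status()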

---------------------------------------------------------------------------
The mongod shard nodes are logging the following errors:
---------------------------------------------------------------------------

 
2017-10-27T11:01:25.240Z I SHARDING [conn1464] remotely refreshing metadata for CLMPROD.CLMAuditLog based on current shard version 214|31||59d09221afa4216db060338c, current metadata version is 214|31||59d09221afa4216db060338c
2017-10-27T11:01:25.241Z I SHARDING [conn1464] metadata of collection CLMPROD.CLMAuditLog already up to date (shard version : 214|31||59d09221afa4216db060338c, took 1 ms)
2017-10-27T11:01:25.242Z I SHARDING [conn1464] about to log metadata event into changelog: { _id: "SVR-AGMDBPRIM-01.mtn.ci-2017-10-27T11:01:25.242+0000-59f31205e7b2b1db44b6c514", server: "SVR-AGMDBPRIM-01.mtn.ci", clientAddr: "10.18.62.170:52658", time: new Date(1509102085242), what: "moveChunk.start", ns: "CLMPROD.CLMAuditLog", details: { min: { auditLog.CreationDate: "2017-09-30T14:38:15.887Z" }, max: { auditLog.CreationDate: "2017-09-30T15:00:07.040Z" }, from: "mtnic-rs1", to: "mtnic-rs2" } }
2017-10-27T11:01:25.245Z I SHARDING [conn1464] moveChunk request accepted at version 214|31||59d09221afa4216db060338c
2017-10-27T11:01:25.245Z I SHARDING [conn1464] moveChunk number of documents: 1
2017-10-27T11:01:25.245Z W SHARDING [conn1464] moveChunk failed to engage TO-shard in the data transfer:  :: caused by :: UnknownError: can't accept new chunks because  there are still 8 deletes from previous migration
2017-10-27T11:01:25.245Z I SHARDING [conn1464] about to log metadata event into changelog: { _id: "SVR-AGMDBPRIM-01.mtn.ci-2017-10-27T11:01:25.245+0000-59f31205e7b2b1db44b6c516", server: "SVR-AGMDBPRIM-01.mtn.ci", clientAddr: "10.18.62.170:52658", time: new Date(1509102085245), what: "moveChunk.from", ns: "CLMPROD.CLMAuditLog", details: { min: { auditLog.CreationDate: "2017-09-30T14:38:15.887Z" }, max: { auditLog.CreationDate: "2017-09-30T15:00:07.040Z" }, step 1 of 6: 0, step 2 of 6: 8, to: "mtnic-rs2", from: "mtnic-rs1
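
The key failure is the TO-shard (mtnic-rs2) refusing the chunk because 8 range deletes from an earlier migration are still queued. One hedged way to check whether long-lived cursors are pinning those ranges is the cursor metrics on the mtnic-rs2 primary:

    // Run in the mongo shell on the mtnic-rs2 primary.
    // A non-zero open.noTimeout count means clients hold cursors opened
    // with noCursorTimeout, which can stall the range deleter.
    db.serverStatus().metrics.cursor
    // Example shape of the output (values are illustrative):
    //   { "timedOut" : NumberLong(0),
    //     "open" : { "noTimeout" : NumberLong(3), "pinned" : ..., "total" : ... } }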



 Comments   
Comment by Mark Agarunov [ 01/Dec/17 ]

Hello hare.montry@tecnotree.com,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Thanks,
Mark

Comment by Mark Agarunov [ 10/Nov/17 ]

Hello hare.montry@tecnotree.com,

We still need additional information to diagnose the problem. If this is still an issue for you, could you please provide the complete logs from all affected mongod and mongos nodes?

Thanks,
Mark

Comment by Mark Agarunov [ 31/Oct/17 ]

Hello hare.montry@tecnotree.com,

Thank you for the report. Looking over the provided output, I believe this may be an instance of SERVER-27009, where an open cursor without a timeout is causing migrations to block. To verify that this is the case, could you please provide the complete logs from all affected mongod and mongos nodes?
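
As a quick check in the meantime (a hedged sketch), the number of such no-timeout cursors on each shard primary can be inspected; cursors opened with noCursorTimeout never expire on their own and must be closed or released by the client application:

    // Run in the mongo shell on each shard primary.
    // A non-zero count indicates no-timeout cursors that can block
    // migrations as described in SERVER-27009.
    db.serverStatus().metrics.cursor.open.noTimeout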

I've generated a secure upload portal so that you can send us these files privately.

Thanks,
Mark
