[SERVER-27515] Issue with sharding a huge collection(splitVector timeout) Created: 26/Dec/16  Updated: 16/Jan/17  Resolved: 12/Jan/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Suraj Sawant Assignee: Kaloian Manassiev
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File Collection Stats     Text File ConfigReplicaStatus.txt     Text File ConfigServerSlaveInfo.txt     Text File ShardingStatus.txt     File config.log     File mongos.log     File pimaryShard.log    
Issue Links:
Duplicate
duplicates SERVER-23784 Don't use 30 second network timeout o... Closed
Operating System: ALL
Participants:

 Description   

Hi,
We are facing a timeout problem while sharding a fairly large collection.

sh.shardCollection("mp.AccWiseQty", {"outletId" :1, "variantId":1});
{ "code" : 50, "ok" : 0, "errmsg" : "Operation timed out" }

While inspecting the logs, we found that the splitVector command is timing out.

Mongos log snapshot (splitVector timeout):
2016-12-26T11:59:46.912+0530 D ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Failed to execute command: RemoteCommand 3054972 -- target:81-47-mumbai.justdial.com:26200 db:admin expDate:2016-12-26T11:59:46.912+0530 cmd:{ splitVector: "mp.AccWiseQty", keyPattern: { outletId: 1.0, variantId: 1.0 }, min: { outletId: MinKey, variantId: MinKey }, max: { outletId: MaxKey, variantId: MaxKey }, maxChunkSizeBytes: 67108864, maxSplitPoints: 0, maxChunkObjects: 0 } reason: ExceededTimeLimit: Operation timed out

Collection details:
Collection DB: mp
Collection name: AccWiseQty
No. of docs: 85686646
Collection size:
"size" : 33235056950,
"avgObjSize" : 387,
"storageSize" : 9376174080

The collection stats file is also attached for further details about the collection.
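For context on why a single splitVector pass struggles here: splitVector must scan the shard-key index and emit one split point per chunk-sized span of data. A back-of-the-envelope estimate of the initial chunk count can be derived from the stats above (a rough sketch only; the real count depends on the key distribution):

```python
# Estimate how many ~64 MB chunks splitVector must compute split points for,
# using the collection stats reported in this ticket.
data_size = 33235056950    # "size" from the collection stats, in bytes
max_chunk_size = 67108864  # maxChunkSizeBytes from the splitVector command (64 MB)

estimated_chunks = -(-data_size // max_chunk_size)  # ceiling division
print(estimated_chunks)  # → 496
```

Roughly 500 split points over ~85 million documents is a substantial index scan, which is why the erroneous 30-second network timeout (see the linked SERVER-23784) is easily exceeded on a collection of this size.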

We have attached all the logs that might be required, captured with log verbosity level 2:
1. Config server replica set status
2. Collection stats (for which we are getting the error)
3. Sharding status
4. Config server slave replication info
5. mongos error log
6. Primary shard error log
7. Collection stats

We have gone through the following link:
https://groups.google.com/forum/#!topic/mongodb-user/ozSgkhwPPBQ

It suggests dumping the data out and re-importing it, but re-importing is not feasible in our case because the data volume is huge.
We need a solution that shards the existing large collection in place.
Perhaps increasing some timeout (in seconds) would help, but I don't know where to change it.



 Comments   
Comment by Suraj Sawant [ 16/Jan/17 ]

Hi Kaloian,

I will upgrade to version 3.2.10 or later and check.
Thanks.

Suraj

Comment by Kaloian Manassiev [ 12/Jan/17 ]

Hi sawantsuraj91@gmail.com,

This problem is caused by mongos erroneously using a 30-second timeout when talking to shards. It has been fixed as part of SERVER-23784 in version 3.2.10.

If you update to that version (or later) you should not be seeing this problem anymore.

Best regards,
-Kal.

Generated at Thu Feb 08 04:15:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.