[SERVER-39683] Shards request to split the same chunk at the same time. Created: 20/Feb/19  Updated: 27/Feb/19  Resolved: 25/Feb/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.3
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: JackWang [X] Assignee: Eric Sedor
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File 1550645643(1).jpg     JPEG File 1550645670(1).jpg     JPEG File 1550645718.jpg    
Issue Links:
Duplicate
is duplicated by SERVER-39682 shard request to split a same chunk a... Closed
Related
is related to SERVER-34448 Disable chunk splitting functionality... Closed
Participants:

 Description   

Hello. In my production environment the machine load occasionally spikes: it is normally 0-1 but rises to 5-10 during these episodes, and CPU on the shard nodes is very high. While the load is high, the log keeps printing messages like the splitChunk request below; once the messages stop, the load drops back to normal. During these episodes the operation "Finding the split vector for ctu.mobileToken" runs constantly, even though request volume has not increased significantly and I/O utilization is around 40%. How can I avoid this? I am looking forward to your reply, thank you!

2019-02-20T04:58:26.144+0000 I SHARDING [conn68389] received splitChunk request: { splitChunk: "ctu.mobileToken", configdb: "crs/10.60.96.104:21001,10.60.96.81:21001,10.60.96.92:21001", from: "rs3", keyPattern: { token: 1.0 }, shardVersion: [ Timestamp 4419000|633, ObjectId('5bc858a5c9ebaf2f1aafd8db') ], min: { token: "5c6bc051xbpFF1evtQv1JhORbtFRiBkFeIl8Ws83" }, max: { token: MaxKey }, splitKeys: [ { token: "5c6bc2301lWzOdbLzOYXcibfkgMDHi7XKN4xkRW3" }, { token: "5c6bc40dU74I7yewg2ifmA2XCUEtrwvwlmzRP0m3" }, { token: "5c6bc5f0tTGE1yJKpZIkoISq99cmTTqeu74XKTe3" }, { token: "5c6bc7d8pm0SayzAzdIC8gYssmU6PjGhozzv0iF3" }, { token: "5c6bc9c3gRS7D56NkHv7UalFsAxz9dXkAWtI7tI3" }, { token: "5c6bcba5kTeFzefCUUKcVXI16C4LXYeuA8GtcAW3" }, { token: "5c6bcd8elMIjPAmVT2yJRO8cB80LItVVURDgErj3" }, { token: "5c6bcf853Zlt9Hhfl6zZualYooyCCpkdFQ6vere3" }, { token: "5c6bd180q3wyUljtRQ3dlyqSCJJO0R6itPWGyPg2" }, { token: "5c6bd382FCnxqHujlhcyfGWciqMXQCbhBGPND6u3" }, { token: "5c6bd53eaQdVIbd5NquKxr5Wx66DvRU6qaPYoIR3" }, { token: "5c6bd6e58Z5iBK9SnMhlPF5q1pQB5PaeZ2xjpoe3" }, { token: "5c6bd884CGMAajikF1JRksBiLsJWIfG2A9RLCD43" }, { token: "5c6bda2dYS0ccVBJuZ5d5LmFSljvXPsxENL8xL43" }, { token: "5c6bdbedFUq4zx2kuHxUfhFqk6Rjrzohsu6Jx3I3" }, { token: "5c6bddabUMYMsVXQQ6w9HibJI4fZDk7rttx9NQy2" }, { token: "5c6bdf69ve7X83v2E9rFZNZ9WL2LaTCnccEATcL3" }, { token: "5c6be11eB5ZywhbVM2fTOUjrz6OdTNpnc9ZQ0Jq3" }, { token: "5c6be2a5WPt9bMt9doFB2v7vA3uWKv1vapGwzYU3" }, { token: "5c6be42aHdlCe4Mg5Vylmj5Grv7iZIhq01Wn4wl3" }, { token: "5c6be5aejqvp9N6Ep2UN18Z95nljZKnZA9umD153" }, { token: "5c6be735f0RMucZh6BSbWQ0r62AXRGEQHxB8E0k3" }, { token: "5c6be8baBQmqG9Kme1MJEowNNM7KEBemATRSC6a3" }, { token: "5c6bea42Vq58mff3vYwq4DvJ3d1qT8p67MtYu1T2" }, { token: "5c6bebc0GJ6zszVTZeLRO8tH7zLiotBAHUMkVek3" }, { token: "5c6bed48thbSbfWazo72LsiIFd4SulYtXPdMU8m3" }, { token: "5c6beecc1fqXsdcK8VxwXFqZIoMHAyppn6LZnnX3" }, { token: "5c6bf04aKgObuSFhhCKmDtUkWGhEuI5kEh8vwNn3" }, { token: "5c6bf1c3vcQeCNFZ0wxMWj4TiUFVhlXyMeRYLAl3" }, { token: "5c6bf33elMtm5lbxCH3u5m6013XdKoR1elIyqXC3" }, { token: "5c6bf4b2xM6BTt8bBUvb2Ritsmp6B2isoYu8lf63" }, { token: "5c6bf629Sra7XhKyiyA7cqzRQ4qr5dInm6fx4v12" }, { token: "5c6bf795c2LaLIwcIgSDEED6nuGrcXLSkxyIXnk3" }, { token: "5c6bf906B2djS6LIxPUi5n7W3XQLEndmxLac4HK3" }, { token: "5c6bfa68jOLppJzZ25ItuFS3cKhP93T95TuVbCK3" }, { token: "5c6bfbc8bKpBqvUJP3Ab7WWbW5JtYahHTKOZpdA2" }, { token: "5c6bfd2aVtdUlT96ZtuV7vLX6mS4mRpGN92oiii2" }, { token: "5c6bfe7cUN0nDZoxVRP70iQBtp87yf4xCLfJcKB2" }, { token: "5c6bffd2jiTcr2GrZSSQo9Ks5W5oD0zkPZzOUFn3" }, { token: "5c6c0129VIk5lz1UoeZhi5h7hJlUEK0dBfm7XCA3" }, { token: "5c6c0279TRZRoB3LknwS1JZI41N0uJgbaLq00hp3" }, { token: "5c6c03c3uid5Ng54HxfY456mCGIY6U9FLtK0FL23" }, { token: "5c6c050fwB1LkbGprhV2eTM9wsn5Nbda6KESCgH3" }, { token: "5c6c06552Gib8GsoUxVGiR8NXhAmvWc39OO2nU23" }, { token: "5c6c07a3T8fpftwGK2E0HWoJmrJxDtKnwCSxA6x3" }, { token: "5c6c08ebXwEFKA9fqxKe8NWtDt0XZ9wRN48WOqe2" }, { token: "5c6c0a30CpUT9A8EvSSEkmvPzhfHEBAslbeJsg52" }, { token: "5c6c0b7cNVobHHdrQFGkWDyALv5vlRqY5iH82V42" }, { token: "5c6c0cc0BtHmxER6oWLfiTmIPWLuMgf7eUbwnSe3" }, { token: "5c6c0e05R8clkU0CvUWpfaNSdFilFnhOyUcdOhC2" }, { token: "5c6c0f4fyPDv5PzORWpKrZm5u0gHUKRFImoJ1Os3" }, { token: "5c6c10a17O3jL2zvbkldVPh2UlVeuNyRmDmsznm3" }, { token: "5c6c11f8WxSfvOn9xIuFxEfP6OqcgTQjAjBqqSo3" }, { token: "5c6c1351a2oMsAO9NLjlkrgs2np2YUvjOduEGze3" }, { token: "5c6c14ac89IP7uTQyhnNQcjkZ4aFo8xndlNQ3083" }, { token: "5c6c160bzt7MeaqNvovuFgOIxrE5yh5lgxFsCBt2" }, { token: "5c6c1772VTCyIPW39JCKVEqhSwKEpZIsDCg2e243" }, { token: 
"5c6c18e6ofcjDuA63LLI9WUbCoRPhPa1A6fjI6a2" }, { token: "5c6c1a62ToOX2qIi35TmcupyaVsYMKxN27jRgyP3" }, { token: "5c6c1bf0oDSIO4fZJbDBIDr3O4GYfHaxbLrqxwK3" }, { token: "5c6c1d96AmKBg2ohUrBHJiwwZ2exjRoHQEn3p5w2" }, { token: "5c6c1f4dIh6h15FzogJocK8keUsf0DNQIPIRy6g2" }, { token: "5c6c21204qrUjJyOV6XMhSTR2vXnoeMxNEm9oCm3" }, { token: "5c6c2312qVcnXxr3C1zQDFV7fxugw4EsYHc8BjN3" }, { token: "5c6c2533F8ayiQmw2AUA4dckyBiHUn2qG3umYna3" }, { token: "5c6c276eOQYFTvb8JMyPLML9WU0RXHy9jYOso2Q3" }, { token: "5c6c297eg75wFtZWDy3qUE03NoMLTOeqYM51laL3" }, { token: "5c6c2be6SGIixLpYVln6E4DxL43qYqmqg9Gc0KL3" }, { token: "5c6c2ed4bnkmG5YwqIY4EpeuuGvOlWnXON1qTZB2" }, { token: "5c6c3249mAqSFFOAL1xgWcP84OVprnft5y5q6eU3" }, { token: "5c6c3667Tv3A2s6S2ilpu3YMWfDnSigElvBUCGT2" }, { token: "5c6c3b81I6UNFvkI3WCe0fTjqOLT1kKrM2StZ7J3" }, { token: "5c6c423dy9M8FPGSUMMWXUHFW2pQKX1Eiz5kAW13" }, { token: "5c6c4b6aMdrhkBU1Ivfc7Sd6dP07VBSMTs8yPSQ3" }, { token: "5c6c57d0n4GaOwYvzkJYKLCHMoyq3LAAlXECKct3" }, { token: "5c6c67d327vbJjnNKqbpHNGi8ArxZv419VnN1Sd2" }, { token: "5c6c7908ubizbPIiBvxFYlPtdhIZTKSgHIh6oYU3" }, { token: "5c6c835daGjks1dycAXBZn7pO38g7sMiWaqD3F63" }, { token: "5c6c89639O9RLGlnLDqMqGDHYE0ODktKhnrxSJq3" }, { token: "5c6c8da1Megjlf2w5vw2b0IHYYYkKpoNBRSuxoK3" }, { token: "5c6c914b9qhq0R0fcy5iZCKwepnyW3pR6ADLk2m3" }, { token: "5c6c9472SYf8PmtRsb3JoHifbZNSSRg2c7bdhHf3" }, { token: "5c6c9766pvXJJp7YNqLyz7aJZOdxmdAa5CLsEpG3" }, { token: "5c6c9a37rnjarQR6l0WyrZKCWevU1mJSS4uiXce3" }, { token: "5c6c9cd5z9GkKmFgxpixPfeLYnELudHO5LBDL4T2" }, { token: "5c6c9f5fTvTpx7cOptZf7iTYSS6ZmNXKZ5YWniF3" }, { token: "5c6ca1c5enM36ga8wUyeZqTwwtlvq5jGd6welKH3" }, { token: "5c6ca413wvw9OtORfbwLBBBkuYNqoEzYl6EIyPQ3" }, { token: "5c6ca658RBm795DZ7YRJghX9cCJSejZDUQZhlk32" }, { token: "5c6ca848WatvEDrOqwaEfUNSD4QYSbYtBOmJlee2" }, { token: "5c6caa4azxtCnbVnsC3sGvjADfVDsg1rvoeqCDp3" }, { token: "5c6cac5aiivuItD7TX5SzF1V9hkq5i4O7ucCZ083" }, { token: "5c6cae5at8GYNXz5wdzWslHeLpgv24mvalSz9lN3" }, { token: "5c6cb05bSQW05ZRSGOPql3CwhG3HjPULSOqlPYF3" }, { token: "5c6cb253hi3zVmtohwdnc1dH1QoZYAdpWxb54Xy2" }, { token: "5c6cb44aCpKcPk9dbS54TUHMVf4uaZmTtYPnTc23" }, { token: "5c6cb631CpqBppPIXoiaCCaTigNtN5LVWzSupkT3" }, { token: "5c6cb80fwDStrxgGKlFT78H6yvao7wYahWJv7NS2" }, { token: "5c6cb9f3R2obvoiRKkQZQdm1mmkcpS79m9BVmDR3" }, { token: "5c6cbbd3ACnLYaCwBBs4qLuNnk5fz74p5uAVBZ62" }, { token: "5c6cbda3tH1uNBhOAHVjWb16seJR87csKJuxPd32" }, { token: "5c6cbf73oCE79n1x3OGFItOKU6em26nHn0aXOVb3" }, { token: "5c6cc1438ZeT2ZbjLTD7kxgVQrURMQnobBanl7X3" }, { token: "5c6cc313B2l2Vx9AJtZxh12Lu8yXEEbsaRkfz9W3" }, { token: "5c6cc4eaBzvvdyYPJQ7hqou95abbzpt0e4XuESG3" }, { token: "5c6cc6b4yrCafnMjDJkpJviuT40YjKfW6JtEGnP3" }, { token: "5c6cc87d0PTl9gcWXs8Fj43RzC9Ql8BiCT5WMR42" }, { token: "5c6cca4bmdnrQp3mM4Vb2GbqXx4sjCFmcXhO2Cp3" }, { token: "5c6ccc18rQZrHOIvBmR2qwVhqCZu5FtIc4tVcOG3" }, { token: "5c6ccde8aDRhEY4Em20CAA6qBkSdCbanfS5n4JC3" }, { token: "5c6ccfb64oGIVFZgb6k4Q4watJnshVk2fq4TFT83" }, { token: "5c6cd17f9cDMR2PgtuEklGCXBr0i411p8DErlPr3" }, { token: "5c6cd3338ftdwMbIFOyn4TFcJ1lOOSEjtukIJu03" }, { token: "5c6cd4f69ogP1QFrLVhFxDtJ4QwGB7cHIRy8RDk3" }, { token: "5c6cd6b6hFc0pD1LTPfymORS5WRhC3uv8nDmber2" }, { token: "5c6cd864hA3Rtmb5QUi0W6Y9BMBh8E6Lipc78tQ3" }, { token: "5c6cda15XJ4xiAsY2PAs7wx5nDiei9LfkImgdMj2" }, { token: "5c6cdbc4Plwak4CedPxxPRwndEe5kvFenGSzr7F3" }, { token: "5c6cde72cvQMouAe4SNgu0pYwu4tH3VaXK3L61D3" } ] }



 Comments   
Comment by JackWang [X] [ 27/Feb/19 ]

Hello, I am asking for help again; this affects my production work. My production collection has accumulated 500 million documents, and machine resource alarms are now firing. Even when I remove only a small amount of data (for example, 10,000 documents), the machine load triggers alarms and many slow queries appear. I want to know how to quickly delete part of the data in a sharded collection. Looking forward to your reply.
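
[Editorial note] A common pattern for trimming a large sharded collection without saturating the cluster is to delete in small batches and pause between them. Below is a minimal mongo shell sketch; the filter field (createdAt), the cutoff value, the batch size, and the sleep interval are all hypothetical placeholders to adapt:

// Hypothetical predicate: delete documents older than some cutoff.
var cutoff = ISODate("2018-01-01T00:00:00Z");
var batchSize = 1000;
while (true) {
    // Fetch one batch of _ids matching the predicate.
    var ids = db.mobileToken.find({ createdAt: { $lt: cutoff } }, { _id: 1 })
                            .limit(batchSize)
                            .toArray()
                            .map(function (d) { return d._id; });
    if (ids.length === 0) break;
    db.mobileToken.deleteMany({ _id: { $in: ids } });
    sleep(500); // throttle so replication and normal traffic can keep up
}

If the predicate can be expressed on the shard key, filtering on the shard key instead of _id keeps each delete targeted at a single shard rather than broadcast to all of them.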


Comment by JackWang [X] [ 27/Feb/19 ]

Thank you for your patient explanation, but my production environment still sees occasional high machine load, and at those moments the shard nodes keep printing "find the split vector". I have no concrete remedy. My environment holds 500 million documents and I want to quickly delete a portion of them, but I have no good plan for doing so. This is troubling me greatly and I hope you can help.


Comment by Eric Sedor [ 26/Feb/19 ]

To clarify, we are suggesting adding 2 mongos to bring the total to 5, and this is considered a mitigation but not a solution. SERVER-34448 in MongoDB 4.2 is considered the solution.

You would need to direct write traffic at all 5 via a connection string that included all of them. The goal of this change is to ensure that the majority of mongos-driven split attempts coincide with a chunk reaching its maximum size. Currently this is most accurately attained with 5 mongos receiving evenly distributed writes.
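
[Editorial note] For illustration, a connection string of the kind described would simply list every mongos so the driver spreads connections across all of them (host names below are placeholders):

// Hypothetical host names; list all five mongos in the URI.
mongodb://mongos1:27017,mongos2:27017,mongos3:27017,mongos4:27017,mongos5:27017/ctu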

For further discussion about how the system works, please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-user group.


Comment by JackWang [X] [ 26/Feb/19 ]

Thank you for your reply, but I still don't understand why running 5 mongos can solve this problem. If I increase from 3 mongos to 5, but the two new mongos receive no application connections, will that work? Or must all 5 mongos receive connections?

Comment by Eric Sedor [ 25/Feb/19 ]

Hi JackWang, thank you for your patience.

We believe this is expected behavior resulting from how mongos nodes estimate when a chunk split needs to occur. Our current efforts around SERVER-34448 are expected to remove this issue by shifting split responsibility to mongod.

In the meantime, you may be able to work around the impact of this issue by increasing the number of your mongos routers to 5 to match assumptions made by the mongos autoSplit algorithm (which is influenced by a splitTestFactor of 5).
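
[Editorial note] A simplified model of that heuristic (an illustration only, not the actual server code): each mongos tracks roughly how much data it has routed into a chunk and triggers a split attempt once its own counter passes maxChunkSize / splitTestFactor, so attempts line up with the real chunk limit only when writes are spread across 5 mongos:

// Simplified illustration of the mongos autoSplit threshold arithmetic.
var maxChunkSizeBytes = 64 * 1024 * 1024;                      // default max chunk size
var splitTestFactor = 5;
var perMongosThreshold = maxChunkSizeBytes / splitTestFactor;  // ~13.4 MB

// With 5 mongos sharing writes evenly, each crosses its threshold when the
// chunk has actually grown ~5 * 13.4 MB = 64 MB, i.e. at the real limit.
// With only 3 mongos, each crosses it when the chunk holds ~3 * 13.4 MB
// (~40 MB), so split attempts (the "Finding the split vector" scans in the
// log) fire well before the chunk is big enough, and keep repeating.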

Comment by JackWang [X] [ 21/Feb/19 ]

3 mongos, 3 config servers, and 3 shards.
Looking forward to your reply

Comment by Eric Sedor [ 21/Feb/19 ]

Thanks for writing in. We are investigating and will get back to you with any questions we have. For now we did have one:

Can you let us know how many mongos routers are in this deployment?
