[SERVER-40065] spike latency issue on shard cluster Created: 11/Mar/19 Updated: 22/Apr/19 Resolved: 22/Apr/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Adun | Assignee: | Eric Sedor |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
on MongoDB 3.4.19, shard cluster with 2 shards. collection `foo` is sharded with hashed key "UID"
we write a test case in mongo-cxx-driver to test the latency.
output as below:
What we did:
What we found: 1. the latency has many spike, such as 4000+ ms in some update_one operation, but not any slow log in mongod server(we set slowms to 10ms).
3. after we run `rs.stepDown()` on config Primary, the spike gone away(FROM: ./testmongo | awk '{s+=$NF}END{print s}')
What we tried:
|
| Comments |
| Comment by Eric Sedor [ 22/Apr/19 ] |
|
Hi stutiredboy@gmail.com; we are closing this ticket as a duplicate of |
| Comment by Eric Sedor [ 18/Apr/19 ] |
|
Thanks for pointing this out, stutiredboy@gmail.com; we'll re-open this ticket, check on this relationship, and also see how we might have diagnosed this sooner. |
| Comment by Adun [ 18/Apr/19 ] |
|
@Eric issue https://jira.mongodb.org/browse/SERVER-40707 is the root cause of this issue |
| Comment by Eric Sedor [ 19/Mar/19 ] |
|
Thanks stutiredboy@gmail.com; unfortunately we're not seeing anything to indicate a bug. We would expect a server-side problem introducing this latency would result in logged slow ops. Without that, there are other possibilities such as network latency or a driver-side configuration issue such as connection pool size. As this ticket does not appear to a bug, I will now close it. If you need further assistance troubleshooting, please post on the mongodb-user group or on Stack Overflow with the mongodb tag. If you are able to narrow this experiment down to a reproducible bug, we'd be interested in that. |
| Comment by Eric Sedor [ 18/Mar/19 ] |
|
We appreciate your patience stutiredboy@gmail.com, and are continuing to review and discuss internally. |
| Comment by Adun [ 14/Mar/19 ] |
|
we run on MongoDB 3.4 again
$ sort -k4 -n -r 7.log
the slowms is 100ms, but we can't get any slow log in mongod.
the diganostic.data from 3.4.19 were attached too. |
| Comment by Adun [ 14/Mar/19 ] |
|
yes, awk dicard the decimal point, the real total value may be as: 135259.646036 ``` uid 10035044 timediff 1.547164 ```
i have attached diagnostic.data from 4.0.6
|
| Comment by Eric Sedor [ 13/Mar/19 ] |
|
It is very strange to hear about 4,000ms of latency when no slow operations are logged. Can you also clarify the numbers in the awk output above? Is there a decimal point missing? If not and this is milliseconds then a lack of slow logged ops is very unexpected. For the 4.0.6 deployment, would you please run this test, archive (tar or zip) the $dbpath/diagnostic.data directory (described here) and attach it to this ticket? |
| Comment by Eric Sedor [ 13/Mar/19 ] |
|
Hello and thank you for the detail you've provided so far. We are investigating and will let you know if we have additional questions. |
| Comment by Adun [ 12/Mar/19 ] |
|
What we tried: 10. on MongoDB 3.6.5, compare to normal results on 3.4.19(about 10xxxx), it has a little higher latency, but stable:
11. on MongoDB 4.0.6, compare to normal results on 3.4.19((about 10xxxx)), it has higher latency, but stable:
as our benchmark, we know 3.6 and 4.0 have lower performance than before, so we think the result from 3.6 and 4.0 and normal, without high spike. |
| Comment by Adun [ 11/Mar/19 ] |
the output above is from:
What we tried: 9. write the test with mongo-cxx-driver r3.3.1 and r3.4.0, on MongoDB 3.4, they have the same behave: unstable latency with high spike |