[SERVER-52932] Increased operation times after upgrade to 4.4.1 Created: 18/Nov/20 Updated: 02/Jun/22 Resolved: 17/May/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Alin Dumitru | Assignee: | Bruce Lucas (Inactive) |
| Resolution: | Incomplete | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Docker containers on Ubuntu 16.04 AWS VMs |
||
| Attachments: |
|
| Issue Links: |
|
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
We upgraded 2 replica sets from 4.2.8 to 4.4.1 and saw immediate performance degradation: query times went from milliseconds to seconds.
I'll attach diagnostic.data from the runs with version 4.4.1. The client is Node.js and uses the mongodb library, version 3.6.2. If you need additional info, please let me know. |
| Comments |
| Comment by Bruce Lucas (Inactive) [ 17/May/22 ] | ||||
|
aljursav@gmail.com , valentin@plarium.com - sorry for the delay in responding. Can you please open separate tickets with supporting data? It is possible you are encountering a different issue, and it helps us to keep each ticket specific to a single issue. alin.silvian@gmail.com, it has been some time since we've heard from you, so I'll close this ticket as "incomplete". Feel free to re-open it or open a new ticket with the requested data if you are still interested in pursuing this issue. | ||||
| Comment by Alex Savelev [ 31/Jan/22 ] | ||||
|
Good day! {"t": ,"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn11144","msg":"Slow query","attr":{"type":"command","ns":"Viss_DAIRM2.auditLogEntries","command":{"insert":"auditLogEntries","ordered":true,"bypassDocumentValidation":false,"writeConcern": {"w":"majority","j":true},"txnNumber":27,"$db":"Viss_DAIRM2","lsid":{"id":{"$uuid":"3b5d4fa2-fa00-4c5b-afd3-0b0a0afb0191"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1643630891,"i":6}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}}},"ninserted":1,"keysInserted":3,"numYields":0,"reslen":230,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":2}},"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"w":2}},"Database":{"acquireCount":{"w":2}},"Collection":{"acquireCount":{"w":2}},"Mutex":{"acquireCount": {"r":2}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":1},"writeConcern":{"w":"majority","j":true,"wtimeout":0,"provenance":"clientSupplied"},"storage":{"data":{"bytesRead":8461,"timeReadingMicros":46369}},"protocol":"op_msg","durationMillis":100}} Maybe there is some solution to this problem? | ||||
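For reference, a minimal Node.js sketch (an assumption, not part of the original report: driver 3.6.x, a placeholder mongodb://localhost:27017 URI, and the database/collection names taken from the log above) that times an insert issued with the same {w: "majority", j: true} write concern, so client-observed latency can be compared with the server's slow-query entries:

```js
// Hedged sketch: time a w:"majority", j:true insert from the Node.js driver (3.6.x).
// The URI, database, and collection names below are placeholders / taken from the log.
const { MongoClient } = require('mongodb');

async function timeInsert() {
  const client = await MongoClient.connect('mongodb://localhost:27017', {
    useUnifiedTopology: true,
  });
  try {
    const coll = client.db('Viss_DAIRM2').collection('auditLogEntries');
    const start = process.hrtime.bigint();
    // Same write concern as in the slow-query log entry above.
    await coll.insertOne(
      { createdAt: new Date(), payload: 'test' },
      { w: 'majority', j: true }
    );
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`insert with w:majority took ${elapsedMs.toFixed(1)} ms`);
  } finally {
    await client.close();
  }
}

timeInsert().catch(console.error);
```

If the client-side timing tracks the server-side durationMillis, the time is being spent on the server (for example in the storage reads shown under storage.data in the log) rather than in the driver or the network.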
| Comment by Valentin Abalmasov [ 07/Sep/21 ] | ||||
|
What we currently see in the mongodb.log of mongos is a lot of slow DNS resolution issues.
I hid the real DNS names, but I can confirm that all of those names are in the /etc/hosts file and should be resolved locally. | ||||
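One way to narrow down whether the slowness comes from the system resolver path (which consults /etc/hosts) or from a DNS server round trip is to time both paths from Node.js. A minimal sketch, where 'shard1.example.internal' is a placeholder for the real host names:

```js
// Hedged sketch: compare resolution latency via getaddrinfo (honours /etc/hosts)
// with a direct query to the configured DNS servers (bypasses /etc/hosts).
const dns = require('dns');
const { promisify } = require('util');

const lookup = promisify(dns.lookup);     // getaddrinfo: follows nsswitch.conf / /etc/hosts
const resolve4 = promisify(dns.resolve4); // queries the configured DNS servers directly

async function timeResolution(host) {
  let t = process.hrtime.bigint();
  await lookup(host);
  console.log(`${host} lookup (getaddrinfo): ${Number(process.hrtime.bigint() - t) / 1e6} ms`);

  t = process.hrtime.bigint();
  try {
    await resolve4(host);
    console.log(`${host} resolve4 (DNS server): ${Number(process.hrtime.bigint() - t) / 1e6} ms`);
  } catch (err) {
    // Expected if the name only exists in /etc/hosts and not in DNS.
    console.log(`${host} resolve4 failed: ${err.code}`);
  }
}

timeResolution('shard1.example.internal').catch(console.error);
```

If lookup is slow even though the name is in /etc/hosts, the local resolver configuration (e.g. nsswitch.conf ordering) is suspect; if only resolve4 is slow, the configured DNS servers are. Note this only approximates what mongos itself does when resolving host names.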
| Comment by Valentin Abalmasov [ 07/Sep/21 ] | ||||
|
Hi guys. We are experiencing the same problem with performance degradation after upgrading from 4.2.15-ent to 4.4.8-ent. I have uploaded the required mongodb.log and diagnostic.data files to the related ticket https://jira.mongodb.org/browse/SERVER-51104. Please tell me if I can help you diagnose the problem further. Attaching diagnostic data from the mongos and the primary mongod.
Thanks in advance, | ||||
| Comment by Alin Dumitru [ 22/Apr/21 ] | ||||
|
Hi. The diagnostic data was from a node showing the issue. Unfortunately I did not perform a new test, because this can only be reproduced on heavily loaded production DB servers, and we had issues with the rollback the first time, which resulted in extended downtime. I do plan to repeat the test because, as far as I can tell, this affects versions up to at least 4.4.4. I'll get back to you as soon as I have a safe setup for this scenario. | ||||
| Comment by Eric Sedor [ 20/Apr/21 ] | ||||
|
Hi alin.silvian@gmail.com, I wanted to follow up on bruce.lucas's last request. Are you able to provide any of the requested information? | ||||
| Comment by Bruce Lucas (Inactive) [ 19/Nov/20 ] | ||||
|
Thanks. I do see some subtle changes in the metrics at the time when you saw the increased latency, but nothing that suggests why you might be seeing a latency tail of tens of seconds (assuming I'm reading your chart correctly). Some more questions:
| ||||
| Comment by Alin Dumitru [ 19/Nov/20 ] | ||||
|
Sure. It's GMT+2. | ||||
| Comment by Bruce Lucas (Inactive) [ 19/Nov/20 ] | ||||
|
Thanks, I will take a look and let you know. Can you tell me what the timezone for that chart is? | ||||
| Comment by Alin Dumitru [ 19/Nov/20 ] | ||||
|
Hello. The operation times reverted to normal after switching back to 4.2.8. | ||||
| Comment by Bruce Lucas (Inactive) [ 18/Nov/20 ] | ||||
|
alin.silvian@gmail.com, the metrics show symptoms of a problematic lock in the storage layer:
Starting from A we see a large number of threads waiting on the "read timestamp queue lock", accompanied by a high rate of context switches and high system CPU utilization, which is characteristic of an inefficient spin lock. We addressed this problem in 4.4.2 with |
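For anyone who wants to watch this area on a running node, a minimal sketch using the Node.js driver and serverStatus (assumptions: a localhost connection, and that the relevant WiredTiger counters live under wiredTiger.transaction; the exact statistic names vary between versions, so the code just filters by substring rather than naming specific fields):

```js
// Hedged sketch: dump WiredTiger transaction statistics whose names mention
// "read timestamp". Statistic names differ across MongoDB/WiredTiger versions,
// so this filters serverStatus output by substring instead of hard-coding names.
const { MongoClient } = require('mongodb');

async function readTimestampStats() {
  const client = await MongoClient.connect('mongodb://localhost:27017', {
    useUnifiedTopology: true,
  });
  try {
    const status = await client.db('admin').command({ serverStatus: 1 });
    const txnStats = (status.wiredTiger && status.wiredTiger.transaction) || {};
    for (const [name, value] of Object.entries(txnStats)) {
      if (name.includes('read timestamp')) {
        console.log(`${name}: ${value}`);
      }
    }
  } finally {
    await client.close();
  }
}

readTimestampStats().catch(console.error);
```

Sustained growth in these counters, together with high system CPU and context-switch rates at the OS level, would be consistent with the spin-lock behaviour described above.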