[SERVER-58050] mongod standalone sharded replicaset issue during upgrade 4.4.0 to 4.4.6 Created: 24/Jun/21 Updated: 28/Jul/21 Resolved: 28/Jul/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Kin Wai Cheung | Assignee: | Eric Sedor |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Participants: |
| Description |
|
Hi, on our dev env we run on 4.4.0 binaries, consisting of: 1 standalone config replica set, 2 standalone shard replica sets, and 1 router. I've disabled the balancer (a rough sketch of this step is included below), stopped the config server, and started it with the 4.4.6 binaries. This went fine. Then I proceeded with stopping shard0001 and starting it with the 4.4.6 binaries. At this moment it does not seem to be progressing, for over 20 minutes now. Log of the mongod:
|
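For reference, a minimal pymongo sketch of the balancer-disable step from the description above; the mongos address is a placeholder, not the actual router address:

```python
# Minimal sketch: disable the balancer on the mongos before swapping binaries.
# The mongos address below is a placeholder.
from pymongo import MongoClient

mongos = MongoClient("mongodb://localhost:27017")

# balancerStop waits for any in-progress balancing round to finish.
mongos.admin.command("balancerStop")

# balancerStatus confirms the balancer is off and no migration is running.
status = mongos.admin.command("balancerStatus")
print("balancer mode:", status["mode"])                 # expected: "off"
print("in balancer round:", status["inBalancerRound"])  # expected: False
```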
| Comments |
| Comment by Eric Sedor [ 28/Jul/21 ] |
|
Hi kinwai.cheung@clarivate.com, at this point I'll close the ticket. But please feel free to comment again or open a new ticket if you see such a startup time again. |
| Comment by Eric Sedor [ 01/Jul/21 ] |
|
Thanks kinwai.cheung@clarivate.com, I'm going to keep this ticket open for now in case it comes up again soon. Feel free to update any time. |
| Comment by Kin Wai Cheung [ 01/Jul/21 ] |
|
Sure, at the moment the cluster has been upgraded to 4.4.6. I did a clean stop afterwards and restarted without any long startup (as expected). But I'll consider upgrading to 4.4.7 if we face a long startup again. |
| Comment by Eric Sedor [ 30/Jun/21 ] |
|
Hi kinwai.cheung@clarivate.com, that probably won't help, and we recommend against any upgrade process that differs from the documented order of upgrade steps. Instead, this looks like it is related to the startup time of a single node during WiredTiger recovery. I do see evidence in the diagnostic data that a relatively large number of data handles were activated (data-handle connection data handles currently active) during the startup of shard0000:
This looks a lot like an issue that was improved by a recent fix, and the work on it also improved the related logging. Would you be willing to upgrade to 4.4.7 when it becomes available, and provide logs and diagnostic data for a long startup time on that version, so that we can investigate this issue with the improved logging? |
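For reference, a minimal pymongo sketch of reading the data-handle counters mentioned above from serverStatus; the node address is a placeholder:

```python
# Minimal sketch: read the WiredTiger data-handle counters referenced above
# from serverStatus. The shard member's address is a placeholder.
from pymongo import MongoClient

node = MongoClient("mongodb://localhost:27018", directConnection=True)

wt = node.admin.command("serverStatus")["wiredTiger"]["data-handle"]
print("data handles currently active:", wt["connection data handles currently active"])
print("connection sweeps:", wt["connection sweeps"])
```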
| Comment by Kin Wai Cheung [ 29/Jun/21 ] |
|
Can we avoid this by doing the following in a sharded cluster of 1-member replica sets? (A rough sketch of what I mean is included below.)
then start with the new binaries:
|
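A rough pymongo sketch of the clean stop/start sequence described above; host names are placeholders and this is only an illustration, not the exact commands:

```python
# Rough illustration only: cleanly stop each mongod before starting it again
# with the new binaries. Host names are placeholders.
from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

for host in ["shard0000.example:27018", "shard0001.example:27018"]:
    member = MongoClient(host, directConnection=True)
    try:
        # The shutdown command flushes data and stops the server; the driver
        # sees the connection drop while the server exits, which is expected.
        member.admin.command("shutdown")
    except ConnectionFailure:
        pass
    # After this, start mongod again from the 4.4.6 binaries with the same options.
```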
| Comment by Kin Wai Cheung [ 29/Jun/21 ] |
|
@Eric Sedor
The requested files have been uploaded. |
| Comment by Eric Sedor [ 28/Jun/21 ] |
|
Hi kinwai.cheung@clarivate.com, we can take a look. I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. For the node where you saw the problem, would you please archive (tar or zip) and upload the following to that link (a rough bundling sketch is included below):
|
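A minimal sketch of bundling the usual artifacts (the mongod log and the diagnostic.data directory, as referenced elsewhere in this ticket) into a single archive; paths are assumptions, not the actual configuration:

```python
# Minimal sketch: bundle the mongod log and the diagnostic.data directory into
# one tarball for upload. Paths are assumptions; use the node's actual
# systemLog.path and storage.dbPath.
import tarfile

LOG_PATH = "/var/log/mongodb/mongod.log"
DIAGNOSTIC_DATA = "/var/lib/mongo/diagnostic.data"

with tarfile.open("shard-startup-diagnostics.tar.gz", "w:gz") as archive:
    archive.add(LOG_PATH, arcname="mongod.log")
    archive.add(DIAGNOSTIC_DATA, arcname="diagnostic.data")
```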
| Comment by Kin Wai Cheung [ 24/Jun/21 ] |
|
FYI: mongod recovery finished after 1 hour and 5 minutes. Is this normal behaviour, and will it occur during a 3-member replica set upgrade? |
| Comment by Kin Wai Cheung [ 24/Jun/21 ] |
|
I see similar issues raised under: https://jira.mongodb.org/browse/WT-7452 https://jira.mongodb.org/browse/SERVER-56222
Another question: I don't seem to have encountered this when we upgraded from 4.0.0 to 4.2.8 to 4.4.0. Is this because we run standalone (single-member) replica sets, or will we encounter the same issue if we upgrade one member at a time in a 3-member replica set?
|