[SERVER-54876] Upgrade mongodb 4.4.4 from 4.2.12 failed Created: 02/Mar/21 Updated: 06/Dec/22 Resolved: 12/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.3, 4.4.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nicola Battista | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | Upgrade/Downgrade, post-rc0 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | VMware virtual machines, CentOS 7.9 |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Sprint: | Repl 2021-04-19, Repl 2021-05-03, Repl 2021-05-17 |
| Participants: | |
| Description |
|
Hi, we have a replica set with 6 MongoDB nodes and one arbiter. I tried a rolling upgrade from 4.2.12 to 4.4.4 and followed these steps:
This is the log output before the node crashed:
I also tried version 4.4.3 but got the same errors.
|
| Comments |
| Comment by Matthew Russotto [ 12/May/21 ] | |
|
Investigation is complete; fix will be tracked as SERVER-56619 and | |
| Comment by Matthew Russotto [ 04/May/21 ] | |
|
It turns out we can get an optime/wall time which is a "ghost in the machine". If an arbiter gets a durable optime/wall time at any point (which can happen if it goes into REMOVED while it has an oplog entry; there may be other bugs which can cause this), that optime will be passed to the other members of the set. Even if the arbiter is subsequently restarted or even resynced, that durable optime will be greater than the (correct) null durable optime the arbiter is sending in heartbeats, so it will remain in the state of the other nodes in the system, which will pass it around using replSetUpdatePosition. | |
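A toy model may make the mechanism above easier to follow. It is a sketch only, not MongoDB's implementation: `Member`, `learn`, and `forwardPositions` are hypothetical names, optimes are reduced to plain integers with 0 standing in for the null optime, and the freshly started node C is an illustrative assumption. The only rule modeled is the one described above: a recorded durable optime is replaced only by a larger one.

```javascript
// Toy model only, not MongoDB code. Optimes are plain integers; 0 stands in for null.
const NULL_OPTIME = 0;

class Member {
  constructor(name) {
    this.name = name;
    // member name -> last durable optime this node has recorded for that member
    this.durable = new Map();
  }

  // Record a reported durable optime, but only if it moves forward.
  learn(member, optime) {
    const known = this.durable.has(member) ? this.durable.get(member) : NULL_OPTIME;
    if (optime > known) {
      this.durable.set(member, optime);
    }
  }

  // Stand-in for replSetUpdatePosition: pass every recorded position to a peer.
  forwardPositions(peer) {
    for (const [member, optime] of this.durable) {
      peer.learn(member, optime);
    }
  }
}

const nodeA = new Member("A");
const nodeB = new Member("B");

// The arbiter once reported a non-null durable optime, and it spread.
nodeA.learn("arbiter", 100);
nodeA.forwardPositions(nodeB);

// After a restart or resync the arbiter's heartbeats carry the null optime,
// which is smaller than 100 and therefore never replaces it.
nodeA.learn("arbiter", NULL_OPTIME);
nodeB.learn("arbiter", NULL_OPTIME);

// A freshly started node C only ever hears null from the arbiter itself,
// yet still picks up the stale value from its peers.
const nodeC = new Member("C");
nodeC.learn("arbiter", NULL_OPTIME);
nodeB.forwardPositions(nodeC);

const ghostOnA = nodeA.durable.get("arbiter");
const ghostOnC = nodeC.durable.get("arbiter");
```

In this model the stale value survives on every node for as long as any one peer still holds it, which matches the "ghost" behavior described in the comment.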
| Comment by Nicola Battista [ 07/Apr/21 ] | |
|
Hi Matthew, on the arbiter: xxx-mongodb:ARBITER> use local
Regards, Nicola | |
| Comment by Matthew Russotto [ 07/Apr/21 ] | |
|
dmitry.agranat What would be required is that the arbiter was at one point not an arbiter, and wrote some oplog entries with a walltime of 0 (which could have happened in some previous versions). Then it became an arbiter; we would continue to send that last oplog time as the last durable optime of the arbiter, which would crash a 4.4 node. So to check this, I would like to see if we can get the last entry in the oplog collection (local.oplog.rs) on the arbiter. There really shouldn't be such a collection; if there isn't, that isn't the problem. | |
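The check requested above could be sketched for the mongo shell roughly as follows. `lastOplogEntry` and `hasZeroWallTime` are hypothetical helper names, `db` is the shell's database handle, and the assumption that the wall clock time is stored in a `wall` Date field should be confirmed against a real entry.

```javascript
// Return the newest entry of local.oplog.rs, or null if the collection is missing or empty.
function lastOplogEntry(db) {
  const cursor = db.getSiblingDB("local")
    .getCollection("oplog.rs")
    .find()
    .sort({ $natural: -1 }) // reverse insertion order: newest entry first
    .limit(1);
  return cursor.hasNext() ? cursor.next() : null;
}

// Assumption: the wall time lives in a `wall` Date field, and a "wall time of 0"
// means the Unix epoch.
function hasZeroWallTime(entry) {
  return entry !== null && entry.wall instanceof Date && entry.wall.getTime() === 0;
}
```

On the arbiter, `printjson(lastOplogEntry(db))` would print the entry if one exists; a `null` result would mean this is not the cause.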
| Comment by Dmitry Agranat [ 17/Mar/21 ] | |
|
Thanks nicola.battista89@gmail.com. We're assigning this ticket to the appropriate team to be further investigated. Updates will be posted on this ticket as they happen. | |
| Comment by Nicola Battista [ 17/Mar/21 ] | |
|
Hi, all nodes have this output: db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } ) , },
Thanks. Regards, Nicola
| |
| Comment by Dmitry Agranat [ 17/Mar/21 ] | |
|
Thanks nicola.battista89@gmail.com for uploading the mongod logs. I have another question: what is the current featureCompatibilityVersion on all 7 members of this replica set? To view the featureCompatibilityVersion for a mongod instance, run the following command on each mongod instance:
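For reference, the command echoed in the reporter's reply is the `getParameter` query for `featureCompatibilityVersion`. A minimal sketch for the mongo shell, where `getFcv` is a hypothetical wrapper name and `db` is the shell's database handle:

```javascript
// Ask the connected mongod for its featureCompatibilityVersion.
function getFcv(db) {
  return db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 });
}
```

Running `printjson(getFcv(db))` on each member shows the value; before a 4.2 to 4.4 upgrade the data-bearing members are expected to report version "4.2".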
| |
| Comment by Nicola Battista [ 17/Mar/21 ] | |
|
Hi, I've sent the MongoDB logs, named mongod_tp2_mongo00.log.gz and mongod_tp2_mongo01.log.gz. Thanks. Regards, Nicola
| |
| Comment by Dmitry Agranat [ 17/Mar/21 ] | |
|
Hi nicola.battista89@gmail.com, the uploaded data contains only the diagnostic.data archive and no mongod logs. Can you upload the mongod logs covering the time of the reported event? | |
| Comment by Nicola Battista [ 15/Mar/21 ] | |
|
Hi Dmitry Agranat, I've sent you the files through the secure portal. The files are named log_tp2-mongo00.tar.gz and log_tp2-mongo01.tar.gz. Thank you. Regards, Nicola | |
| Comment by Nicola Battista [ 08/Mar/21 ] | |
|
Hi, since this is a production database, we will reproduce the bug again tomorrow and I'll send you the required logs. Thank you. Regards, Nicola | |
| Comment by Dmitry Agranat [ 08/Mar/21 ] | |
|
Hi nicola.battista89@gmail.com, We still need additional information to diagnose the problem. If this is still an issue for you, would you please compress and upload it into this secure portal:
Thanks, | |
| Comment by Dmitry Agranat [ 02/Mar/21 ] | |
|
Hi nicola.battista89@gmail.com, thank you for the report. For completeness, could you please compress and upload into this secure portal:
Thanks, |