[SERVER-60553] Secondary replicaset initial sync errors with "NotWritablePrimary: Not-primary error while processing 'find' operation on 'database_production' database via fire-and-forget command execution." Created: 08/Oct/21 Updated: 29/Oct/23 Resolved: 01/Nov/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.5, 5.0.3 |
| Fix Version/s: | 6.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Robert Beekman | Assignee: | Amirsaman Memaripour |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | connection-pool, executor, replica-set | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Sprint: | Service Arch 2022-10-03, Service Arch 2022-10-17, Service Arch 2022-10-31, Service Arch 2022-11-14 | ||||||||
| Participants: | |||||||||
| Case: | (copied to CRM) | ||||||||
| Description |
|
We have a 3-node replicaset where I'm trying to replace a node with a new server. To do this I'd like to add the new server to the replicaset and remove an old one eventually.
I'm trying to add a member to this replicaset, but I see a ton of errors in the syncing replica member's log.
On the primary server, where the replica is syncing from I can see a ton of:
The machines can connect to each other and it seems some data is being synced, albeit very slowly.
The primary runs 4.4.5 and the secondary 4.4.5, but I also tried 4.4.9 with the same results.
Any idea what's going on? |
| Comments |
| Comment by Amirsaman Memaripour [ 01/Nov/22 ] | |||||||
|
Pushed a patch to temporarily suppress logs for failed fire-and-forget, mirroring operations on recovering secondaries. We can revert the changes upon completion of SERVER-70510. Closing this as Fixed since the concerning logs on recovering secondaries are addressed by the fix. | |||||||
| Comment by Githook User [ 31/Oct/22 ] | |||||||
|
Author: {'name': 'Amirsaman Memaripour', 'email': 'amirsaman.memaripour@mongodb.com', 'username': 'samanca'}Message: | |||||||
| Comment by Edwin Zhou [ 18/May/22 ] | |||||||
|
Thank you for following up brooke.miller@mongodb.com and robert@appsignal.com. I'll keep this ticket open since it's a separate issue to WT-7570, and we're investigating the
error which seems to occur on 4.4 during initial sync. | |||||||
| Comment by Brooke Miller [ 16/May/22 ] | |||||||
|
Thank you, robert@appsignal.com, for the update. edwin.zhou@mongodb.com, does it make sense to close this ticket since Robert is being assisted with a separate segfault issue in WT-7570? | |||||||
| Comment by Robert Beekman [ 13/May/22 ] | |||||||
|
Hi @Brooke Miller,
We haven't seen it happen anymore, though we haven't done initial syncs in a while either. Coincidentally we did have a segfault this morning on a secondary, but It seems similar to this issue: https://jira.mongodb.org/browse/WT-7570
Best, Robert | |||||||
| Comment by Brooke Miller [ 12/May/22 ] | |||||||
|
We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the information requested by Edwin above? Regards, Brooke | |||||||
| Comment by Edwin Zhou [ 07/Mar/22 ] | |||||||
|
I apologize for having missed your follow up. Thank you for letting us know you've experienced this issue again after replacing a disk. If this is still an issue for you or if you still have logs and diagnostics from the previous incident, please upload them through this upload portal. Best, | |||||||
| Comment by Robert Beekman [ 22/Nov/21 ] | |||||||
|
Hi Edwin Zhou! We just experienced the same issue while replacing the disk in a MongoDB server. During the replication from another server we see a ton of these errors in the log. If you can provide a new uploader link that's active I can upload the requested data. Best, Robert | |||||||
| Comment by Edwin Zhou [ 18/Nov/21 ] | |||||||
|
We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. Best, | |||||||
| Comment by Edwin Zhou [ 03/Nov/21 ] | |||||||
|
We still need additional information to diagnose the problem. If this is still an issue for you, would you please archive (tar or zip) the mongod.log files covering the incident and the $dbpath/diagnostic.data directory and upload them to this support uploader location? Best, | |||||||
| Comment by Edwin Zhou [ 18/Oct/21 ] | |||||||
|
Would you please archive (tar or zip) the mongod.log files covering the incident and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location? I've created a secure upload portal for you. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time. Best, | |||||||
| Comment by Robert Beekman [ 08/Oct/21 ] | |||||||
|
Complete log of a connection that triggers this error:
Primary side:
Syncing secondary side:
|