[SERVER-32827] Initial sync can fail when syncing a capped collection if the capped collection rolls over on the sync source Created: 22/Jan/18 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying, Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sergey | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Master: MongoDB 3.4.5 |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
There is a problem with initial sync. Several attempts have failed with the following error: The capped collections on which the errors occurred are 30 to 100 GB in size. Here is some more detailed information about the collections: Number of CappedPositionLost errors, collection name, capped size, capacity The logs for the 7 failed attempts to perform the initial sync are attached. Currently only one live instance is left in the replica set on our production system. Please help us bring the replica up. |
| Comments |
| Comment by Louis Williams [ 11/Oct/21 ] |
|
Moving back to "Open" because the dependent ticket, |
| Comment by Judah Schvimer [ 15/Mar/21 ] |
|
Thank you for reaching out! Thanks, |
| Comment by Erlon Cruz [ 14/Mar/21 ] |
|
Hi folks, what would it take to fix this bug? We have a customer with this problem, and I need to understand whether this is easily fixable or would require large or structural changes to Mongo. |
| Comment by Sergey [ 23/Jan/18 ] |
|
I reproduced the problem in a test environment. Please see the attached test.tar.gz. It contains a script to reproduce the problem and the logs of the test run on my computer. m1 and m2 are two replicas. In the test, a new replica (m3) is added and MongoDB gives the same error we had in our production environment (CappedPositionLost).
test.sh is a script that reproduces the failure of a replica's initial sync. The problem occurs when there is a capped collection with a secondary index and a high insert rate, and a new replica performs the initial sync from an existing member of the replica set. By the time the new replica finishes building the indexes for the capped collection, the collection's data has already been overwritten by new data, so the new replica reports a CappedPositionLost error and the initial sync fails.
How to run the test: test.sh has been tested on a 2016 MacBook Pro 15" with an Intel Core i7 and on a Dell notebook with an Intel Core i5, on Docker v17. If you have a slow CPU, please increase the DOCKER_NEW_REPLICA_CPUS parameter; otherwise, decrease it. |
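The failure mode described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not MongoDB's implementation and not the contents of test.sh: `ToyCappedCollection`, `copy_snapshot`, and the `inserts_per_read` parameter are hypothetical names introduced for illustration. `inserts_per_read` stands in for how much faster the writer is than the syncing reader (for example, because the reader is slowed down by index work).

```python
from collections import deque


class CappedPositionLost(Exception):
    """Toy analogue of the error: the reader's next document was overwritten."""


class ToyCappedCollection:
    """Fixed-capacity, insertion-ordered store that evicts the oldest document."""

    def __init__(self, capacity):
        self.docs = deque(maxlen=capacity)  # deque drops the oldest item when full
        self.next_id = 0

    def insert(self):
        self.docs.append(self.next_id)
        self.next_id += 1

    def oldest_id(self):
        return self.docs[0] if self.docs else self.next_id


def copy_snapshot(source, inserts_per_read):
    """Copy the documents present at start, in insertion order.

    After each document is read, the writer inserts `inserts_per_read` new
    documents (hypothetical knob for the writer/reader speed ratio).
    """
    end = source.next_id
    position = source.oldest_id()
    copied = 0
    while position < end:
        if position < source.oldest_id():
            # The document the reader needs next has rolled out of the collection.
            raise CappedPositionLost(f"document {position} was overwritten")
        copied += 1
        position += 1
        for _ in range(inserts_per_read):
            source.insert()
    return copied


def filled(capacity):
    """Return a toy collection that is already full."""
    coll = ToyCappedCollection(capacity)
    for _ in range(capacity):
        coll.insert()
    return coll
```

In this model, `copy_snapshot(filled(100), 1)` completes because the reader keeps pace with the writer, while `copy_snapshot(filled(100), 2)` raises `CappedPositionLost` as soon as the writer evicts a document the reader has not reached yet.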
| Comment by Mark Agarunov [ 22/Jan/18 ] |
|
Hello plsfixmymongo, Thank you for the report. To get a better idea of what may be causing the CappedPositionLost error could you please provide the following:
This should give us some insight into this behavior. Thanks, |
| Comment by Sergey [ 22/Jan/18 ] |
|
The capped collections have secondary indexes |
| Comment by Sergey [ 22/Jan/18 ] |
|
This issue is probably related to TOOLS-1636