[SERVER-19311] mongoD becomes SECONDARY before finishing index builds during initial sync Created: 07/Jul/15 Updated: 08/Jul/15 Resolved: 08/Jul/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.6.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dai Shi | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Steps To Reproduce: | Stop a secondary member in this replica set. Wipe all data in dbpath and restart. The replica will start initial sync, and somewhere in the middle of the index build stage the replica will transition to SECONDARY mode before the index build is finished. |
| Participants: |
| Description |
|
When performing an initial sync on one of our sharded clusters, the replica transitions into SECONDARY mode before the background index builds are complete. This causes queries to run very slowly against this replica because the indexes are incomplete. I have a log from this replica during the sync, but first wanted to ask how we should scrub it before attaching it here, since this is our first time submitting a public ticket. |
| Comments |
| Comment by Sam Kleinman (Inactive) [ 07/Jul/15 ] |
|
It seems as if there are three approaches to this issue:
I'm going to go ahead and close this issue, but feel free to open a support request and we can work on specific solutions. |
| Comment by Dai Shi [ 07/Jul/15 ] |
|
That is also not a great option, since a reconfig typically triggers a primary election, which leaves us unable to serve traffic for many seconds. If we did this for all replica sets, we would serve a ton of site 500s both before and after the initial sync. That is not to mention the added work of hardening the code that automates the reconfigs, so that it issues the reconfig commands properly and does the right thing each time (we have several hundred replica sets, so we can't do this by hand). |
| Comment by Eric Milkie [ 07/Jul/15 ] |
|
You can make the member hidden before forcing an initial sync. |
| Comment by Dai Shi [ 07/Jul/15 ] |
|
This unfortunately is not an option for us, as we actually force an initial sync on one member of every replica set every day due to fragmentation. This only started happening after upgrading to 2.6, and was not an issue in 2.4. |
| Comment by Eric Milkie [ 07/Jul/15 ] |
|
It is unlikely to be fixed in 2.4 or 2.6 because the behavior change was a side effect of rewriting the index building code. In this situation, it is advisable to add a new node to a replica set as hidden and non-voting at first, to avoid reducing availability while the node is running initial sync. Once the node has completed building all the indexes, reconfigure the set and promote the node to a full member. |
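
The staged approach described above can be sketched as two edits to the replica set configuration document. This is a minimal illustration, not a tested operational procedure: the helper names `add_hidden_member` and `promote_member` and the host names are hypothetical, and the sketch only builds the new configuration documents. Applying one (for example via the `replSetReconfig` command on the primary) and deciding when the index builds have actually finished are left to the operator.

```python
import copy


def add_hidden_member(config, host):
    """Return a copy of `config` with `host` added as a hidden, non-voting member.

    `config` is assumed to look like the replica set configuration document:
    {"_id": ..., "version": ..., "members": [{"_id": ..., "host": ...}, ...]}.
    """
    new_config = copy.deepcopy(config)
    next_id = max((m["_id"] for m in new_config["members"]), default=-1) + 1
    new_config["members"].append({
        "_id": next_id,
        "host": host,
        "priority": 0,   # a hidden member must have priority 0
        "hidden": True,  # invisible to client reads while it syncs
        "votes": 0,      # non-voting, so it does not affect elections
    })
    new_config["version"] += 1  # a reconfig needs a higher version number
    return new_config


def promote_member(config, host, priority=1):
    """Return a copy of `config` with `host` turned into a full member."""
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["hidden"] = False
            member["priority"] = priority
            member["votes"] = 1
            break
    else:
        raise ValueError("no member with host %r in config" % host)
    new_config["version"] += 1
    return new_config


# Hypothetical two-member set; the host names are placeholders.
current = {
    "_id": "rs0",
    "version": 3,
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
    ],
}
staged = add_hidden_member(current, "db3.example.net:27017")
promoted = promote_member(staged, "db3.example.net:27017")
```

Both helpers return new documents rather than mutating their input, so the previous configuration stays available if a step needs to be rolled back.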
| Comment by Dai Shi [ 07/Jul/15 ] |
|
This seems like a pretty severe issue, since replicas that go into SECONDARY state before they have all their indexes cause queries to queue up like crazy in production. Are you saying the fix will not be backported to 2.6? |
| Comment by Eric Milkie [ 07/Jul/15 ] |
|
I believe this behavior was corrected in MongoDB 3.0, where initial sync always builds all indexes in the foreground. It also builds them more efficiently than version 2.6 does. |