[SERVER-21100] InitialSync finishes before indexes finish building Created: 23/Oct/15 Updated: 11/Nov/15 Resolved: 11/Nov/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Stefan Wojcik | Assignee: | Wan Bachtiar |
| Resolution: | Cannot Reproduce | Votes: | 3 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
I recently added a new node to our replica set. It caused errors in our application, because it finished the initialSync and switched to a valid SECONDARY before its indexes finished building. Since we send some of the queries to our secondary, that resulted with a) really slow queries, and b) errors (since we rely on hints in some scenarios) Below is a short excerpt from our logs showing the issue. Let me know if you need the whole log.
|
| Comments |
| Comment by Wan Bachtiar [ 11/Nov/15 ] | |||||||||||||||||||||||||
|
Hi Stefan, Thanks for the information. As noted in Stephen's earlier reply, as of MongoDB v3.0 builds for indexes that existed prior to initial sync should be completed before the node transitions into SECONDARY state (as per SERVER-14737) regardless of whether indexes were originally created with background:true. If you are still seeing this background index issue with all the nodes upgraded to v3.0.7, please open a new issue with all the relevant log files. Regards, Wan. | |||||||||||||||||||||||||
| Comment by Stefan Wojcik [ 10/Nov/15 ] | |||||||||||||||||||||||||
|
Hi Wan, Thanks, this is insightful. If you look at the attached logs: 2015-10-23T23:28:17 is when the initial sync from ciodb17 started
I'm curious why the logs have a form of [repl index builder N] here (indicating a "background index building process from a replication"). I'm 100% certain these indexes have existed on the primary for weeks. The logs from the primary prove that no extra indexes were created during the initial sync. I might be relevant that this happened during an upgrade process, in which I added secondary v3.0.7 nodes to a cluster in which the primary nodes were v2.6.11. | |||||||||||||||||||||||||
| Comment by Wan Bachtiar [ 10/Nov/15 ] | |||||||||||||||||||||||||
|
Hi Stefan, In an attempt to reproduce the issue that you have described, I ran a small sharded cluster using MongoDB v3.0.7: 2 shards with 3 replica members on each shard. During the test, I noticed that a typical initial sync process of building an index (foreground and background) would show up in the log with tag entry: [rsSync] . For example:
While a log tag entry of [repl index builder <N>] would indicate a background index building process from a replication. For example:
While a foreground index building would have a log tag entry of [repl writer worker <N>]:
In my tests, the replication process of index creation is the only method that could create an index building after the secondary transitioned into SECONDARY state. A log snippet from a test secondary member:
This behaviour was triggered by a background index creation through a mongos, while a secondary member was in initial-sync cloning process. Can you confirm that all of your nodes were using v3.0.7 noted on the original ticket information ? Kind regards, Wan. | |||||||||||||||||||||||||
| Comment by Stefan Wojcik [ 03/Nov/15 ] | |||||||||||||||||||||||||
|
> Was this an existing index on the primary, or a new index build that may have been created (or in progress) after the initial sync started? It was an existing index. > if you haven't tried 3.0.7 without hinting this could be a worthwhile test. Yup, I plan on doing that. We still have some hints in the code because v2.6.11 was really bad at choosing the right indexes. Regardless, even if it didn't fail with a "bad hint" error, it would utilize a sub-optimal index or it would do a collection scan, which is almost as bad as failing completely. | |||||||||||||||||||||||||
| Comment by Stennie Steneker (Inactive) [ 03/Nov/15 ] | |||||||||||||||||||||||||
|
Hi Stefan, We haven't been able to reproduce this issue yet, but have reopened to investigate further. In MongoDB 3.0 builds for indexes that existed prior to initial sync should be completed before the node transitions into secondary state (as per Your timeline doesn't note when the background index with a bad hint was created. Was this an existing index on the primary, or a new index build that may have been created (or in progress) after the initial sync started? As noted in Wan's earlier response, it would also be worth investigating why you need the $hint via a discussion in the mongodb-user group. There have been some significant improvements to the query plan revision behaviour (in particular Regards, | |||||||||||||||||||||||||
| Comment by Stefan Wojcik [ 02/Nov/15 ] | |||||||||||||||||||||||||
|
I understand Wan, but isn't it a flawed logic to mark a newly added node as a valid secondary before all the indexes are built? | |||||||||||||||||||||||||
| Comment by Wan Bachtiar [ 02/Nov/15 ] | |||||||||||||||||||||||||
|
Hi Stefan, Thanks for the report. If you are setting your connection to read from secondary members and also using $hint to force specific indexes to fulfil the query, this would be an expected behaviour. In this case, the $hint operator forces the query planner to use certain indexes that have not been built on the new secondary. Instead of using $hint it is generally recommended to rely on the query planner. If you have any questions on why the query planner isn't choosing the expected index, please post on the mongodb-user group with information on the indexes, the query, and the explain(true) output of the query. Regards, Wan | |||||||||||||||||||||||||
| Comment by Stefan Wojcik [ 28/Oct/15 ] | |||||||||||||||||||||||||
|
Ok, let me know if I can help in any other way. Wonder if it has something to do with indexes that have `background: true`, like the one below:
| |||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 28/Oct/15 ] | |||||||||||||||||||||||||
|
stefan@close.io, from the logs you provided it seems there could be a bug here. Unfortunately I haven't yet found a hint in the logs as to how this could be happening, so the next step is to try to reproduce this behavior locally. | |||||||||||||||||||||||||
| Comment by Stefan Wojcik [ 27/Oct/15 ] | |||||||||||||||||||||||||
|
@Ramon, following up on this issue | |||||||||||||||||||||||||
| Comment by Stefan Wojcik [ 23/Oct/15 ] | |||||||||||||||||||||||||
|
Also, rough timeline info should be helpful: (ciodb17 is the primary, ciodb1a is the newly added server) 2015-10-23T23:28:17 initial sync from ciodb17 started | |||||||||||||||||||||||||
| Comment by Stefan Wojcik [ 23/Oct/15 ] | |||||||||||||||||||||||||
|
Thanks for a quick response Ramon. I attached the files you asked for. Let me know if you need anything else to isolate the issue. One thing I'd add is that most of the indexes on our primary have been added as background indexes (and their spec still contains `background: true`). Not sure if it matters though. | |||||||||||||||||||||||||
| Comment by Stefan Wojcik [ 23/Oct/15 ] | |||||||||||||||||||||||||
|
Attaching rs.conf()'s output, and logs for the primary and the newly added secondary. | |||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 23/Oct/15 ] | |||||||||||||||||||||||||
|
Hi stefan@close.io, can you please send us the following information so we can investigate further:
Thanks, |