[SERVER-21100] InitialSync finishes before indexes finish building Created: 23/Oct/15  Updated: 11/Nov/15  Resolved: 11/Nov/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Stefan Wojcik Assignee: Wan Bachtiar
Resolution: Cannot Reproduce Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File ciodb17_2015_10_23.log     Text File ciodb1a_2015_10_23.log     Text File shard_1_conf.log    
Issue Links:
Related
is related to SERVER-14737 Initial sync uses background index bu... Closed
Operating System: ALL
Participants:

 Description   

I recently added a new node to our replica set. It caused errors in our application because it finished the initial sync and transitioned to a valid SECONDARY before its indexes finished building. Since we send some of our queries to the secondary, this resulted in a) really slow queries, and b) errors (since we rely on hints in some scenarios).

Below is a short excerpt from our logs showing the issue. Let me know if you need the whole log.

2015-10-23T05:23:45.533+0000 I -        [repl index builder 9]   Index Build (background): 2787300/32715281 8%
2015-10-23T05:23:47.133+0000 I REPL     [rsSync] initial sync finishing up
2015-10-23T05:23:47.133+0000 I REPL     [rsSync] replSet set minValid=5629c44b:6
2015-10-23T05:23:47.133+0000 I REPL     [rsSync] initial sync done
2015-10-23T05:23:47.153+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
2015-10-23T05:23:47.160+0000 I REPL     [ReplicationExecutor] transition to SECONDARY
2015-10-23T05:23:47.754+0000 I REPL     [ReplicationExecutor] could not find member to sync from
2015-10-23T05:23:48.008+0000 I -        [repl index builder 11]   Index Build (background): 2792400/32715290 8%(...)
(...)
2015-10-23T05:48:19.019+0000 I -        [repl index builder 10]   Index Build (background): 5575800/32715628 17%
2015-10-23T05:48:19.020+0000 I -        [repl index builder 16]   Index Build (background): 5575800/32715628 17%
2015-10-23T05:48:19.885+0000 I NETWORK  [initandlisten] connection accepted from 10.21.16.41:43272 #3573 (40 connections now open)
2015-10-23T05:48:19.898+0000 I ACCESS   [conn3573] Successfully authenticated as principal __system on local
2015-10-23T05:48:19.908+0000 I QUERY    [conn3573] assertion 17007 Unable to execute query: error processing query: ns=closeio.activity limit=50 skip=0
Tree: $and
    _cls == "Activity.Call"
    direction == "inbound"
    organization == { $ref: "organization", $id: "orga_alksjdfalksjdfklajc" }
    phone == "+6503334444"
    plivo_number $in [ { $ref: "plivo_number", $id: ObjectId('aklsjdfljsdfklajsdf') } ]
Sort: { date_created: -1 }
Proj: {}
 planner returned error: bad hint ns:closeio.activity query:{ $query: { plivo_number: { $in: [ { $ref: "plivo_number", $id: ObjectId('aklsjdfljsdfklajsdf') } ] }, organization: { $ref: "organization", $id: "orga_alksjdfalksjdfklajc" }, direction: "inbound", _cls: "Activity.Call", phone: "+6503334444" }, $readPreference: { mode: "secondary" }, $hint: { _cls: 1, organization: 1, phone: 1, lead: 1 }, $orderby: { date_created: -1 } }
2015-10-23T05:48:19.908+0000 I QUERY    [conn3573]  ntoskip:0 ntoreturn:50
2015-10-23T05:48:20.007+0000 I -        [repl index builder 4]   Index Build (background): 5577900/32715629 17%
2015-10-23T05:48:20.009+0000 I -        [repl index builder 17]   Index Build (background): 5577900/32715629 17%
2015-10-23T05:48:20.011+0000 I -        [repl index builder 12]   Index Build (background): 5577900/32715629 17%
(...)
2015-10-23T12:04:08.042+0000 I -        [repl index builder 13]   Index Build (background): 32749400/32752875 99%
2015-10-23T12:04:08.046+0000 I -        [repl index builder 16]   Index Build (background): 32749400/32752875 99%
2015-10-23T12:04:08.965+0000 I INDEX    [repl index builder 4] build index done.  scanned 32752154 total records. 34423 secs
2015-10-23T12:04:08.966+0000 I INDEX    [repl index builder 3] build index done.  scanned 32752154 total records. 34423 secs
2015-10-23T12:04:08.967+0000 I INDEX    [repl index builder 8] build index done.  scanned 32752154 total records. 25852 secs
2015-10-23T12:04:08.969+0000 I INDEX    [repl index builder 12] build index done.  scanned 32752154 total records. 25851 secs
2015-10-23T12:04:08.969+0000 I INDEX    [repl index builder 14] build index done.  scanned 32752154 total records. 25851 secs
2015-10-23T12:04:08.970+0000 I INDEX    [repl index builder 5] build index done.  scanned 32752154 total records. 34423 secs
2015-10-23T12:04:08.974+0000 I INDEX    [repl index builder 13] build index done.  scanned 32752154 total records. 25851 secs
2015-10-23T12:04:08.975+0000 I INDEX    [repl index builder 16] build index done.  scanned 32752154 total records. 25851 secs
2015-10-23T12:04:08.976+0000 I INDEX    [repl index builder 10] build index done.  scanned 32752154 total records. 25852 secs
2015-10-23T12:04:08.977+0000 I INDEX    [repl index builder 15] build index done.  scanned 32752154 total records. 25851 secs
2015-10-23T12:04:08.978+0000 I INDEX    [repl index builder 17] build index done.  scanned 32752154 total records. 25851 secs
2015-10-23T12:04:08.978+0000 I INDEX    [repl index builder 9] build index done.  scanned 32752154 total records. 25852 secs
2015-10-23T12:04:08.979+0000 I INDEX    [repl index builder 11] build index done.  scanned 32752154 total records. 25851 secs
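For context, the window during which this node served reads as SECONDARY while index builds were still running can be computed from the timestamps in the excerpt; a quick sketch (timestamps copied from the log above):

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

# Timestamps copied from the log excerpt above.
became_secondary = datetime.strptime("2015-10-23T05:23:47.160+0000", FMT)  # transition to SECONDARY
last_index_done = datetime.strptime("2015-10-23T12:04:08.979+0000", FMT)   # last "build index done"

window = last_index_done - became_secondary
print(window)  # roughly 6 hours 40 minutes of serving reads without all indexes in place
```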



 Comments   
Comment by Wan Bachtiar [ 11/Nov/15 ]

Hi Stefan,

Thanks for the information.

As noted in Stephen's earlier reply, as of MongoDB v3.0, builds for indexes that existed prior to initial sync should complete before the node transitions into the SECONDARY state (per SERVER-14737), regardless of whether the indexes were originally created with background:true.

If you are still seeing this background index issue with all the nodes upgraded to v3.0.7, please open a new issue with all the relevant log files.

Regards,

Wan.

Comment by Stefan Wojcik [ 10/Nov/15 ]

Hi Wan,

Thanks, this is insightful. If you look at the attached logs:

2015-10-23T23:28:17 is when the initial sync from ciodb17 started.
Starting at 2015-10-23T02:30:25, we can see a few background indexes being built, e.g.:

2015-10-23T02:30:25.110+0000 I INDEX    [repl index builder 3] build index on: closeio.activity properties: { v: 1, key: { _cls: 1, organization: 1, date_created: -1 }, name: "_cls_1_organization_1_date_created_-1", ns: "closeio.activity", background: true }
2015-10-23T02:30:25.112+0000 I INDEX    [repl index builder 4] build index on: closeio.activity properties: { v: 1, key: { organization: 1, date_created: -1 }, name: "organization_1_date_created_-1", ns: "closeio.activity", background: true }
2015-10-23T02:30:25.135+0000 I INDEX    [repl index builder 5] build index on: closeio.activity properties: { v: 1, key: { lead: 1, user: 1, date_created: -1 }, name: "lead_1_user_1_date_created_-1", ns: "closeio.activity", background: true }

I'm curious why the log entries have the form [repl index builder N] here (indicating a background index build driven by replication). I'm 100% certain these indexes had existed on the primary for weeks. The logs from the primary confirm that no extra indexes were created during the initial sync.

It might be relevant that this happened during an upgrade, in which I added v3.0.7 secondary nodes to a cluster whose primary nodes were running v2.6.11.

Comment by Wan Bachtiar [ 10/Nov/15 ]

Hi Stefan,

In an attempt to reproduce the issue you described, I ran a small sharded cluster on MongoDB v3.0.7: 2 shards, each a 3-member replica set.

During the test, I noticed that a typical initial-sync index build (foreground or background) shows up in the log tagged [rsSync]. For example:

2015-11-03T10:59:51.219+1100 I -        [rsSync]   Index Build: 1569000/10000000 15%
2015-11-03T10:59:54.004+1100 I -        [rsSync]   Index Build: 2811500/10000000 28%

A log tag of [repl index builder <N>], on the other hand, indicates a background index build driven by replication. For example:

2015-11-10T11:42:20.004+1100 I -        [repl index builder 1]   Index Build (background): 464400/10000000 4%
2015-11-10T11:42:23.004+1100 I -        [repl index builder 1]   Index Build (background): 1191300/10000000 11%

A foreground index build driven by replication is tagged [repl writer worker <N>]:

2015-11-10T05:35:22.558+0000 I INDEX    [repl writer worker 15] build index on: test.gen properties: { v: 1, key: { status: 1.0 }, name: "status_1", ns: "test.gen" }
2015-11-10T05:35:22.558+0000 I INDEX    [repl writer worker 15]          building index using bulk method

In my tests, replication-driven index creation was the only mechanism that could leave an index build running after the member transitioned into the SECONDARY state.

A log snippet from a test secondary member:

2015-11-10T05:38:52.550+0000 I NETWORK  [initandlisten] connection accepted from 10.0.0.160:33245 #33 (5 connections now open)
2015-11-10T05:38:56.007+0000 I -        [rsSync]   Index: (2/3) BTree Bottom Up Progress: 6604500/8452531 78%
2015-11-10T05:38:57.000+0000 I NETWORK  [conn31] end connection 10.0.0.161:39998 (4 connections now open)
2015-11-10T05:38:57.002+0000 I NETWORK  [initandlisten] connection accepted from 10.0.0.161:40001 #34 (5 connections now open)
2015-11-10T05:38:57.553+0000 I INDEX    [rsSync]         done building bottom layer, going to commit
2015-11-10T05:39:02.679+0000 I STORAGE  [FileAllocator] allocating new datafile /data/db/test.13, filling with zeroes...
2015-11-10T05:39:02.701+0000 I STORAGE  [FileAllocator] done allocating datafile /data/db/test.13, size: 511MB,  took 0.021 secs
2015-11-10T05:39:08.590+0000 I INDEX    [rsSync]         done building bottom layer, going to commit
2015-11-10T05:39:08.590+0000 I INDEX    [rsSync] build index done.  scanned 8452531 total records. 189 secs
2015-11-10T05:39:08.669+0000 I REPL     [rsSync] oplog sync 3 of 3
2015-11-10T05:39:08.670+0000 I REPL     [rsSync] initial sync finishing up
2015-11-10T05:39:08.670+0000 I REPL     [rsSync] replSet set minValid=56418229:1
2015-11-10T05:39:08.674+0000 I -        [repl index builder 2]   Index Build (background): 1595200/8452531 18%
2015-11-10T05:39:08.679+0000 I REPL     [rsSync] initial sync done
2015-11-10T05:39:08.692+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
2015-11-10T05:39:08.693+0000 I REPL     [ReplicationExecutor] transition to SECONDARY
2015-11-10T05:39:11.003+0000 I -        [repl index builder 2]   Index Build (background): 2352300/8452531 27%
2015-11-10T05:39:13.011+0000 I REPL     [ReplicationExecutor] could not find member to sync from
2015-11-10T05:39:14.250+0000 I -        [repl index builder 2]   Index Build (background): 2583600/8452531 30%

This behaviour was triggered by creating a background index through a mongos while a secondary member was in the initial-sync cloning phase.
Unfortunately, I am unable to reproduce this behaviour outside of replication.

Can you confirm that all of your nodes were running v3.0.7, as noted in the original ticket?
Do you have any further context that might help us reproduce the issue?

Kind regards,

Wan.

Comment by Stefan Wojcik [ 03/Nov/15 ]

> Was this an existing index on the primary, or a new index build that may have been created (or in progress) after the initial sync started?

It was an existing index.

> if you haven't tried 3.0.7 without hinting this could be a worthwhile test.

Yup, I plan on doing that. We still have some hints in the code because v2.6.11 was really bad at choosing the right indexes. Regardless, even if it didn't fail with a "bad hint" error, it would use a sub-optimal index or fall back to a collection scan, which is almost as bad as failing outright.

Comment by Stennie Steneker (Inactive) [ 03/Nov/15 ]

Hi Stefan,

We haven't been able to reproduce this issue yet, but have reopened to investigate further.

In MongoDB 3.0, builds for indexes that existed prior to initial sync should complete before the node transitions into SECONDARY state (per SERVER-14737), regardless of whether the indexes were originally created with background:true.

Your timeline doesn't note when the background index with a bad hint was created. Was this an existing index on the primary, or a new index build that may have been created (or in progress) after the initial sync started?

As noted in Wan's earlier response, it would also be worth investigating why you need the $hint via a discussion in the mongodb-user group. There have been significant improvements to query plan revision behaviour (in particular SERVER-20139), so if you haven't tried 3.0.7 without hinting, this could be a worthwhile test.

Regards,
Stephen

Comment by Stefan Wojcik [ 02/Nov/15 ]

I understand, Wan, but isn't it flawed logic to mark a newly added node as a valid secondary before all of its indexes are built?

Comment by Wan Bachtiar [ 02/Nov/15 ]

Hi Stefan,

Thanks for the report.

If you have configured your connection to read from secondary members and are also using $hint to force specific indexes to fulfil the query, this is expected behaviour.

In this case, the $hint operator forces the query planner to use indexes that have not yet been built on the new secondary.

Instead of using $hint, it is generally recommended to rely on the query planner. If you have questions about why the query planner isn't choosing the expected index, please post on the mongodb-user group with information on the indexes, the query, and the explain(true) output of the query.
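One defensive pattern for applications that must keep their hints (purely illustrative, not something suggested in this ticket) is to verify that the hinted key pattern actually exists among the index specs the server reports, e.g. via PyMongo's index_information(), before issuing a hinted query. A minimal sketch, with hypothetical index specs mirroring those in the ticket:

```python
def hint_is_built(hint_key, index_specs):
    """Return True if an index with exactly the hinted key pattern exists.

    hint_key:    list of (field, direction) pairs, as passed to a driver's hint()
    index_specs: iterable of index info documents whose "key" field holds the
                 key pattern, e.g. db.collection.index_information().values()
                 in PyMongo (assumption: that format is what the app sees)
    """
    return any(list(spec["key"]) == list(hint_key) for spec in index_specs)

# Hypothetical specs, mirroring two of the indexes from the ticket's logs:
specs = [
    {"key": [("_cls", 1), ("organization", 1), ("date_created", -1)]},
    {"key": [("organization", 1), ("date_created", -1)]},
]
print(hint_is_built([("organization", 1), ("date_created", -1)], specs))               # True: index exists
print(hint_is_built([("_cls", 1), ("organization", 1), ("phone", 1), ("lead", 1)], specs))  # False: not built yet
```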

Regards,

Wan

Comment by Stefan Wojcik [ 28/Oct/15 ]

OK, let me know if I can help in any other way. I wonder if it has something to do with indexes that have `background: true`, like the one below:

{
    "v" : 1,
    "key" : {
        "_cls" : 1,
        "thread" : 1,
        "date_created" : -1
    },
    "name" : "_cls_1_thread_1_date_created_-1",
    "ns" : "closeio.activity",
    "background" : true
}

Comment by Ramon Fernandez Marina [ 28/Oct/15 ]

stefan@close.io, from the logs you provided it seems there could be a bug here. Unfortunately, I haven't yet found anything in the logs explaining how this could happen, so the next step is to try to reproduce this behavior locally.

Comment by Stefan Wojcik [ 27/Oct/15 ]

@Ramon, following up on this issue

Comment by Stefan Wojcik [ 23/Oct/15 ]

Also, a rough timeline should be helpful:

(ciodb17 is the primary, ciodb1a is the newly added server)

2015-10-23T23:28:17 initial sync from ciodb17 started
2015-10-23T05:23:47 initial sync is marked as done, then the server has immediately transitioned to RECOVERING and then SECONDARY
2015-10-23T05:48:19 first bad hint happened (on an index that was still missing)
2015-10-23T12:04:08 last index build was done
2015-10-23T12:11:02 an "Overflow sort stage" error was logged; its executionTimeMillisEstimate is 1059169 ms (about 17.65 minutes), meaning the query started running before the last index was built
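As a sanity check on the last entry, the minutes figure follows directly from the millisecond estimate:

```python
# executionTimeMillisEstimate from the "Overflow sort stage" log entry above
execution_time_ms = 1059169
minutes = round(execution_time_ms / 60000, 2)
print(minutes)  # 17.65, matching the figure in the timeline
```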

Comment by Stefan Wojcik [ 23/Oct/15 ]

Thanks for the quick response, Ramon. I attached the files you asked for. Let me know if you need anything else to isolate the issue.

One thing I'd add is that most of the indexes on our primary were created as background indexes (and their specs still contain `background: true`). Not sure if it matters, though.

Comment by Stefan Wojcik [ 23/Oct/15 ]

Attaching rs.conf()'s output, and logs for the primary and the newly added secondary.

Comment by Ramon Fernandez Marina [ 23/Oct/15 ]

Hi stefan@close.io, can you please send us the following information so we can investigate further:

  • full log for this secondary and also for the primary at the time
  • version(s) of MongoDB you're using
  • details about your deployment (rs.conf() for example)

Thanks,
Ramón.

Generated at Thu Feb 08 03:56:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.