[SERVER-16274] secondary fasserts trying to replicate an index Created: 21/Nov/14  Updated: 23/Mar/17  Resolved: 06/Jan/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.5, 2.8.0-rc0
Fix Version/s: 2.6.8, 2.8.0-rc5

Type: Bug
Priority: Major - P3
Reporter: Jeffrey Yemin
Assignee: Eric Milkie
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: db27017.log, db27018.log
Issue Links:
Duplicate
is duplicated by SERVER-15871 Segmentation fault in secondary rebui... Closed
is duplicated by SERVER-13304 starting background index builds on s... Closed
Related
related to SERVER-27834 Index builds during initial sync shou... Closed
is related to SERVER-15393 Renaming a collection with newly adde... Closed
Tested
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

The failure does not reproduce consistently. It appears to occur during a sequence of index creations on the same collection, each with different options, with the collection dropped between creations. One of the index builds runs in the background.

The collection name and the index name both appear in the exception message in the ticket description.

Participants:

 Description   

A secondary fasserts with the following log message:

2014-11-21T18:57:02.566+0000 E REPL     [repl writer worker 15] writer worker caught exception:  :: caused by :: 85 Index with name: theField_1 already exists with different options on: { ts: Timestamp 1416596222000|61, h: -5680555093148999460, v: 2, op: "i", ns: "JavaDriverTest.system.indexes", o: { key: { theField: 1 }, name: "theField_1", ns: "JavaDriverTest.com.mongodb.acceptancetest.index.AddIndexAcceptanceTest", expireAfterSeconds: 1600 } }
2014-11-21T18:57:02.566+0000 I -        [repl writer worker 15] Fatal Assertion 16360
2014-11-21T18:57:02.566+0000 I -        [repl writer worker 15]
 
***aborting after fassert() failure



 Comments   
Comment by Githook User [ 06/Feb/15 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-16274 synchronize start of bg index builds on secondaries
Branch: v2.6
https://github.com/mongodb/mongo/commit/8321909044353580b66947c73f264a8575b90c66

Comment by Githook User [ 06/Jan/15 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-16274 synchronize start of bg index builds on secondaries
Branch: master
https://github.com/mongodb/mongo/commit/e35f2d62ccabee95075dd03d2eac85339e063e37

Comment by Roy [ 21/Dec/14 ]

Thanks Andy.
This actually leaves the instance in a "disaster" state: restarting it fails again and again, and recovery requires a full resync of the data, so it's pretty severe.

Comment by Andy Schwerin [ 20/Dec/14 ]

mamoos1, the issue is marked "Backport Requested". Once a developer produces a fix on the master development branch, we will assess it for backport.

Comment by Roy [ 20/Dec/14 ]

Hi,

This actually happened to me with 2.6.6 as well.
Can you please specify why this is not going to get fixed for the 2.6 branch? (I see 2.8.0-rc4 as the only fix version...)

Thanks.

Comment by Eric Milkie [ 21/Nov/14 ]

This is most likely related to the way we track background index builds on secondaries.
There is a race between registering a new background index build in the curop list and applying subsequent commands that need to interrupt in-progress index builds. In the case above, normal behavior would have started the bg index build, started the drop collection, interrupted the bg index build, and completed the drop. Instead, the order was: start drop collection, discover no index builds to interrupt, finish drop collection, start bg index build. At the end of the operation, the collection exists when it should not.
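
The two orderings described above can be illustrated with a small toy model. This is only a sketch, not the server's code: the `ToySecondary` class, its method names, the collection name "coll", and the `{"background": True}` options on the first index are all hypothetical, and the model assumes that finishing an index build on a missing collection recreates that collection. It replays the drop either after or before the background build is registered, then applies the next replicated index creation with the `expireAfterSeconds: 1600` options seen in the log.

```python
class IndexOptionsConflict(Exception):
    """Stands in for error 85 in the log ("already exists with different options")."""


class ToySecondary:
    """Hypothetical model of a secondary's catalog and its registered index builds."""

    def __init__(self):
        self.collections = {}     # collection name -> {index name: options}
        self.registered = set()   # background builds visible to interrupting commands
        self.interrupted = set()  # builds that a drop asked to stop

    def register_bg_build(self, coll, index):
        self.registered.add((coll, index))

    def finish_bg_build(self, coll, index, options):
        key = (coll, index)
        self.registered.discard(key)
        if key in self.interrupted:
            return  # the drop stopped this build, so nothing is written
        # Assumption: building an index on a missing collection creates it.
        self.collections.setdefault(coll, {})[index] = options

    def drop_collection(self, coll):
        # A drop can only interrupt builds that are already registered.
        self.interrupted |= {key for key in self.registered if key[0] == coll}
        self.collections.pop(coll, None)

    def create_index(self, coll, index, options):
        existing = self.collections.setdefault(coll, {})
        if index in existing and existing[index] != options:
            raise IndexOptionsConflict(
                f"Index with name: {index} already exists with different options"
            )
        existing[index] = options


def replay(drop_before_registration):
    """Replay the oplog sequence with the drop on either side of the registration."""
    secondary = ToySecondary()
    coll, index = "coll", "theField_1"
    if drop_before_registration:
        secondary.drop_collection(coll)          # finds no build to interrupt
        secondary.register_bg_build(coll, index)
    else:
        secondary.register_bg_build(coll, index)
        secondary.drop_collection(coll)          # interrupts the registered build
    secondary.finish_bg_build(coll, index, {"background": True})
    try:
        # Next replicated operation: same index name, different options.
        secondary.create_index(coll, index, {"expireAfterSeconds": 1600})
        return "ok"
    except IndexOptionsConflict:
        return "conflict"


if __name__ == "__main__":
    print("expected order:", replay(drop_before_registration=False))
    print("racy order:", replay(drop_before_registration=True))
```

In this model the expected order leaves no stale index behind, while the racy order leaves the collection in place with the old index definition, so the following index creation hits the options conflict.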
