[SERVER-13496] Creating index with same name but different spec in mixed version replicaset can abort replication
Created: 05/Apr/14  Updated: 11/Jul/16  Resolved: 16/Apr/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.6.0-rc3
Fix Version/s: 2.6.1, 2.7.0

Type: Bug
Priority: Major - P3
Reporter: Cailin Nelson
Assignee: Eric Milkie
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-13206 Mixed-version 2.4/2.6 replica set cra... Closed
is related to SERVER-13404 2.6 secondaries abort when replicated... Closed
Operating System: ALL
Backport Completed:
Participants:

 Description   
Issue Status as of April 18, 2014

ISSUE SUMMARY
If a primary replica set member running MongoDB 2.4.9 or earlier executes ensureIndex with the name of an existing index but a different key spec, a 2.6.0 mongod secondary throws a fatal exception when it replicates the index build. The secondary's mongod process then exits.

The issue is limited to the situation where all of the following conditions are true:

  • The primary replica set member is running MongoDB 2.4.9 (or older) and at least one secondary replica set member is running MongoDB 2.6.0
  • The index build uses a custom name
  • The index build spec is different from the actual spec for the existing named index.

Note that version 2.4.10 is unaffected by this issue.
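
The three conditions can be written as a small predicate. This is a minimal sketch in plain JavaScript for a modern runtime such as Node.js; it is not part of the server. The names parseVersion, compareVersions and isExposed, and the shape of the input object, are assumptions made for illustration.

```javascript
// Hypothetical helper: turn "2.4.9", "2.6.0-rc3" or "2.6.0rc3" into [2, 4, 9] / [2, 6, 0].
// Any pre-release suffix is ignored, so release candidates count as their base version.
function parseVersion(v) {
  return v.split("-")[0].split(".").map(part => parseInt(part, 10));
}

// Returns a negative, zero, or positive number, like a sort comparator.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    const d = (pa[i] || 0) - (pb[i] || 0);
    if (d !== 0) return d;
  }
  return 0;
}

// True only when all three conditions from the summary hold at once.
function isExposed({ primaryVersion, secondaryVersions, usesCustomIndexName, specDiffersFromExisting }) {
  const oldPrimary = compareVersions(primaryVersion, "2.4.9") <= 0;
  const has260Secondary = secondaryVersions.some(v => compareVersions(v, "2.6.0") === 0);
  return oldPrimary && has260Secondary && usesCustomIndexName && specDiffersFromExisting;
}
```

With a 2.4.10 primary the first condition fails, so isExposed returns false regardless of the other inputs.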

USER IMPACT
An affected secondary member exits and then fails to start up again, because it retries the same index build on every startup. To recover, users need to resync the affected secondary node(s) from the primary.

WORKAROUNDS
The problem can be avoided by making sure that at least one of the three conditions above does not hold, for example by running 2.4.10 rather than 2.4.9 (or older) on the primary.
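
One way to avoid the third condition is to check for a name conflict in application code before calling ensureIndex. The following is a rough sketch in plain JavaScript for a modern runtime such as Node.js, not an official tool. It assumes the existing indexes are available as an array of documents with name and key fields (the shape returned by db.collection.getIndexes()) and that key values are plain numbers or strings. The function names sameKeySpec and findNameConflict are hypothetical.

```javascript
// Compare two index key specs field by field, in order.
// Field order matters for compound indexes; 1 and 1.0 are the same number in JavaScript.
function sameKeySpec(a, b) {
  const ka = Object.keys(a);
  const kb = Object.keys(b);
  if (ka.length !== kb.length) return false;
  return ka.every((field, i) => field === kb[i] && a[field] === b[field]);
}

// Returns the existing index document that shares `name` but has a different key spec,
// or null when there is no such conflict.
function findNameConflict(existingIndexes, name, key) {
  const sameName = existingIndexes.find(ix => ix.name === name);
  if (sameName && !sameKeySpec(sameName.key, key)) return sameName;
  return null;
}
```

If findNameConflict returns a document, the application can skip the ensureIndex call (or drop and rebuild the index deliberately) instead of sending a conflicting spec to the primary.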

RESOLUTION
The patch adds a check for differing index specs. When this situation is encountered on a secondary replica set member, it ignores the index build as a primary does, instead of aborting with a fatal error.
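
The behavioural difference can be illustrated with a toy model. This is only a sketch in plain JavaScript under stated assumptions, not the server's actual code: the function applyIndexInsert, its return strings, and the patched flag are all hypothetical, and the model covers nothing beyond the name and key comparison described above.

```javascript
// Toy model only. `existing` is a Map from index name to key spec;
// `op` mirrors the name and key fields of the `o` document in the replicated insert.
function applyIndexInsert(existing, op, { patched }) {
  const current = existing.get(op.name);
  if (current === undefined) {
    existing.set(op.name, op.key);
    return "built";
  }
  // JSON.stringify keeps field order, so { a: 1, b: 1 } and { b: 1, a: 1 } differ.
  if (JSON.stringify(current) === JSON.stringify(op.key)) {
    return "already exists";
  }
  if (patched) {
    return "ignored"; // models the fixed behaviour: skip the build, keep replicating
  }
  // models the 2.6.0 behaviour: the conflict is treated as fatal
  throw new Error("same name " + op.name + " with different key spec");
}
```

In this model, the conflicting insert from the log excerpt returns "ignored" when patched is true and throws when patched is false.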

AFFECTED VERSIONS
A 2.6.0 secondary in a replica set whose primary runs 2.4.9 (or older) is affected by this issue.

PATCHES
The patch is included in the 2.6.1 production release.

Original description

A 2.4.9 primary node executed ensureIndex for a preexisting named index, but using different keys than the existing index. (I.e. application code and reality were out of sync.) The 2.4.9 primary will simply ignore the ensureIndex. However, this causes a fatal replication error on the 2.6.0 secondary.

Note that you will only encounter this situation if all of the following are true:

  • You have a 2.4.9 primary and a 2.6.0 secondary
  • You are using named indexes
  • You call ensureIndex with a spec that does not match the actual spec for the existing named index.

2014-04-04T20:37:23.480+0000 [initandlisten] connection accepted from 10.10.0.229:38151 #36061 (76 connections now open)
2014-04-04T20:37:23.487+0000 [initandlisten] connection accepted from 10.20.0.227:46135 #36062 (77 connections now open)
2014-04-04T20:37:23.487+0000 [initandlisten] connection accepted from 10.20.0.227:46136 #36063 (78 connections now open)
2014-04-04T20:37:23.524+0000 [initandlisten] connection accepted from 10.20.0.227:46143 #36064 (79 connections now open)
2014-04-04T20:37:23.525+0000 [initandlisten] connection accepted from 10.20.0.227:46144 #36065 (80 connections now open)
2014-04-04T20:37:23.554+0000 [repl writer worker 4] ERROR: writer worker caught exception:  :: caused by :: 67 Trying to create an index with same name cid_1_date_1 with different key spec { cid: 1, date: -1 } vs existing spec { cid: 1.0, date: 1.0 } on: { ts: Timestamp 1396643843000|71, h: -4673813938368258385, v: 2, op: "i", ns: "mmsdbpings.system.indexes", o: { name: "cid_1_date_1", ns: "mmsdbpings.data.customerPings", key: { cid: 1, date: -1 } } }
2014-04-04T20:37:23.555+0000 [repl writer worker 4] Fatal Assertion 16360
2014-04-04T20:37:23.555+0000 [repl writer worker 4] 
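
When scanning secondary logs for this failure, the error line above can be matched with a regular expression. The sketch below is plain JavaScript for a modern runtime such as Node.js. The pattern is derived only from the single log line shown here, so other server versions may word the message differently; the names CONFLICT_RE and parseIndexConflict are hypothetical.

```javascript
// Pattern based on the one example line in this ticket; treat it as an assumption.
const CONFLICT_RE =
  /Trying to create an index with same name (\S+) with different key spec (\{.*?\}) vs existing spec (\{.*?\}) on:/;

// Returns { name, requestedSpec, existingSpec } with the specs as raw strings,
// or null if the line does not contain the conflict message.
function parseIndexConflict(line) {
  const m = CONFLICT_RE.exec(line);
  if (!m) return null;
  return { name: m[1], requestedSpec: m[2], existingSpec: m[3] };
}
```

The specs are returned as strings because the log prints them in shell notation rather than strict JSON.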

Steps To Reproduce

Here is a simple reproduction case with a 2.4.6 primary and a 2.6.0-rc3 secondary:

[scripts] (v20140325)$ mongo localhost:27000
backup_test:PRIMARY> db.version()
2.4.6
backup_test:PRIMARY> use test
switched to db test
backup_test:PRIMARY> db.users.ensureIndex({a:1, b:1},{name :'someIdx'})
backup_test:PRIMARY> db.users.ensureIndex({a:1, c:1},{name :'someIdx'})

Note that this issue does not affect 2.4.10; you must use 2.4.9 or older in order to be affected.



 Comments   
Comment by Githook User [ 17/Apr/14 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-13496 do not abort replication on secondaries when index name conflicts (via 2.4.9)
(cherry picked from commit 0fbd76d233e213e43f53b8882c4dd3c71897a7f3)

Conflicts:
src/mongo/base/error_codes.err
Branch: v2.6
https://github.com/mongodb/mongo/commit/8d2e6aab88f62bd718fc262315b98c60af89a6bd

Comment by Githook User [ 16/Apr/14 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-13496 do not abort replication on secondaries when index name conflicts (via 2.4.9)
Branch: master
https://github.com/mongodb/mongo/commit/0fbd76d233e213e43f53b8882c4dd3c71897a7f3

Comment by Kamran K. [ 05/Apr/14 ]

SERVER-13404 has a similar scenario.
