Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Done
Priority: Critical - P2
Fix Version/s: 2.4.10
Affects Version/s: 2.4.9
Component/s: Index Maintenance
Labels:
None

Operating System:
ALL
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Issue Status as of March 27, 2014

ISSUE SUMMARY
If a background index build is already in progress, issuing the same index creation command will report success immediately. While this in itself is not a problem, if replication is active, the index creation command will be replicated causing possible problems on secondaries. This will become a problem if the index build ultimately has an error, due to a unique key constraint violation for example. Because of how replication works this can lead to secondary nodes shutting down and not recovering on a restart, since this is an unrecoverable data replication error which causes the server to to shutdown (Fatal Assertion 16360, exception during application of oplog operations on secondaries) for data integrity reasons.

USER IMPACT
Users are affected if they create the same background index multiple times concurrently, for example in environments with automated index creation. Due to the fact that the secondaries will shut down during an unrecoverable replication error (to protect data integrity and consistency), this can cause loss of majority and a primary step-down, rendering the replica set read-only.

SOLUTION
The fix is to only replicate index creation commands if they successfully build an index, and not in the case the index already exists. This will prevent secondaries from attempting an index build that could fail, and wasn't successful on the primary.

WORKAROUNDS
To avoid the issue, ensure that background indexes creation commands are only issued one at a time, for each index definition.

AFFECTED VERSIONS
All versions from 2.4.0 to 2.4.9 are affected. The 2.2 and 2.6 production release cycles are unaffected by this issue.

PATCHES
The fix is included in the production release version 2.4.10.

Original Description

If you attempt to build an index that is already in the process of being built (e.g. a background index), the attempt succeeds immediately and also is replicated immediately, even if the in-progress index build might eventually fail (due to, say, a unique constraint violation). This could result in SECONDARY nodes in a replica set attempting index builds that are destined to fail. Such SECONDARY nodes will shut down when the index build fails, since this is a data replication error that cannot be automatically resolved by the server.

The code fix was to not replicate index builds in the case that such an index already exists.
----------------------------------------------------------------------------
Original description and example reproducer:

On two-member replica set prepare collection like this:

function prepare() {
    db.c.drop()
    for (var i=0; i<10000; i++)
        db.c.insert({x:i})
    db.c.insert({x:0})
    printjson(db.getLastError(2))
}

Then start two threads in parallel each doing the following index creations:

function index() {
    while (true) {
        db.c.ensureIndex({x:1}, {unique:true, background:true})
    }
}

(I think the while loop isn't necessary as long as the two threads start at close enough to the same time.)

Index build fails as expected on primary:

Sun Feb  9 10:46:53.380 [conn7] background addExistingToIndex exception E11000 duplicate key error index: test.c.$x_1  dup key: { : null }
Sun Feb  9 10:46:53.380 [conn7] end connection 127.0.0.1:52421 (0 connections now open)

But secondary fasserts:

Sun Feb  9 10:46:52.164 [repl writer worker 1] info: indexing in foreground on this replica; was a background index build on the primary
Sun Feb  9 10:46:52.164 [repl writer worker 1] build index test.c { x: 1.0 }
Sun Feb  9 10:46:52.182 [repl writer worker 1] ERROR: writer worker caught exception: E11000 duplicate key error index: test.c.$x_1  dup key: { : null } on: { ts: Timestamp 1391960812000|1778, h: 5159704\
80723141670, v: 2, op: "i", ns: "test.system.indexes", o: { _id: ObjectId('52f7a2ec7ddfdca0480c6170'), ns: "test.c", key: { x: 1.0 }, name: "x_1", unique: true, background: true } }
Sun Feb  9 10:46:52.183 [repl writer worker 1]   Fatal Assertion 16360
0xde46e1 0xda44d3 0xc2ca7c 0xdb2021 0xe2cf69 0x30040079d1 0x30014e8b6d
 mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
 mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xda44d3]
 mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc2ca7c]
 mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdb2021]
 mongod() [0xe2cf69]
 /lib64/libpthread.so.0() [0x30040079d1]
 /lib64/libc.so.6(clone+0x6d) [0x30014e8b6d]
Sun Feb  9 10:46:52.184 [repl writer worker 1]

It appears that an index creation oplog entry gets made in this case in spite of the index creation failure.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

server12662_twobgindex.js
1 kB
Mar 13 2014 05:51:49 PM UTC

is duplicated by

SERVER-13953 Duplicate key error causes all secondaries to fassert() and crash in production

Closed

Assignee:: Eliot Horowitz (Inactive)
Reporter:: Bruce Lucas (Inactive)
Participants:: Amalia Hawkins, Bruce Lucas, Eliot Horowitz, Eric Milkie, Githook User
Votes:: 2 Vote for this issue
Watchers:: 9 Start watching this issue

Created:: Feb 09 2014 03:59:57 PM UTC
Updated:: Jul 11 2016 05:16:32 PM UTC
Resolved:: Mar 15 2014 04:18:29 AM UTC

Details

Description

Original Description

Attachments

Attachments

Issue Links

Activity

People

Dates