[SERVER-12956] Stopping a secondary in phase 2/3 of an index build corrupts the index
Created: 28/Feb/14  Updated: 11/Jul/16  Resolved: 15/Mar/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.4.9
Fix Version/s: 2.4.10

Type: Bug
Priority: Critical - P2
Reporter: David Hows
Assignee: Eliot Horowitz (Inactive)
Resolution: Done
Votes: 6
Labels: corrupt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: mongod.log, server12956_killsecindex.js
Issue Links:
Duplicate
is duplicated by SERVER-8147 Replication breaks with errorMessage ... Closed
Related
is related to SERVER-12957 Stopping a secondary during an index ... Closed
Operating System: ALL
Steps To Reproduce:

Create a replica set
Have enough data in the set for the index build to take tens of seconds
Build an index on the primary
Kill the secondary with kill -5 when it reaches phase 2/3 of the index build (follow its log to see when this happens)
Restart the secondary (it will not rebuild the index)
Insert a document into the primary
The secondary will crash

Description
Issue Status as of March 31, 2014

ISSUE SUMMARY

If a secondary node is terminated abnormally during phase 2 of an index build (constructing the B-tree), it will not attempt to rebuild the index upon restart, leaving behind a corrupted index catalog. Subsequent inserts on the primary that replicate to this secondary will crash it.
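To make the failure mode concrete, here is a toy Python model. It is a hypothetical sketch, not mongod's code: the IndexEntry and ToySecondary names, the complete flag, and the dictionary catalog are all invented for illustration. It assumes only what the summary describes: the index is registered in the catalog before its B-tree is finished, nothing repairs it on restart, and the next replicated insert, which has to update every index on the collection, runs into it.

```python
# Hypothetical toy model of SERVER-12956; not mongod's real catalog or B-tree code.
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    name: str
    key_field: str
    complete: bool = False           # stays False if the build is interrupted
    keys: list = field(default_factory=list)


class ToySecondary:
    def __init__(self):
        self.catalog = {}            # index name -> IndexEntry

    def start_index_build(self, name, key_field):
        # Assumption: the entry is visible in the catalog before the B-tree is done.
        self.catalog[name] = IndexEntry(name, key_field)

    def finish_index_build(self, name):
        self.catalog[name].complete = True

    def apply_insert(self, doc):
        # A replicated insert must update every index listed in the catalog.
        for entry in self.catalog.values():
            if not entry.complete:
                raise RuntimeError(f"index {entry.name!r} is half built")
            entry.keys.append(doc.get(entry.key_field))


# The secondary is killed mid-build and restarted without any cleanup (pre-2.4.10).
secondary = ToySecondary()
secondary.start_index_build("x_1", "x")
try:
    secondary.apply_insert({"x": 1})
    crashed = False
except RuntimeError as err:
    crashed = True
    print(err)  # index 'x_1' is half built
```

A node on which finish_index_build() ran applies the same insert normally; in this model the crash comes only from the entry left in the incomplete state.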

USER IMPACT

A secondary node affected in this way has to be resynced or repaired. The bug is triggered only when the mongod process terminates abnormally while an index build is in progress; a normal shutdown does not trigger it.

SOLUTION

Incomplete index builds are now cleaned up on startup: mongod removes any half-built index. The index build must then be restarted manually.
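The fix's commit message reads "on startup, remove half built indexes". The following minimal Python sketch illustrates that idea under stated assumptions: a dictionary catalog whose entries carry a completion flag. Both the catalog layout and the remove_half_built_indexes() helper are hypothetical; the actual 2.4 change works on mongod's own catalog structures, which are not shown here.

```python
# Hypothetical sketch of "on startup, remove half built indexes"; the catalog
# layout and the complete flag are invented for illustration.
from dataclasses import dataclass


@dataclass
class IndexEntry:
    name: str
    complete: bool


def remove_half_built_indexes(catalog):
    """Delete catalog entries whose build never finished and return their names."""
    removed = [name for name, entry in catalog.items() if not entry.complete]
    for name in removed:     # delete after iterating, not while iterating
        del catalog[name]
    return removed


catalog = {
    "_id_": IndexEntry("_id_", complete=True),
    "x_1": IndexEntry("x_1", complete=False),   # build interrupted by kill -5
}
removed = remove_half_built_indexes(catalog)
print(removed)           # ['x_1']
print(sorted(catalog))   # ['_id_']
```

Dropping the entry rather than resuming the build keeps startup simple, at the cost noted above: the index then has to be re-created manually.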

WORKAROUNDS

None.

AFFECTED VERSIONS

All versions from 2.4.0 to 2.4.9 are affected by this bug. The 2.6 series is unaffected by the issue.

PATCHES

The fix is included in the 2.4.10 production release.

Original Description

If a secondary is stopped during phase 2 of an index build, the member will not attempt to rebuild the index on restart (as it does when stopped during phase 1). The corrupt index left on that secondary causes crashes and other problems.



Comments
Comment by Githook User [ 14/Mar/14 ]

Author: Eliot Horowitz (username: erh, email: eliot@10gen.com)

Message: SERVER-12956: on startup, remove half built indexes
Branch: v2.4
https://github.com/mongodb/mongo/commit/55a6a0d284e0921a877020592857b3abf6c6cb73

Comment by Amalia Hawkins [ 12/Mar/14 ]

Attached a js test file that replicates the problem, no pun intended. Currently requires manual checking of output, but will clean that up later.

Comment by Eric Milkie [ 03/Mar/14 ]

Note: this issue does not affect master (2.6); the fix will be applied to the 2.4 branch.

Generated at Thu Feb 08 03:30:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.