[SERVER-13975] Creating index on collection named "system" can cause server to abort
Created: 16/May/14 Updated: 11/Jul/16 Resolved: 17/Jun/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 2.6.0, 2.6.1, 2.6.2 |
| Fix Version/s: | 2.6.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Stefan Wójcik | Assignee: | J Rassi |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Operating System: | ALL |
| Steps To Reproduce: | 1. Clone the repository from https://github.com/mongoengine/mongoengine. |
| Participants: |
| Description |
| Comments |
| Comment by Githook User [ 17/Jun/14 ] |
|
Author: Jason Rassi (jrassi, rassi@10gen.com) Message: |
| Comment by Githook User [ 17/Jun/14 ] |
|
Author: Jason Rassi (jrassi, rassi@10gen.com) Message: |
| Comment by J Rassi [ 16/Jun/14 ] |
|
The fix for this issue is to add a procedure to the startup sequence that rectifies system.namespaces for any affected databases. This procedure will output a log message "dropping orphaned index" for each orphaned index that is found on a collection named "system". Note that this startup procedure is being added only to the 2.6 branch, and not to the 2.7.x series. Users must therefore upgrade through 2.6.3+ before upgrading to 2.8.x in order to avoid this issue. Hence:
|
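As a rough illustration of what "finding orphaned indexes" means here, the following is a minimal sketch in plain Python, not the server's actual startup code. It assumes index metadata shaped like system.indexes entries (documents with `ns` and `name` fields) and a list of existing collection namespaces. The function name `find_orphaned_system_indexes` and the exact wording after "dropping orphaned index" are hypothetical.

```python
def find_orphaned_system_indexes(collection_namespaces, index_entries):
    """Return index entries whose owning collection is named "system"
    but no longer exists in the list of collection namespaces.

    Hypothetical helper for illustration only; the real procedure runs
    inside mongod at startup and is not shown in this ticket.
    """
    existing = set(collection_namespaces)
    orphans = []
    for entry in index_entries:
        # Split "db.collection" at the first dot.
        _, _, collection = entry["ns"].partition(".")
        if collection == "system" and entry["ns"] not in existing:
            # The message prefix comes from the ticket; the rest is assumed.
            print("dropping orphaned index: %s on %s" % (entry["name"], entry["ns"]))
            orphans.append(entry)
    return orphans


# Example state after a "system" collection was dropped but its _id index survived.
collections = ["test.users"]
indexes = [
    {"ns": "test.users", "name": "_id_"},
    {"ns": "test.system", "name": "_id_"},
]
orphaned = find_orphaned_system_indexes(collections, indexes)
```

The sketch only reports orphans; removing them from the catalog is left out because the ticket does not describe how the server does that.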
| Comment by Stefan Wójcik [ 21/May/14 ] |
|
Thanks for the detailed explanation, Jason! I'll patch MongoEngine's tests to use a different collection name. |
| Comment by J Rassi [ 21/May/14 ] |
|
The MongoEngine test case from tests.document.instance that reproduces the error is test_complex_nesting_document_and_embedded_document. This test creates a collection named "system" and then drops it; this exposes an issue in MongoDB 2.4 (and earlier) in the handling of system collections.

A namespace whose collection component starts with the string "system." is considered a "system namespace". Collections with system namespace strings (such as system.profile, system.namespaces, system.indexes, system.js, system.users) are called "system collections". They are special in that user-initiated write operations (e.g. insert, update, delete) and administrative operations (e.g. drop, compact) against them are sometimes disallowed, and an initial sync skips over certain ones during the copy phase.

The assertion failure can be reproduced with the following:

Step #2 creates a collection with namespace name "test.system" (which is not considered a system namespace), and an _id index with namespace name "test.system._id_" (which is considered a system namespace).

Step #3 first attempts to drop the _id index for the collection. This fails and outputs a log level 2 warning, because the index namespace name is a system namespace (system namespaces can't be dropped). However, the error is swallowed and the drop of the collection proceeds, because the collection itself is not in a system namespace. This leaves the database in a corrupt state: it has an index on a collection that no longer exists.

When the collection is re-created in 2.6, MongoDB detects that the database is corrupted (it attempts to create the _id index for the new collection, but finds that the index already exists) and shuts down. Running "db.repairDatabase()" on the database (which rebuilds all indexes from each collection's index list) fixes the issue. |
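The namespace rule and the resulting corruption described above can be modelled with a small, self-contained Python sketch. This is a toy model written for illustration, not MongoDB's implementation: `is_system_namespace`, `ToyCatalog`, and `drop_collection_24` are hypothetical names, the index namespace spelling follows the comment above, and the real server aborts rather than raising a Python exception.

```python
def is_system_namespace(ns):
    """True if the collection component (after the first dot) starts with "system."."""
    _, _, collection = ns.partition(".")
    return collection.startswith("system.")


class ToyCatalog:
    """Toy stand-in for a database catalog (assumption, not server code)."""

    def __init__(self):
        self.collections = set()
        self.indexes = set()

    def create_collection(self, ns):
        index_ns = ns + "._id_"
        if index_ns in self.indexes:
            # Models 2.6 detecting the corrupt state on re-creation.
            raise RuntimeError("index already exists: " + index_ns)
        self.collections.add(ns)
        self.indexes.add(index_ns)

    def drop_collection_24(self, ns):
        """Models the 2.4 drop path described in the comment above."""
        index_ns = ns + "._id_"
        if not is_system_namespace(index_ns):
            self.indexes.discard(index_ns)
        # Otherwise the index drop is refused and the error is swallowed.
        if is_system_namespace(ns):
            raise ValueError("cannot drop system namespace: " + ns)
        self.collections.discard(ns)


catalog = ToyCatalog()
catalog.create_collection("test.system")   # collection plus its _id index
catalog.drop_collection_24("test.system")  # collection removed, index left behind
leftover = sorted(catalog.indexes)
```

In this model, "test.system" is not a system namespace but "test.system._id_" is, so the drop removes the collection while its index entry survives; a second `create_collection("test.system")` then fails, and an ordinary name such as "test.users" goes through the same cycle cleanly.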
| Comment by Thomas Rueckstiess [ 21/May/14 ] |
|
Hi Stefan, Thanks for the additional details. We are now able to reproduce the crash and are looking into the cause. We will update the ticket when we know more. Thomas |
| Comment by Stefan Wójcik [ 21/May/14 ] |
|
Hey Kaloian, You're right - I was reusing a data directory previously used by mongod 2.4.10. The name of the database was consistent among all the tests regardless of the mongod version, but the db was supposed to be dropped at the end of each test. Not sure why the error occurs and whether it's something you guys would wanna fix or not. Here's a way to reproduce the error reliably: |
| Comment by Kaloian Manassiev [ 21/May/14 ] |
|
Hi Stefan, We have two hypotheses - one is that the database has already been corrupted and the same corruption gets hit over and over. The other is that MongoDB is reusing an extent that was not all zeroes, and some leftover junk data was interpreted as the presence of an index. Are you running the tests with a clean database? If not, can you please try to re-run them on a clean database and let me know if this still reproduces? Thanks in advance. -Kal. |