[SERVER-13975] Creating index on collection named "system" can cause server to abort Created: 16/May/14  Updated: 11/Jul/16  Resolved: 17/Jun/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.6.0, 2.6.1, 2.6.2
Fix Version/s: 2.6.4

Type: Bug Priority: Major - P3
Reporter: Stefan Wójcik Assignee: J Rassi
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongoengine-crash.log    
Issue Links:
Depends
Tested
Operating System: ALL
Steps To Reproduce:

1. Clone the repository from https://github.com/mongoengine/mongoengine.
2. run python setup.py test -s tests.document.instance.
3. Mongod crashes every time, with the log given in the description.

Participants:

 Description   
Issue Status as of Jul 18, 2014

ISSUE SUMMARY
In MongoDB 2.4 and earlier, creating an index on a collection with the name "system" and subsequently dropping the index leaves the database in a corrupt state. The corruption has no user-visible effects in MongoDB 2.4, but in affected versions of MongoDB 2.6 upon creation of a new index on this collection the server would detect the corruption and abort.

USER IMPACT
mongod shuts down upon detecting database corruption.

WORKAROUNDS
Running db.repairDatabase() on the affected database fixes this issue.

AFFECTED VERSIONS
MongoDB production releases up to 2.6.3 are affected by this issue.

FIX VERSION
The fix is included in the 2.6.4 production release.

RESOLUTION DETAILS
The fix for this issue is to add a procedure to the startup sequence that rectifies the system.namespaces collection for any affected databases. This procedure will output a log message "dropping orphaned index" for each orphan index that is found on a collection named "system".

Original description

I'm sorry for a vague summary, but I'm not entirely sure which operation is responsible for mongod crashing. Anyway, the important part is that the process crashes every single time (ver 2.6.1, Max OS X).

Log from the crash:

2014-05-16T12:23:01.833-0700 [conn1] mongoenginetest.log: clearing plan cache - collection info cache reset
2014-05-16T12:23:01.833-0700 [conn1] Only one plan is available; it will be run but will not be cached. query: { name: "mongoenginetest.log.$_id_" } sort: {} projection: {}, planSummary: COLLSCAN
2014-05-16T12:23:01.834-0700 [conn1] Only one plan is available; it will be run but will not be cached. query: { ns: "mongoenginetest.log", name: "_id_" } sort: {} projection: {}, planSummary: COLLSCAN
2014-05-16T12:23:01.834-0700 [conn1] 	 dropIndexes done
2014-05-16T12:23:01.834-0700 [conn1] Only one plan is available; it will be run but will not be cached. query: { name: "mongoenginetest.log" } sort: {} projection: {}, planSummary: COLLSCAN
2014-05-16T12:23:01.834-0700 [conn1] command mongoenginetest.$cmd command: drop { drop: "log" } keyUpdates:0 numYields:0 locks(micros) w:29648 reslen:82 29ms
2014-05-16T12:23:01.836-0700 [conn1] run command mongoenginetest.$cmd { drop: "system" }
2014-05-16T12:23:01.836-0700 [conn1] CMD: drop mongoenginetest.system
2014-05-16T12:23:01.836-0700 [conn1] command mongoenginetest.$cmd command: drop { drop: "system" } keyUpdates:0 numYields:0 locks(micros) w:80 reslen:62 0ms
2014-05-16T12:23:01.837-0700 [conn1] run command mongoenginetest.$cmd { drop: "node" }
2014-05-16T12:23:01.837-0700 [conn1] CMD: drop mongoenginetest.node
2014-05-16T12:23:01.837-0700 [conn1] command mongoenginetest.$cmd command: drop { drop: "node" } keyUpdates:0 numYields:0 locks(micros) w:74 reslen:62 0ms
2014-05-16T12:23:01.838-0700 [conn1] run command mongoenginetest.$cmd { insert: "node", ordered: true, documents: [ { _id: ObjectId('53766595ab265ddf6a6df6d2'), parameters: {} } ] }
2014-05-16T12:23:01.838-0700 [conn1] Database::_addNamespaceToCatalog ns: mongoenginetest.node
2014-05-16T12:23:01.838-0700 [conn1] ExtentManager::increaseStorageSize ns:mongoenginetest.node desiredSize:8192 fromFreeList: 1 eloc: 0:aae000
2014-05-16T12:23:01.848-0700 [conn1] _reuse extent was:mongoenginetest.doc now:mongoenginetest.node
2014-05-16T12:23:01.848-0700 [conn1] Database::_addNamespaceToCatalog ns: mongoenginetest.node.$_id_
2014-05-16T12:23:01.848-0700 [conn1] build index on: mongoenginetest.node properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "mongoenginetest.node" }
2014-05-16T12:23:01.848-0700 [conn1] mongoenginetest.node: clearing plan cache - collection info cache reset
2014-05-16T12:23:01.848-0700 [conn1] allocating new extent
2014-05-16T12:23:01.848-0700 [conn1] ExtentManager::increaseStorageSize ns:mongoenginetest.node.$_id_ desiredSize:131072 fromFreeList: 1 eloc: 0:d6e000
2014-05-16T12:23:01.848-0700 [conn1] _reuse extent was:mongoenginetest.log.$_id_ now:mongoenginetest.node.$_id_
2014-05-16T12:23:01.848-0700 [conn1] 	 added index to empty collection
2014-05-16T12:23:01.848-0700 [conn1] mongoenginetest.node: clearing plan cache - collection info cache reset
2014-05-16T12:23:01.848-0700 [conn1] insert mongoenginetest.node query: { _id: ObjectId('53766595ab265ddf6a6df6d2'), parameters: {} } ninserted:1 keyUpdates:0 numYields:0 locks(micros) w:10616 10ms
2014-05-16T12:23:01.848-0700 [conn1] command mongoenginetest.$cmd command: insert { insert: "node", ordered: true, documents: [ { _id: ObjectId('53766595ab265ddf6a6df6d2'), parameters: {} } ] } keyUpdates:0 numYields:0 locks(micros) w:32 reslen:40 10ms
2014-05-16T12:23:01.849-0700 [journal] journal WRITETODATAFILES 1
2014-05-16T12:23:01.849-0700 [journal] journal WRITETODATAFILES 2
2014-05-16T12:23:01.849-0700 [journal] journal WRITETODATAFILES 0.137ms
2014-05-16T12:23:01.849-0700 [conn1] run command mongoenginetest.$cmd { insert: "system", ordered: true, documents: [ { _id: ObjectId('53766595ab265ddf6a6df6d3'), name: "system", nodes: { node: ObjectId('53766595ab265ddf6a6df6d2') } } ] }
2014-05-16T12:23:01.849-0700 [conn1] Database::_addNamespaceToCatalog ns: mongoenginetest.system
2014-05-16T12:23:01.849-0700 [conn1] ExtentManager::increaseStorageSize ns:mongoenginetest.system desiredSize:8192 fromFreeList: 1 eloc: 0:a92000
2014-05-16T12:23:01.849-0700 [conn1] _reuse extent was:mongoenginetest.mongoengine.counters now:mongoenginetest.system
2014-05-16T12:23:01.850-0700 [conn1] mongoenginetest.system Invariant failure nsi.details( descriptor->indexNamespace() ) == NULL src/mongo/db/catalog/index_catalog.cpp 450
2014-05-16T12:23:01.858-0700 [conn1] mongoenginetest.system 0x1006ac7ab 0x1006642c2 0x100653fc9 0x1000f61e7 0x1000f69c6 0x1000f72fe 0x1000e5b1d 0x1001a36dc 0x1001a44d6 0x1001a571e 0x1001a652f 0x1001a92f7 0x1001b7605 0x1001b8101 0x1001b907c 0x1003cfc7f 0x1002a1050 0x100006854 0x100671b71 0x1006e0bb5
 0   mongod                              0x00000001006ac7ab _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x00000001006642c2 _ZN5mongo10logContextEPKc + 114
 2   mongod                              0x0000000100653fc9 _ZN5mongo15invariantFailedEPKcS1_j + 233
 3   mongod                              0x00000001000f61e7 _ZN5mongo12IndexCatalog15IndexBuildBlock4initEv + 805
 4   mongod                              0x00000001000f69c6 _ZN5mongo12IndexCatalog11createIndexENS_7BSONObjEbNS0_16ShutdownBehaviorE + 704
 5   mongod                              0x00000001000f72fe _ZN5mongo12IndexCatalog17ensureHaveIdIndexEv + 590
 6   mongod                              0x00000001000e5b1d _ZN5mongo8Database16createCollectionERKNS_10StringDataERKNS_17CollectionOptionsEbb + 1821
 7   mongod                              0x00000001001a36dc _ZN5mongo18WriteBatchExecutor13execOneInsertEPNS0_16ExecInsertsStateEPPNS_16WriteErrorDetailE + 3592
 8   mongod                              0x00000001001a44d6 _ZN5mongo18WriteBatchExecutor11execInsertsERKNS_21BatchedCommandRequestEPSt6vectorIPNS_16WriteErrorDetailESaIS6_EE + 1334
 9   mongod                              0x00000001001a571e _ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE + 66
 10  mongod                              0x00000001001a652f _ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE + 2547
 11  mongod                              0x00000001001a92f7 _ZN5mongo8WriteCmd3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 645
 12  mongod                              0x00000001001b7605 _ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 37
 13  mongod                              0x00000001001b8101 _ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 2245
 14  mongod                              0x00000001001b907c _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi + 1388
 15  mongod                              0x00000001003cfc7f _ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 1615
 16  mongod                              0x00000001002a1050 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE + 1968
 17  mongod                              0x0000000100006854 _ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE + 308
 18  mongod                              0x0000000100671b71 _ZN5mongo17PortMessageServer17handleIncomingMsgEPv + 1681
 19  mongod                              0x00000001006e0bb5 thread_proxy + 229
2014-05-16T12:23:01.858-0700 [conn1]
 
***aborting after invariant() failure
 
 
2014-05-16T12:23:01.865-0700 [conn1] SEVERE: Got signal: 6 (Abort trap: 6).
Backtrace:0x1006ac7ab 0x1006ac50f 0x7fff912fb5aa 0 0x7fff929d4b1a 0x10065403b 0x1000f61e7 0x1000f69c6 0x1000f72fe 0x1000e5b1d 0x1001a36dc 0x1001a44d6 0x1001a571e 0x1001a652f 0x1001a92f7 0x1001b7605 0x1001b8101 0x1001b907c 0x1003cfc7f 0x1002a1050
 0   mongod                              0x00000001006ac7ab _ZN5mongo15printStackTraceERSo + 43
 1   mongod                              0x00000001006ac50f _ZN5mongo12_GLOBAL__N_110abruptQuitEi + 191
 2   libsystem_platform.dylib            0x00007fff912fb5aa _sigtramp + 26
 3   ???                                 0x0000000000000000 0x0 + 0
 4   libsystem_c.dylib                   0x00007fff929d4b1a abort + 125
 5   mongod                              0x000000010065403b _ZN5mongo15invariantFailedEPKcS1_j + 347
 6   mongod                              0x00000001000f61e7 _ZN5mongo12IndexCatalog15IndexBuildBlock4initEv + 805
 7   mongod                              0x00000001000f69c6 _ZN5mongo12IndexCatalog11createIndexENS_7BSONObjEbNS0_16ShutdownBehaviorE + 704
 8   mongod                              0x00000001000f72fe _ZN5mongo12IndexCatalog17ensureHaveIdIndexEv + 590
 9   mongod                              0x00000001000e5b1d _ZN5mongo8Database16createCollectionERKNS_10StringDataERKNS_17CollectionOptionsEbb + 1821
 10  mongod                              0x00000001001a36dc _ZN5mongo18WriteBatchExecutor13execOneInsertEPNS0_16ExecInsertsStateEPPNS_16WriteErrorDetailE + 3592
 11  mongod                              0x00000001001a44d6 _ZN5mongo18WriteBatchExecutor11execInsertsERKNS_21BatchedCommandRequestEPSt6vectorIPNS_16WriteErrorDetailESaIS6_EE + 1334
 12  mongod                              0x00000001001a571e _ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE + 66
 13  mongod                              0x00000001001a652f _ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE + 2547
 14  mongod                              0x00000001001a92f7 _ZN5mongo8WriteCmd3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 645
 15  mongod                              0x00000001001b7605 _ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb + 37
 16  mongod                              0x00000001001b8101 _ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb + 2245
 17  mongod                              0x00000001001b907c _ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi + 1388
 18  mongod                              0x00000001003cfc7f _ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 1615
 19  mongod                              0x00000001002a1050 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE + 1968



 Comments   
Comment by Githook User [ 17/Jun/14 ]

Author:

{u'username': u'jrassi', u'name': u'Jason Rassi', u'email': u'rassi@10gen.com'}

Message: SERVER-13975 At startup, clean up orphan idxs on "system" collection
Branch: v2.6
https://github.com/mongodb/mongo/commit/179ef63cbb9df2b55f46f9c893ebbf0afad58b52

Comment by Githook User [ 17/Jun/14 ]

Author:

{u'username': u'jrassi', u'name': u'Jason Rassi', u'email': u'rassi@10gen.com'}

Message: SERVER-13975 At startup, clean up orphan idxs on "system" collection
Branch: v2.6
https://github.com/mongodb/mongo/commit/179ef63cbb9df2b55f46f9c893ebbf0afad58b52

Comment by J Rassi [ 16/Jun/14 ]

The fix for this issue is to add a procedure to the startup sequence that rectifies system.namespaces for any affected databases. This procedure will output a log message "dropping orphaned index" for each orphan index that is found on a collection named "system".

Note that this startup procedure is being adding only to the 2.6 branch, and not for the 2.7.x series. Users must upgrade through 2.6.3+ before upgrading to 2.8.x in order to avoid this issue. Hence:

  • For users that encounter this issue with versions 2.6.0 - 2.6.2 of the server: resolve by upgrading to 2.6.3 (or any 2.6.x greater than 2.6.3).
  • For users that encounter this issue with versions 2.8.0+ of the server: resolve by (1) downgrading to 2.6.3 (or any 2.6.x greater than 2.6.3), (2) restarting mongod, (3) verifying that the "dropping orphaned index" log message is generated at startup, and (4) re-upgrading to the original version.
Comment by Stefan Wójcik [ 21/May/14 ]

Thanks for a detailed explanation Jason! I'll patch MongoEngine's tests to use a different collection name.

Comment by J Rassi [ 21/May/14 ]

The MongoEngine test case from tests.document.instance that reproduces the error is test_complex_nesting_document_and_embedded_document. This test creates a collection named "system" and then drops it; this exposes an issue in MongoDB 2.4 (and earlier) in the handling of system collections.

A namespace where the collection component starts with the string "system." is considered a "system namespace". Collections with system namespace strings (such as system.profile, system.namespaces, system.indexes, system.js, system.users) are called "system collections", which are special in that user-initiated write operations (e.g. insert, update, delete) and administrative operations (e.g. drop, compact) against them are sometimes disallowed, and an initial sync skips over certain ones during the copy phase.

The assertion failure can be reproduced with the following:
1. Start mongod 2.4.10.
2. Run db.system.insert({})
3. Run db.system.drop()
4. Upgrade to mongod 2.6.1
5. Run db.system.insert({}).

Step #2 creates a collection with namespace name "test.system" (which is not considered a system namespace), and an _id index with namespace name "test.system._id_" (which is considered a system namespace). Step #3 first attempts to drop the _id index for the collection, which fails and outputs a log level 2 warning because the index namespace name is a system namespace (system namespaces can't be dropped). However, the error is swallowed and the drop of the collection proceeds, because the collection is not in a system namespace. This leaves the database in a corrupt state: it has an index on a collection which doesn't exist.

When the collection is re-created in 2.6, MongoDB detects that the database is corrupted (it attempts to create the _id index for a new collection, but realizes that the index already exists) and shuts down. Running "db.repairDatabase()" on the database (which rebuilds all indexes from each collection's index list) fixes the issue.

Comment by Thomas Rueckstiess [ 21/May/14 ]

Hi Stefan,

Thanks for the additional details. We are now able to reproduce the crash and are looking into the cause. We will update the ticket when we know more.

Thomas

Comment by Stefan Wójcik [ 21/May/14 ]

Hey Kaloian,

You're right - I was reusing a data directory previously used by mongod 2.4.10. The name of the database was consistent among all the tests regardless of the mongod version, but the db was supposed to be dropped at the end of each test. Not sure why the error occurs and whether it's something you guys would wanna fix or not.

Here's a way to reproduce the error reliably:
1. Create a new directory, start mongod 2.4.10's dbpath to that dir.
2. Run MongoEngine's tests: python setup.py test -s tests.document.instance.
3. Stop mongod 2.4.10 and start 2.6.1, pointing to the same dbpath.
4. Rerun MongoEngine's tests.

Comment by Kaloian Manassiev [ 21/May/14 ]

Hi Stefan,

We have two hypotheses - one is that the database has already been corrupted and the same corruption gets hit over and over. The other is that MongoDB is reusing an extent, which was not all zeroes and we interpreted some junk leftover data as a presence of an index.

Are you running the tests with a clean database? If not, can you please try to re-run them on a clean database and let me know if this still reproduces?

Thanks in advance.

-Kal.

Generated at Thu Feb 08 03:33:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.