[SERVER-15053] MongoDB throws segmentation fault while running aggregation Created: 27/Aug/14  Updated: 10/Dec/14  Resolved: 13/Oct/14

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Stability
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Roy
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu 12.04

Issue Links:
Duplicate
duplicates SERVER-15580 Evaluating candidate query plans with... Closed
Related
related to SERVER-14969 Dropping index during active aggregat... Closed
Operating System: Linux
Participants:

 Description   

2014-08-27T15:34:26.007+0000 [conn5314] SEVERE: Invalid access at address: 0xece2ac08
2014-08-27T15:34:26.027+0000 [conn5314] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x11e6111 0x11e54ee 0x11e55df 0x7f151ebfbcb0 0xaac37f 0xd5570c 0xd5622d 0xd5932e 0xc6e7c4 0xc6eec5 0xc7e9b3 0xc80046 0xca0a0f 0xca0e29 0xcfc7bb 0x9b3816 0xa2889a 0xa29ce2 0xa2bea6 0xd5dd6d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
/usr/bin/mongod() [0x11e54ee]
/usr/bin/mongod() [0x11e55df]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f151ebfbcb0]
/usr/bin/mongod(_ZN5mongo18KeepMutationsStage4workEPm+0xef) [0xaac37f]
/usr/bin/mongod(_ZN5mongo15MultiPlanRunner12workAllPlansEPNS_7BSONObjEm+0x13c) [0xd5570c]
/usr/bin/mongod(_ZN5mongo15MultiPlanRunner12pickBestPlanEPmPNS_7BSONObjE+0xed) [0xd5622d]
/usr/bin/mongod(_ZN5mongo15MultiPlanRunner7getNextEPNS_7BSONObjEPNS_7DiskLocE+0x34e) [0xd5932e]
/usr/bin/mongod(_ZN5mongo20DocumentSourceCursor9loadBatchEv+0x1e4) [0xc6e7c4]
/usr/bin/mongod(_ZN5mongo20DocumentSourceCursor7getNextEv+0x115) [0xc6eec5]
/usr/bin/mongod(_ZN5mongo19DocumentSourceGroup8populateEv+0x73) [0xc7e9b3]
/usr/bin/mongod(_ZN5mongo19DocumentSourceGroup7getNextEv+0x426) [0xc80046]
/usr/bin/mongod(_ZN5mongo18DocumentSourceSort8populateEv+0x12f) [0xca0a0f]
/usr/bin/mongod(_ZN5mongo18DocumentSourceSort7getNextEv+0xb9) [0xca0e29]
/usr/bin/mongod(_ZN5mongo8Pipeline3runERNS_14BSONObjBuilderE+0x8b) [0xcfc7bb]
/usr/bin/mongod(_ZN5mongo15PipelineCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x526) [0x9b3816]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa2889a]
/usr/bin/mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x1042) [0xa29ce2]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c6) [0xa2bea6]
/usr/bin/mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x22ed) [0xd5dd6d]
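
The mangled frames above are easier to read once the qualified function names are pulled out. The following is only a sketch: `qualifiedName` and `frameSymbol` are hypothetical helper names, and the parser handles just the simple nested-name form `_ZN<len><id>...E` seen in this trace, ignoring argument types. A full demangler such as `c++filt` is the better tool when available.

```javascript
// Partial demangler: recovers only the qualified name from a nested
// Itanium C++ ABI symbol of the form _ZN<len><id><len><id>...E.
// Argument types, templates and constructors are NOT handled.
function qualifiedName(mangled) {
    if (mangled.indexOf("_ZN") !== 0) {
        return null; // not a nested name; leave it to c++filt
    }
    var pos = 3;
    var parts = [];
    while (pos < mangled.length) {
        var m = /^\d+/.exec(mangled.slice(pos));
        if (!m) {
            break; // reached the closing "E" or an encoding this sketch ignores
        }
        var len = parseInt(m[0], 10);
        pos += m[0].length;
        parts.push(mangled.substr(pos, len));
        pos += len;
    }
    return parts.length ? parts.join("::") : null;
}

// Extract the symbol from a frame line such as
// "/usr/bin/mongod(_ZN5mongo18KeepMutationsStage4workEPm+0xef) [0xaac37f]".
// Frames without a symbol (e.g. "/usr/bin/mongod() [0x11e54ee]") yield null.
function frameSymbol(line) {
    var m = /\((_Z\w+)\+0x[0-9a-f]+\)/.exec(line);
    return m ? qualifiedName(m[1]) : null;
}
```

Applied to the fifth frame, this gives `mongo::KeepMutationsStage::work`, called from `mongo::MultiPlanRunner::workAllPlans`.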

I previously reported SERVER-14969, which was similar, but I don't think it is related, since no index jobs or drops were running in the background at this point.
Some of the logging was disabled on the machine when this occurred, so unfortunately I have no additional logs to provide and cannot tell which query or command triggered the segmentation fault.



 Comments   
Comment by J Rassi [ 13/Oct/14 ]

I've confirmed this ticket as a duplicate of SERVER-15580. Feel free to add further questions/comments over at that ticket. Resolving.

Comment by Roy [ 29/Sep/14 ]

Hi Thomas,

For me, it reproduced in two different environments that share the same data structure but are otherwise unrelated, so I doubt it's a corrupt index.
Is there any other way I can help you reproduce it, e.g. by offering a demo environment where it does reproduce?

Roy.

Comment by Thomas Rueckstiess [ 23/Sep/14 ]

Hi Roy,

Sorry for the delay on this issue, but we are still unable to reproduce the segmentation fault.

I've modified the data and changed the fields to arrays, as in your example documents. I created all the indices you provided, then executed both commands simultaneously in two mongo shells. Here is a snippet of the log file showing that the commands run successfully, updating documents and aggregating data.

2014-09-23T14:29:47.569+0200 [conn15] update test.files query: { appId: 11114.0, fileStatus: { $nin: [ 2.0, "DELETED" ] }, snapshotLastScanned: { $lt: 1409316400697.0 } } update: { $inc: { unseenScans: 1.0 } } nscanned:17515 nscannedObjects:6400 nMatched:5147 nModified:5147 keyUpdates:2 numYields:148 locks(micros) w:2880786 1527ms
2014-09-23T14:29:47.570+0200 [conn15] command test.$cmd command: update { update: "files", ordered: true, updates: [ { q: { appId: 11114.0, fileStatus: { $nin: [ 2.0, "DELETED" ] }, snapshotLastScanned: { $lt: 1409316400697.0 } }, u: { $inc: { unseenScans: 1.0 } }, multi: true } ] } keyUpdates:0 numYields:0  reslen:55 1528ms
2014-09-23T14:29:47.572+0200 [conn18] command test.$cmd command: aggregate { aggregate: "files", pipeline: [ { $match: { isForeign: false, fileStatus: { $in: [ 0, 1 ] }, isFolder: false, appId: 11114 } }, { $group: { count: { $sum: 1 }, _id: "$fileAccessLevel" } } ], cursor: {} } keyUpdates:0 numYields:19 locks(micros) r:13261 reslen:297 13ms
2014-09-23T14:29:47.814+0200 [conn18] command test.$cmd command: aggregate { aggregate: "files", pipeline: [ { $match: { isForeign: false, fileStatus: { $in: [ 0, 1 ] }, isFolder: false, appId: 11114 } }, { $group: { count: { $sum: 1 }, _id: "$fileAccessLevel" } } ], cursor: {} } keyUpdates:0 numYields:19 locks(micros) r:15756 reslen:297 241ms
2014-09-23T14:29:48.711+0200 [conn18] command test.$cmd command: aggregate { aggregate: "files", pipeline: [ { $match: { isForeign: false, fileStatus: { $in: [ 0, 1 ] }, isFolder: false, appId: 11114 } }, { $group: { count: { $sum: 1 }, _id: "$fileAccessLevel" } } ], cursor: {} } keyUpdates:0 numYields:50 locks(micros) r:49784 reslen:297 884ms
2014-09-23T14:29:49.155+0200 [conn18] command test.$cmd command: aggregate { aggregate: "files", pipeline: [ { $match: { isForeign: false, fileStatus: { $in: [ 0, 1 ] }, isFolder: false, appId: 11114 } }, { $group: { count: { $sum: 1 }, _id: "$fileAccessLevel" } } ], cursor: {} } keyUpdates:0 numYields:49 locks(micros) r:37877 reslen:297 428ms
2014-09-23T14:29:49.169+0200 [conn18] command test.$cmd command: aggregate { aggregate: "files", pipeline: [ { $match: { isForeign: false, fileStatus: { $in: [ 0, 1 ] }, isFolder: false, appId: 11114 } }, { $group: { count: { $sum: 1 }, _id: "$fileAccessLevel" } } ], cursor: {} } keyUpdates:0 numYields:19 locks(micros) r:13253 reslen:297 12ms
2014-09-23T14:29:49.170+0200 [conn15] update test.files query: { appId: 11114.0, fileStatus: { $nin: [ 2.0, "DELETED" ] }, snapshotLastScanned: { $lt: 1409316400697.0 } } update: { $inc: { unseenScans: 1.0 } } nscanned:17515 nscannedObjects:6400 nMatched:5147 nModified:5147 keyUpdates:2 numYields:147 locks(micros) w:3019634 1599ms
2014-09-23T14:29:49.170+0200 [conn15] command test.$cmd command: update { update: "files", ordered: true, updates: [ { q: { appId: 11114.0, fileStatus: { $nin: [ 2.0, "DELETED" ] }, snapshotLastScanned: { $lt: 1409316400697.0 } }, u: { $inc: { unseenScans: 1.0 } }, multi: true } ] } keyUpdates:0 numYields:0  reslen:55 1600ms
2014-09-23T14:29:49.367+0200 [conn18] command test.$cmd command: aggregate { aggregate: "files", pipeline: [ { $match: { isForeign: false, fileStatus: { $in: [ 0, 1 ] }, isFolder: false, appId: 11114 } }, { $group: { count: { $sum: 1 }, _id: "$fileAccessLevel" } } ], cursor: {} } keyUpdates:0 numYields:19 locks(micros) r:15569 reslen:297 195ms

Can you run a validate command on this collection with the "full" option to see if there are any corruption issues? Note that validate is a resource-intensive command and is best run during off-peak hours or a maintenance window:

db.files.validate(true)

Please let us know what the validate command returns.

If there is a corrupt index, you may be able to drop the particular index and re-create it. A verbose log file (verbosity level 2 or higher ideally) during such a segmentation fault would also be helpful.
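
As a rough illustration of how the validate output might be read once it returns, here is a sketch of a small helper. The name `summarizeValidate` is hypothetical and not part of the shell, and it assumes only that the result document carries a boolean `valid` flag and an `errors` array.

```javascript
// Hypothetical helper: condense a validate() result document into a verdict.
// Assumes `result.valid` is a boolean and `result.errors` is an optional array.
function summarizeValidate(result) {
    var errors = result.errors || [];
    if (result.valid === true && errors.length === 0) {
        return "ok";
    }
    if (errors.length > 0) {
        return "problems found: " + errors.join("; ");
    }
    return "problems found: valid flag is not true";
}

// In the mongo shell the call would look like this (not run here):
//   summarizeValidate(db.files.validate(true))
// To raise log verbosity at runtime before trying to trigger the crash:
//   db.adminCommand({ setParameter: 1, logLevel: 2 })
```

Anything other than "ok" would point to the affected index or namespace in the `errors` text, which is the part worth posting back to this ticket.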

Regards,
Thomas

Comment by Roy [ 22/Sep/14 ]

Thomas, any new info?
Thanks.

Comment by Roy [ 09/Sep/14 ]

Hi Thomas,
Is there any additional info you need in order to resolve this issue?
This is a major issue, causing us to use a different index for the aggregate, which is very problematic in terms of performance...

Thanks.

Comment by Roy [ 29/Aug/14 ]

Hi,

Actually, I converted my aggregation into a map/reduce, and the crash still happened. I then changed the map/reduce's query so that it no longer uses the same index as the update, and moved the filtering on some of the parameters into the map function. After that it didn't crash anymore, so I think the indexes are indeed relevant here.

A few differences from your sample documents (not sure if they actually have any effect):
fileAccessLevel is an array (example: "fileAccessLevel" : [ 0, "PRIVATE" ])
same goes for fileStatus (example: "fileStatus" : [ 1, "TRASHED" ])
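
A sketch of how test documents with this array shape could be generated for a reproducer. Everything here is illustrative: `makeDoc` is a hypothetical name, and the value pools are made up except for `[ 0, "PRIVATE" ]` and `[ 1, "TRASHED" ]`, which are the two examples above. The field names and the `appId` and timestamp values are borrowed from the commands posted earlier in this ticket.

```javascript
// Value pools: only [0, "PRIVATE"] and [1, "TRASHED"] come from real documents;
// the other pairs are assumptions made up for this sketch.
var ACCESS_LEVELS = [[0, "PRIVATE"], [1, "SHARED"], [2, "PUBLIC"]];
var STATUSES = [[0, "CREATED"], [1, "TRASHED"], [2, "DELETED"]];

// Hypothetical generator: builds the i-th test document deterministically.
function makeDoc(i) {
    return {
        appId: 11114,
        isForeign: i % 2 !== 0,
        isFolder: false,
        fileAccessLevel: ACCESS_LEVELS[i % ACCESS_LEVELS.length],
        fileStatus: STATUSES[i % STATUSES.length],
        unseenScans: 0,
        // Kept below the $lt bound used by the update command.
        snapshotLastScanned: 1409316400697 - (i + 1)
    };
}

// In the mongo shell, a collection could then be filled with (not run here):
//   for (var i = 0; i < 100000; i++) { db.files.insert(makeDoc(i)); }
```

With these pools, the `$in: [ 0, 1 ]` match in the aggregation and the `$nin: [ 2, "DELETED" ]` filter in the update both select a share of the generated documents.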

My indexes look like this (sorry for the long list..):

[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "Tenant_2.files"
        },
        {
                "v" : 1,
                "key" : {
                        "appId" : 1,
                        "domains" : 1
                },
                "name" : "appId_1_domains_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "fileAccessLevel" : 1
                },
                "name" : "fileAccessLevel_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "emails" : 1
                },
                "name" : "emails_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "groupIds" : 1
                },
                "name" : "groupIds_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "name" : "name_text",
                "ns" : "Tenant_2.files",
                "background" : true,
                "weights" : {
                        "name" : 1
                },
                "default_language" : "english",
                "language_override" : "language",
                "textIndexVersion" : 2
        },
        {
                "v" : 1,
                "key" : {
                        "appId" : 1,
                        "parentKeyPrefix" : 1,
                        "parentType" : 1
                },
                "name" : "appId_1_parentKeyPrefix_1_parentType_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "modifiedDate" : -1
                },
                "name" : "modifiedDate_-1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "id" : "hashed"
                },
                "name" : "id_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "appId" : "hashed"
                },
                "name" : "appId_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "ownerAddress" : "hashed"
                },
                "name" : "ownerAddress_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isFolder" : "hashed"
                },
                "name" : "isFolder_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isForeign" : "hashed"
                },
                "name" : "isForeign_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "objectType" : "hashed"
                },
                "name" : "objectType_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "labels.trashed" : "hashed"
                },
                "name" : "labels.trashed_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isTrashed" : "hashed"
                },
                "name" : "isTrashed_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "fileSize" : -1,
                        "appId" : 1
                },
                "name" : "fileSize_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "ownerAddress" : -1,
                        "appId" : 1
                },
                "name" : "ownerAddress_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "fileAccessLevel" : -1,
                        "appId" : 1
                },
                "name" : "fileAccessLevel_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "name" : -1,
                        "appId" : 1
                },
                "name" : "name_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "modifiedDate" : -1,
                        "appId" : 1
                },
                "name" : "modifiedDate_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isForeign" : 1,
                        "fileStatus" : 1,
                        "fileSize" : -1,
                        "appId" : 1
                },
                "name" : "isForeign_1_fileStatus_1_fileSize_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isForeign" : 1,
                        "fileStatus" : 1,
                        "ownerAddress" : -1,
                        "appId" : 1
                },
                "name" : "isForeign_1_fileStatus_1_ownerAddress_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isForeign" : 1,
                        "fileAccessLevel" : -1,
                        "appId" : 1
                },
                "name" : "isForeign_1_fileAccessLevel_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isForeign" : 1,
                        "fileStatus" : 1,
                        "name" : -1,
                        "appId" : 1
                },
                "name" : "isForeign_1_fileStatus_1_name_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isForeign" : 1,
                        "fileStatus" : 1,
                        "modifiedDate" : -1,
                        "appId" : 1
                },
                "name" : "isForeign_1_fileStatus_1_modifiedDate_-1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isForeign" : 1,
                        "fileStatus" : 1,
                        "appId" : 1
                },
                "name" : "isForeign_1_fileStatus_1_appId_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "name" : 1
                },
                "name" : "name_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "fileStatus" : 1
                },
                "name" : "fileStatus_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "isForeign" : 1,
                        "fileStatus" : 1
                },
                "name" : "isForeign_1_fileStatus_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "parentIds" : 1
                },
                "name" : "parentIds_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "cabinetState" : 1
                },
                "name" : "cabinetState_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "appId" : 1,
                        "fileStatus" : 1,
                        "ownerAddress" : 1
                },
                "name" : "appId_1_fileStatus_1_ownerAddress_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "parents.id" : 1
                },
                "name" : "parents.id_1",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "parentId" : "hashed"
                },
                "name" : "parentId_hashed",
                "ns" : "Tenant_2.files",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "appId" : 1,
                        "fileStatus" : 1,
                        "unseenScans" : 1,
                        "snapshotLastScanned" : 1
                },
                "name" : "appId_1_fileStatus_1_unseenScans_1_snapshotLastScanned_1",
                "ns" : "Tenant_2.files",
                "background" : true
        }
]

Comment by Thomas Rueckstiess [ 29/Aug/14 ]

Hi Roy,

We're still not able to reproduce the crash you describe.

I've created a dummy collection with 100,000 documents matching the fields I could gather from your commands. My example documents look like this:

MongoDB shell version: 2.6.4
connecting to: test
> db.files.find().limit(3).pretty()
{
	"_id" : ObjectId("54009cacfe4dce05d22eee5b"),
	"unseenScans" : 84,
	"isFolder" : false,
	"fileAccessLevel" : 0,
	"fileStatus" : "CREATED",
	"appId" : 234,
	"isForeign" : false,
	"snapshotLastScanned" : NumberLong("1409316422223")
}
{
	"_id" : ObjectId("54009cacfe4dce05d32eee5b"),
	"unseenScans" : 8,
	"isFolder" : true,
	"fileAccessLevel" : 2,
	"fileStatus" : 4,
	"appId" : 73,
	"isForeign" : true,
	"snapshotLastScanned" : NumberLong("1409316426150")
}
{
	"_id" : ObjectId("54009cacfe4dce05d42eee5b"),
	"unseenScans" : 57,
	"isFolder" : true,
	"fileAccessLevel" : 3,
	"fileStatus" : 1,
	"appId" : 73,
	"isForeign" : true,
	"snapshotLastScanned" : NumberLong("1409316342568")
}

I've made sure that the aggregation and update command both match some documents.

I then ran your two commands in separate shells. Both worked fine, and I couldn't make the database crash.

Can you please share your list of indexes on the files collection, a few sample documents, and anything else you can think of that distinguishes your scenario from mine? I will then try again to get a reproducer for the crashes.

Thanks,
Thomas

Comment by hari.khalsa@10gen.com [ 29/Aug/14 ]

mamoos1 Thanks for the investigation! We'll work on reproducing it locally now and get back to you shortly.

Comment by Roy [ 29/Aug/14 ]

I have fully reproduced the issue.
When running these two commands in different shells on the same mongo server, the server crashes:

Command 1:

while (true) { db.runCommand({ update: "files", ordered: true, updates: [ { q: { appId: 11114, fileStatus: { $nin: [ 2, "DELETED" ] }, snapshotLastScanned: { $lt: 1409316400697 } }, u: { $inc: { unseenScans: 1 } }, multi: true } ] }) }

Command 2:

while (true) { db.files.aggregate([ { $match: { isForeign: false, fileStatus: { $in: [ NumberInt(0), NumberInt(1) ] }, isFolder: false, appId: NumberInt(11114) } }, { $group: { count: { $sum: NumberInt(1) }, _id: "$fileAccessLevel" } } ]) }

Comment by Roy [ 29/Aug/14 ]

This keeps recurring in my environment.
I think I have now caught the aggregation that causes this segmentation fault:

2014-08-29T12:15:58.474+0000 [conn5643966] command Tenant_2.$cmd command: aggregate { aggregate: "files", pipeline: [ { $match: { isForeign: false, fileStatus: { $in: [ 0, 1 ] }, isFolder: false, appId: 11770 } }, { $group: { count: { $sum: 1 }, _id: "$fileAccessLevel" } } ] } keyUpdates:0 numYields:77 locks(micros) r:250867 reslen:255 776ms

Generated at Thu Feb 08 03:36:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.