[SERVER-15559] Fatal Exception: Deeply nested $cond drops mongod process Created: 08/Oct/14  Updated: 06/Dec/22  Resolved: 28/Mar/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Security, Stability
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Mark Shaw Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongo.txt    
Issue Links:
Related
is related to SERVER-8433 Aggregating deeply-nested documents c... Closed
is related to SERVER-13661 Increase the maximum allowed depth of... Closed
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Startup: Default parameters (nothing special)
Mongo Version: 2.6.4
Driver: I'm using the official c# driver.
Server: A replica set with two data-bearing members plus an arbiter, hosted on two Win2012 R2 servers.
Size: The collection is 18 million objects @ 17.1 GB.

The query: A large aggregation query, about 76 KB on disk. The nature of the query is that I am sending up a very deeply nested $cond statement to offset MongoDB's lack of a relational join construct.
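For illustration, a lookup emulated this way nests one $cond per key, so 500+ keys produce 500+ levels of nesting. A rough sketch of the shape (dates and values borrowed from the multipliers example in the comments below):

var norm = { "$cond" : [
	{ "$eq" : [ "$TradeDate", ISODate("2014-09-01T04:00:00Z") ] },
	0,
	{ "$cond" : [
		{ "$eq" : [ "$TradeDate", ISODate("2014-09-02T04:00:00Z") ] },
		-0.00336433801701963,
		{ "$cond" : [
			{ "$eq" : [ "$TradeDate", ISODate("2014-09-03T04:00:00Z") ] },
			0.00158856235107225,
			0 // ... one more nested $cond per remaining key ...
		] }
	] }
] };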

Participants:

 Description   

Not sure if this is the cause, but I can easily reproduce an error that ultimately terminates the mongod process.

The log is below; not sure if it helps, as it doesn't appear to dump much information.

2014-09-29T11:33:15.759-0400 [initandlisten] connection accepted from 172.16.56.76:56969 #9845 (10 connections now open)
2014-09-29T11:33:24.262-0400 [conn9840] command Risk.$cmd command: aggregate { $msg: "query not recording (too large)" } keyUpdates:0 numYields:7 locks(micros) r:261753 reslen:29159 1083ms
2014-09-29T11:33:33.184-0400 [conn9841] command Risk.$cmd command: aggregate { $msg: "query not recording (too large)" } keyUpdates:0 numYields:7 locks(micros) r:261737 reslen:29159 1088ms
2014-09-29T11:33:36.965-0400 [conn9844] end connection 172.16.56.77:60524 (9 connections now open)
2014-09-29T11:33:36.965-0400 [initandlisten] connection accepted from 172.16.56.77:60526 #9846 (10 connections now open)
2014-09-29T11:33:45.954-0400 [conn9845] end connection 172.16.56.76:56969 (9 connections now open)
2014-09-29T11:33:45.954-0400 [initandlisten] connection accepted from 172.16.56.76:56974 #9847 (10 connections now open)
2014-09-29T11:33:46.065-0400 [conn9840] *** unhandled exception 0xC00000FD at 0x00007FF9D70DB2D4, terminating
2014-09-29T11:33:46.065-0400 [conn9840] *** stack trace for unhandled exception:



 Comments   
Comment by Kyle Suarez [ 28/Mar/17 ]

I'm closing this as a duplicate of SERVER-26703, which will reject any command that exceeds the depth limit. This aggregation pipeline is too deeply nested and will now be rejected by the server upfront.

Comment by Asya Kamsky [ 08/Oct/14 ]

mark.shaw@point72.com I didn't see you ask about this type of structure on Google Groups, but you can avoid the super-long embedded structure by using some of the new 2.6 operators like $let and $map. You can see an example on Stack Overflow of how to create a lookup based on an array of key/value pairs; I tested it just now on an array of 500+ key/value pairs and it works just fine.

FYI, the test I ran had an array of multipliers in this format:

var multipliers = [
	{
		"key" : ISODate("2014-09-01T04:00:00Z"),
		"value" : 0
	},
	{
		"key" : ISODate("2014-09-02T04:00:00Z"),
		"value" : -0.00336433801701963
	},
	{
		"key" : ISODate("2014-09-03T04:00:00Z"),
		"value" : 0.00158856235107225
	},
	{
		"key" : ISODate("2014-09-04T04:00:00Z"),
		"value" : 0
	},
	{
		"key" : ISODate("2014-09-05T04:00:00Z"),
		"value" : 0.0043616177636796
	},
	{
		"key" : ISODate("2014-09-08T04:00:00Z"),
		"value" : -0.00394788787998412
	},
... // 500 key-value pairs
]

And I used a pipeline derived from yours:

var proj2= {
	"TradeDate" : 1,
	"SecurityID" : 1,
	"Pnl" : 1,
	"BetaDeltaNmvPreviousDay" : 1,
	"norm" : {
		"$let" : {
			"vars" : {
				"norm" : multipliers
			},
			"in" : {
				"$setDifference" : [
					{
						"$map" : {
							"input" : "$$norm",
							"as" : "norm",
							"in" : {
								"$cond" : [
									{
										"$eq" : [
											"$$norm.key",
											"$TradeDate"
										]
									},
									"$$norm.value",
									false
								]
							}
						}
					},
					[
						false
					]
				]
			}
		}
	}
}
// collection name is assumed here; substitute your own
db.trades.aggregate([
	{ $match : { TradeDate : { $gt : ISODate("2012-08-30") } } },
	{ $project : proj2 },
	{ $unwind : "$norm" },
	{ $group : {
		_id : { "TradeDate" : "$TradeDate", "SecurityID" : "$SecurityID" },
		"Pnl" : { "$sum" : "$Pnl" },
		"PnlCountryIndexBetaWeighted" : { "$sum" : { $multiply : [ "$BetaDeltaNmvPreviousDay", "$norm" ] } }
	} }
])
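
As a quick sanity check of the mechanism (the scratch collection name is hypothetical), running just the $project and $unwind stages over a single document shows the 500-element array collapsing to the one matched multiplier:

db.scratch.drop();
db.scratch.insert({
	"TradeDate" : ISODate("2014-09-02T04:00:00Z"),
	"SecurityID" : 1,
	"Pnl" : 10,
	"BetaDeltaNmvPreviousDay" : 2
});
// $map emits false for every non-matching key; $setDifference against
// [ false ] removes those entries, so only the matched value survives.
db.scratch.aggregate([ { $project : proj2 }, { $unwind : "$norm" } ]);
// => "norm" : -0.00336433801701963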

This will give you the same result without building the deeply nested expression: $map emits the matching multiplier (and false for every non-match), and $setDifference against [ false ] strips the non-matches, leaving just the matched value for $unwind. The deeply nested object is clearly not being caught correctly by the server, and that is what this ticket will track.

Comment by Thomas Rueckstiess [ 08/Oct/14 ]

Hi Mark,

Looking at the $project pipeline stage you provided in the mongo.txt file, it seems that your nesting level is at least 518 levels deep (counting the consecutive closing braces at the end of the stage). MongoDB supports nested BSON documents only to a depth of 100 levels, as documented on our MongoDB Limits page.

So while this query is not supported, we can use this ticket to track improvements to MongoDB's behavior (i.e. not shutting down) when encountering such an invalid document.

With regular queries, we do enforce the nesting limit and return an error (the limit there was recently increased from 20 to 100; see SERVER-13661). It appears, though, that the aggregation framework has its own document parsing code, which may not enforce the limit yet and may have quit due to a stack overflow. To help us confirm this, can you please attach the stack trace that is cut off in the log file output above?
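
For what it's worth, here is a minimal shell sketch of that enforcement, assuming a scratch collection named test; a regular query built this way should return an error rather than crash the server:

// Build a query document nested n levels deep.
function nest(n) {
	var doc = { "x" : 1 };
	for (var i = 0; i < n; i++) {
		doc = { "x" : doc };
	}
	return doc;
}
db.test.find(nest(90));  // within the 100-level limit, runs normally
db.test.find(nest(200)); // over the limit, should error out, not crash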

Thanks,
Thomas

Comment by Mark Shaw [ 08/Oct/14 ]

You bet. Attached is the query that I'm sending up.

Comment by J Rassi [ 08/Oct/14 ]

Could you post the complete aggregation operation that your application is sending which triggers this exception?
