-
Type:
Question
-
Resolution: Done
-
Priority:
Critical - P2
-
None
-
Affects Version/s: 4.0.1
-
Component/s: Aggregation Framework
-
None
-
None
-
0
-
None
-
None
-
None
-
None
-
None
-
None
I am working on a project where we will have almost 5 million documents in a collection. And each document's size will be around 18571 bytes having 120 to 150 fields.
I have to return my response in less than 1 second. And mongo query will perform almost 10 to 15 group by in faceted query on maximum 3,00,000 documents.
This is my first time handling this much of data where I have to return responses in real time.
I have implemented indexes and they reduced response time to 5 to 6 seconds but I still need it in less then 1 second.
*Below is sample query:*
db.sample.aggregation(
"$match":{
"$and":[
,
,
{"year":{"$gte":<some-year>,"$lte":<some-year>}},
{"seller_id":{"$in":[<array-of-seller-ids-may-have-40,000-seller-ids>]}}
]
},
{
"$facet":{
"data":[{
"$project":
}],
"count":[{"$group":{"_id":"$_id"}},
],
"price":[{"$bucketAuto":{"groupBy":"$price", "buckets":1}}],
<will have 12-15 group by>
}
}
)
*Below is sample document:*
{
"_id" : "KNDMC5C11J6394584",
"brand_id" : 22,
"brand_name" : "XYZ",
"abc_id" : 1234567890,
"city" : "Gurgaon, IN",
"fluctuation" : 18,
"created_at" : ISODate("2018-08-17T06:08:12.940Z"),
"release_data" : "2018-06-29",
"seller_name" : "Seller name",
"seller_price" : 34890,
"seller_rating" : 4,
"seller_zip" : "12550",
"feature1" : "ABC",
"feature2" : 3300,
"feature3" : "AB",
"expected_price" : -1,
"exterior_color" : "Unknown",
"registered_dealer" : true,
"registered_brand" : "ABC",
"fluctuation_rate" : 20.700000000000003,
"fluctuation_type" : 2,
"fluc_type_name" : "Something",
"has_patents" : false,
"tested_frequency" : 24,
"interior_color" : "---",
"is_certified" : false,
"is_certified_iso" : false,
"is_featured" : false,
"is_new" : true,
"is_certified_bhel" : false,
"location" :
,
"max_input" : 8,
"feature4" : 3,
"feature5" : 206,
"feature6" : "Something",
"monthly_payment" : 649,
"msrp" : 34890,
"feature7" : false,
"seller_id" : 123567890,
"product_family_name" : "abc",
"product_id" : 15,
"product_name" : "Something",
"reflection" : "Something",
"fluc_id" : 2312,
"fluc_name" : "something something (abc) ac",
"updated_at" : ISODate("2018-09-11T17:59:36.889Z"),
"product_damage_category" : "None",
"year" : 2018,
"damage_check" : "-",
"team_size" : "-",
"Technology" :
}
*Below is the explain output*
{
"stages" : [
{
"$cursor" : {
"query" : {
"$and" : [
,
{
"year" :
},
{
"sp_id" :
}
]
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test_collection.col",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"is_new" :
},
{
"year" :
},
{
"year" :
},
{
"sp_id" :
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" :
,
"indexName" : "is_new_1_year_1_sp_id_1",
"isMultiKey" : false,
"multiKeyPaths" :
,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" :
}
},
"rejectedPlans" : [ ]
}
}
},
{
"$facet" : {
"data" : [
{
"$project" :
}
],
"count" : [
{
"$group" :
},
{
"$group" : {
"_id" :
,
"count" : {
"$sum" :
}
}
},
{
"$project" :
}
],
"feature1" : [
{
"$match" : {
"feature1" :
}
},
{
"$group" : {
"_id" : "$feature1",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature2" : [
{
"$match" : {
"feature2" :
}
},
{
"$group" : {
"_id" : "$feature2",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature3" : [
{
"$match" : {
"feature3" :
}
},
{
"$group" : {
"_id" : "$feature3",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature4" : [
{
"$match" : {
"feature4" :
}
},
{
"$group" : {
"_id" : "$feature4",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature5" : [
{
"$match" : {
"feature5" :
}
},
{
"$group" : {
"_id" : "$feature5",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature6" : [
{
"$match" : {
"feature6" :
}
},
{
"$group" : {
"_id" : "$feature6",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature7" : [
{
"$match" : {
"feature7" :
}
},
{
"$group" : {
"_id" : "$feature7",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature8" : [
{
"$match" : {
"feature8" :
}
},
{
"$group" : {
"_id" : "$feature8",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature9" : [
{
"$match" : {
"feature9" :
}
},
{
"$group" : {
"_id" : "$feature9",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature10" : [
{
"$match" : {
"feature10" :
}
},
{
"$group" : {
"_id" : "$feature10",
"name" :
,
"count" : {
"$sum" :
}
}
},
{
"$sort" : {
"sortKey" :
}
}
],
"feature11" : [
{
"$match" : {
"feature11" :
}
},
{
"$bucketAuto" : {
"groupBy" : "$feature11",
"buckets" : 1,
"output" : {
"count" : {
"$sum" :
}
}
}
}
],
"feature12" : [
{
"$bucketAuto" : {
"groupBy" : "$feature11",
"buckets" : 1,
"output" : {
"count" : {
"$sum" :
}
}
}
}
]
}
}
],
"ok" : 1
}
If this information is not complete for the solution. I will provide more.
I am stuck on this from last 1 month.
Any help would be appreciated.