Aggregation is slower than expected on relatively simple collection


    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.6.0-rc0
    • Affects Version/s: 2.5.5
    • Component/s: Aggregation Framework
    • Environment:
    • Fully Compatible
    • ALL

      Consider collection researchfacetsflat:

      > db.researchfacetsflat.findOne();
      {
              "_id" : ObjectId("52d06cd3c5c759ec4da99339"),
              "type" : "analyst",
              "value" : "James",
              "num" : 2
      }
      > db.researchfacetsflat.count();
      1200296
      

      The following "do nothing" command takes ~350ms:
      db.researchfacetsflat.aggregate([ {$group: {_id: ''} } ]);
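
      The report does not say how the timings were taken. One way to reproduce them is a small wall-clock helper like the sketch below. The names timeRuns and median are hypothetical and not part of the original report, and the workload timed here is a stand-in loop so the snippet runs without a server.

```javascript
// Hypothetical helper (not from the original report): call fn() `runs` times
// and return the elapsed wall-clock milliseconds of each call.
function timeRuns(fn, runs) {
    var timings = [];
    for (var i = 0; i < runs; i++) {
        var start = Date.now();
        fn();
        timings.push(Date.now() - start);
    }
    return timings;
}

// The median is less sensitive than the mean to one slow, cold-cache run.
function median(values) {
    var sorted = values.slice().sort(function (a, b) { return a - b; });
    var mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Stand-in workload so the sketch is runnable anywhere.
var standInTimings = timeRuns(function () {
    var sum = 0;
    for (var i = 0; i < 100000; i++) { sum += i; }
    return sum;
}, 5);
var standInMedianMs = median(standInTimings);

// In the mongo shell the callback would wrap the command under test, e.g.
// (assuming aggregate() returns a cursor in this shell version, so the
// results must be exhausted for the timing to cover the whole command):
//   median(timeRuns(function () {
//       db.researchfacetsflat.aggregate([ {$group: {_id: ''}} ]).toArray();
//   }, 5));
```

      Timing several runs and reporting the median keeps a single cold run from skewing the comparison.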

      This command, which actually does the grouping we seek, takes ~1550ms:

      db.researchfacetsflat.aggregate([ {$group: {_id: { tt: '$type', vv: '$value' }, tot: { $sum: 1 }} } ]);
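
      For reference, the result this $group stage produces can be mimicked in plain JavaScript with one pass and one hash lookup per document. This is only an illustration of the semantics, not how the server implements $group. The function name groupByTypeAndValue is hypothetical, and only the first sample row comes from the findOne() output above; the others are made up.

```javascript
// Illustrative only: mimics the *result* of
//   {$group: {_id: {tt: '$type', vv: '$value'}, tot: {$sum: 1}}}
// on an in-memory array. Not the server's implementation.
function groupByTypeAndValue(docs) {
    var buckets = {};  // serialized (type, value) key -> accumulator document
    var order = [];    // first-seen key order, to keep output deterministic
    docs.forEach(function (doc) {
        var key = JSON.stringify([doc.type, doc.value]);
        if (!buckets.hasOwnProperty(key)) {
            buckets[key] = { _id: { tt: doc.type, vv: doc.value }, tot: 0 };
            order.push(key);
        }
        buckets[key].tot += 1;  // equivalent of {$sum: 1}
    });
    return order.map(function (key) { return buckets[key]; });
}

// Hypothetical sample data shaped like researchfacetsflat documents.
var sample = [
    { type: "analyst", value: "James", num: 2 },
    { type: "analyst", value: "James", num: 1 },
    { type: "sector", value: "Software", num: 5 }
];
var grouped = groupByTypeAndValue(sample);
```

      Every input document is visited exactly once regardless of its field values. The real $group makes no guarantee about output order; the first-seen order here is a convenience of the sketch.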
      

      No combination of indexes on type and/or value improved performance, which is not surprising because the aggregation has to read every document.

      The equivalent command against PostgreSQL takes ~375ms:
      select ftype, fval, count(ftype) from researchfacets group by ftype,fval
      The customer does not have these timings yet, but believes that standard relational technology should run faster than 1500ms, and our tests confirm this.
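
      To put the three timings on a common scale, the sketch below converts them to a per-document cost. It assumes the PostgreSQL table holds the same 1200296 rows as researchfacetsflat, which the report implies but does not state. The variable names are illustrative.

```javascript
// Figures taken from the report above.
var docCount = 1200296;
var timingsMs = {
    noopGroup: 350,        // {$group: {_id: ''}}
    typeValueGroup: 1550,  // $group by type and value with {$sum: 1}
    postgresGroupBy: 375   // SQL GROUP BY ftype, fval (row count ASSUMED equal)
};

// Convert a total time in milliseconds to microseconds per document.
function microsPerDoc(totalMs, count) {
    return (totalMs * 1000) / count;
}

var perDocMicros = {
    noopGroup: microsPerDoc(timingsMs.noopGroup, docCount),
    typeValueGroup: microsPerDoc(timingsMs.typeValueGroup, docCount),
    postgresGroupBy: microsPerDoc(timingsMs.postgresGroupBy, docCount)
};

// Time attributable to the real grouping beyond the "do nothing" pass.
var groupingOverheadMs = timingsMs.typeValueGroup - timingsMs.noopGroup;

// How many times slower the MongoDB grouping is than the SQL query.
var slowdownVsPostgres = timingsMs.typeValueGroup / timingsMs.postgresGroupBy;
```

      Under that assumption the real grouping costs about 1.3 microseconds per document, against roughly 0.3 microseconds for both the "do nothing" pass and the SQL query, so about 1200ms of the 1550ms is spent on the grouping itself.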

      Also interesting: the equivalent query against the original collection, where the facets sit in an array inside a much larger document, takes about the same time, ~1500ms:

      > db.researchtags.aggregate([
            {$unwind: '$Facets'},
            {$group: {_id: '$Facets', count: {$sum: 1}}},
            {$group: {_id: '$_id.type',
                      values: {$push: {Value: '$_id.value', Count: '$count'}},
                      options: {$sum: 1}}}
        ]);
      
      > db.researchtags.findOne();
      {
      	"Abstract" : "",
      	"AnalystCoverages" : [
      		{
      			"Rank" : 1,
      			"Name" : "Software (US)",
      			"NameJ" : "",
      			"Weight" : "OVERWEIGHT",
      			"_id" : ""
      		}
      	],
      	"Analysts" : [
      		{
      			"EmailNotify" : true,
      			"UserId" : "<placeholder>",
      			"Name" : "James",
      			"NameJ" : "<placeholder>",
      			"Phone" : "123 456 789",
      			"Email" : "firstname.lastname@companyname.com",
      			"Rank" : 1
      		},
      		{
      			"EmailNotify" : true,
      			"UserId" : "<placeholder>",
      			"Name" : "chuen",
      			"NameJ" : "<placeholder>",
      			"Phone" : "123 456 789",
      			"Email" : "firstname.lastname@companyname.com",
      			"Rank" : 1
      		},
      		{
      			"EmailNotify" : true,
      			"UserId" : "<placeholder>",
      			"Name" : "clem",
      			"NameJ" : "<placeholder>",
      			"Phone" : "123 456 789",
      			"Email" : "firstname.lastname@companyname.com",
      			"Rank" : 1
      		},
      		{
      			"EmailNotify" : true,
      			"UserId" : "<placeholder>",
      			"Name" : "buzz",
      			"NameJ" : "<placeholder>",
      			"Phone" : "123 456 789",
      			"Email" : "firstname.lastname@companyname.com",
      			"Rank" : 1
      		}
      	],
      	"Companies" : null,
      	"CoverDate" : ISODate("2012-06-06T00:00:00Z"),
      	"Dept" : "Equities",
      	"Facets" : [
      		{
      			"type" : "analyst",
      			"value" : "James"
      		},
      		{
      			"type" : "analyst",
      			"value" : "chuen"
      		},
      		{
      			"type" : "analyst",
      			"value" : "clem"
      		},
      		{
      			"type" : "analyst",
      			"value" : "buzz"
      		},
      		{
      			"type" : "sector",
      			"value" : "Software"
      		},
      		{
      			"type" : "regcntry",
      			"value" : "Americas"
      		},
      		{
      			"type" : "regcntry",
      			"value" : "United States"
      		}
      	],
      	"FrontPageBullets" : " Bottom Line. We continue to argue that INTC will adjust to structural challenges in the core PC market better than expected, providing a baseline of profitability to give investors a call option on: (1) Moore’s Law, (2) DCG and (3) Other IA. Our analysis continues to suggest LT EPS of $2.50 plus – well ahead of the dire consensus view of $1.00-1.50. While we are lowering our CY14E EPS to be more in-line with the Company guidance and consensus – the investor day provided more proof points than concerns – specifically, (1) EPS reduction in CY14E is being driven by INCREASED investments in Tablets (core PC EPS is expected to be flat y/y – i.e. STABLE), (2) INTC re-iterated their 15% CAGR target for DCG, underpinning our view that DCG EPS can grow from $0.80-0.85 to $1.50, (3) LTE baseband is poised to show meaningful growth in CY14E and (4) INTC provided more of a forward leaning strategy on foundry, Tablets and Smartphones. While CapEx was guided flat y/y versus expectation of down y/y, INTC continues to leverage Moore’s Law scaling WITHOUT adding excess wafer capacity – greatly minimizing the risk of structural underutilization and depressed GMs. While flat EPS y/y in CY14E is not inspiring, we believe linearity of revenue/EPS next year will begin to support a return to growth by C2Q14E, providing support and upside to current stock price – reiterate OP and TP of $30..",
      	"GPHs" : [
      		{
      			"SubIndustry" : "",
      			"Industry" : "",
      			"Sector" : "Software",
      			"IndustryCode" : "",
      			"SectorCode" : "GPH3_272",
      			"SubIndustryCode" : "",
      			"GPHName" : "Software",
      			"GPHCode" : "GPH3_272"
      		}
      	],
      	"Headline" : "",
      	"Language" : "",
      	"MongoDocs" : [
      		{
      			"_id" : ObjectId("5293817c010ed10bbcc2210c"),
      			"filename" : "MSFT 25 November 2013.docx",
      			"filetype" : "docx",
      			"filesize" : 204349,
      			"DocId" : ObjectId("5293817a010ed10bbcc22108")
      		}
      	],
      	"RaveChartOptions" : null,
      	"RaveCharts" : null,
      	"Region" : "Americas / United States",
      	"RegionCountrys" : [
      		{
      			"Rank" : 0,
      			"Name" : "Americas",
      			"NameJ" : "",
      			"Code" : "Americas"
      		},
      		{
      			"Rank" : 0,
      			"Name" : "United States",
      			"NameJ" : "",
      			"Code" : "US"
      		}
      	],
      	"ReportSubType" : false,
      	"ReportTitle" : "Microsoft Corporation",
      	"ReportType" : "Company",
      	"Sector" : "Software",
      	"SeqNum" : 0,
      	"ShortSummary" : "",
      	"Subject" : "Company Update",
      	"Subjects" : null,
      	"WordSections" : null,
      	"_id" : ObjectId("52d06cd3c5c759ec4da99345"),
      	"logoImage" : null
      }
      
      > db.researchtags.count();
      200000
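
      The three stages of the researchtags pipeline above can be traced on a tiny in-memory sample. The sketch below is plain JavaScript that mimics the stage results only; it is not the server's implementation. The function name facetSummary and the second sample document are hypothetical. As a simplification, facets are keyed on (type, value) alone, whereas the real $group compares the whole subdocument.

```javascript
// Illustrative only: mimics the results of $unwind followed by two $group stages.
function facetSummary(docs) {
    // Stage 1, {$unwind: '$Facets'}: one entry per array element.
    var facets = [];
    docs.forEach(function (doc) {
        (doc.Facets || []).forEach(function (f) { facets.push(f); });
    });

    // Stage 2, {$group: {_id: '$Facets', count: {$sum: 1}}}.
    var counts = {}, countOrder = [];
    facets.forEach(function (f) {
        var key = JSON.stringify([f.type, f.value]);
        if (!counts.hasOwnProperty(key)) {
            counts[key] = { _id: { type: f.type, value: f.value }, count: 0 };
            countOrder.push(key);
        }
        counts[key].count += 1;
    });

    // Stage 3, group by '$_id.type', pushing {Value, Count} and counting options.
    var byType = {}, typeOrder = [];
    countOrder.forEach(function (key) {
        var g = counts[key];
        var t = g._id.type;
        if (!byType.hasOwnProperty(t)) {
            byType[t] = { _id: t, values: [], options: 0 };
            typeOrder.push(t);
        }
        byType[t].values.push({ Value: g._id.value, Count: g.count });
        byType[t].options += 1;
    });
    return typeOrder.map(function (t) { return byType[t]; });
}

// First document: a subset of the Facets shown above. Second: hypothetical.
var sampleDocs = [
    { Facets: [ { type: "analyst", value: "James" },
                { type: "analyst", value: "chuen" },
                { type: "sector", value: "Software" } ] },
    { Facets: [ { type: "analyst", value: "James" },
                { type: "sector", value: "Software" } ] }
];
var summary = facetSummary(sampleDocs);
```

      For this sample, the "analyst" type ends up with two options (James counted twice, chuen once) and "sector" with one option (Software counted twice). Only the Facets array takes part in the computation; the other fields of the large document are never referenced by the pipeline.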
      

        1. bctest.js
          2 kB
          Buzz Moschetti
        2. make_data.js
          8 kB
          Buzz Moschetti
        3. test4.js
          2 kB
          Buzz Moschetti

            Assignee:
            Mathias Stearn
            Reporter:
            Buzz Moschetti (Inactive)
            Votes:
            0
            Watchers:
            5
