[SERVER-42020] Add SumOfProduct and ProductOfSum accumulators Created: 01/Jul/19  Updated: 30/Aug/19  Resolved: 03/Aug/19

Status: Closed
Project: Core Server
Component/s: Build
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Trivial - P5
Reporter: Vijaya Bhaskar Assignee: Asya Kamsky
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Microsoft Word SERVER-42020.docx    
Issue Links:
Related
related to SERVER-31991 Allow n-ary aggregation expressions t... Backlog
Participants:

 Description   

This is just a question.  I feel this type of functions are needed in a product like mongoDB.

Are there functions which help compute SOP(sumofproduct) and POS(product of sum)? 

I have built queries for the same and available on GITHUB.

https://github.com/vijayabhaskar79/mongodb-queries

May be this will be helpful to understand what i am trying to say.

 

 



 Comments   
Comment by Asya Kamsky [ 03/Aug/19 ]

Provided a response to the question (how to perform these functions) I'm closing this ticket.  If you have further questions feel free to start a discussion on MongoDB-user Google Group.

Comment by Asya Kamsky [ 25/Jul/19 ]

For completeness, currently productOfSums should be written like this:

db.pos.aggregate([{$addFields:{_id:"$$REMOVE", productOfSums: {
    $reduce:{
        input:{$map:{
            input:{$zip:{inputs:["$a1","$a2"]}},
            in:{$sum:"$$this"}
        }},
        initialValue: 1,
        in: {$multiply:["$$this","$$value"]}
    }
}}}])
{ "a1" : [ 2, 3, 4, 5, 3 ], "a2" : [ 4, 5, 4, 7, 5 ], "productOfSums" : 36864 }

Comment by Asya Kamsky [ 25/Jul/19 ]

vbonline79@gmail.com there is already a $zip expression which combines a given number of arrays into a single array (with your example of a1 and a2 the result would be {{[ [2,4], [3,5], [4,4], [5,7], [3,5]] }}

Combining that with existing $multiply and $sum, you get sumProd via:

db.pos.find({},{_id:0})
{"a1" : [ 2, 3, 4, 5, 3 ], "a2" : [ 4, 5, 4, 7, 5 ] }
db.pos.aggregate([{$addFields:{_id:"$$REMOVE", sumOfProduct: {
    $sum:{$map:{
        input:{$zip:{inputs:["$a1","$a2"]}},
        in:{$multiply:[{$arrayElemAt:["$$this",0]},{$arrayElemAt:["$$this",1]}]}
    }}
}}}])
{ "a1" : [ 2, 3, 4, 5, 3 ], "a2" : [ 4, 5, 4, 7, 5 ], "sumOfProduct" : 89 }

$zips has options to handle different length arrays and can accept multiple arrays (not just two).

I suspect that the ticket that would help writing sum of products or products of sum even more is SERVER-31991 that is if you could use $multiply (or $product) the way you can use $sum on an expression that's an array, because then the syntax would be simply:

// does not currently work because $multiply does not accept an array but rather an array of numbers SERVER-31991
sumOfProduct:{$sum:{$map:{input:{$zip:{inputs:["$a1","$a2"]}},in:{$multiply:"$$this"}}}}
productOfSums:{$multiply:{$map:{input:{$zip:{inputs:["$a1","$a2"]}},in:{$sum:"$$this"}}}}

Would you agree that if $multiply took an array expression the way $sum does, that would make it simple enough to write various expressions like you are asking for, or do you still think you would want a dedicated sum of products and product of sums? If so why those two and not all the other possible ones?

Comment by Asya Kamsky [ 25/Jul/19 ]

Asya – I was looking for functions similar to DataFrame.cumsum() and DataFrame.cumprod() available in python or function SUMPRODUCT() available in MS Excel where
1. Two arrays are multiplied over same axis and summed
2. Two arrays are summed on same axis and then multiplied
Screen shot of the calculation of SUMPRODUCT from excel
  
Array1   2  3  4  5  3
Array2   4  5  4  7  5
Multiply 8 15 16 35 15
SumProd     89
 
I could not see any equivalent operators in mongodb. Though using MAP and REDUCE expressions serve the purpose, however it operates on a single array.
I wanted to have a single function which will take two arrays (or more) and produce the said output (sumofproduct or productofsum).
For a financial analyst these functions come in handy. While calculating employee’s efficiency (amount of work produced across multiple subprocess and finally arrive at the total time taken) this was advantageous.
These two mathematical terms were mind boggling when we were in school
 

Comment by Vijaya Bhaskar [ 25/Jul/19 ]

asya.kamsky@mongodb.com - I have attached word file with the detailed description. Hope this helps.

Feel free to reach out to me in case you need clarification.

Comment by Asya Kamsky [ 24/Jul/19 ]

vbonline79@gmail.com I'm trying to understand what the application of this would be and what's missing in the aggregation language for you to be able to do this easily.

How do you see the proposed new operator/expression working?

Comment by Vijaya Bhaskar [ 04/Jul/19 ]

Hi all,

Here i have used $map and $reduce functions to arrive at the value and used same array field.

However, the requirement would be use 2 different field arrays of same size to compute SOP or POS. 

In POS, if the array is null, we will not get the desired output. Hence the first project stage to validate the size of array.

Comment by Vijaya Bhaskar [ 04/Jul/19 ]

 Thanks a lot eric.sedor@10gen.com

Appreciate yours and rest of the teams time in making effort to review and considering it.

Comment by Eric Sedor [ 03/Jul/19 ]

Thanks vbonline79@gmail.com. We will consider this request against our currently planned work. I am also transferring your GitHub content to this comment to preserve it here in Jira.

Current method of computing ProductOfSum:

//Sample collection
db.grades.insertMany([{ _id: 1, quizzes: [ 1, 2, 3 ] },
{ _id: 2, quizzes: [ ] },
{ _id: 3, quizzes: [ 3, 8, 9 ] }])
 
//Product of Sum query
db.grades.aggregate(
   [{$project:
		{quizzes:
			{$cond:
				[{$eq: [{$size: "$quizzes"}, 0]} , [0], "$quizzes"] 
			} 
		}
	},
 
	{ $project:
        { sumof:
            {
              $map:
                {
                   input: "$quizzes",
                   as: "grade",
                   in: { $sum :[ "$$grade", "$$grade" ] }
                }
            }
        }
    },
	
	{	$project :
		{ "productOfSum":
			{ $reduce :
				{ input : '$sumof',
					initialValue: 1,
				in : {$multiply :["$$value","$$this"]}
				}
			}
		}
	}	
	]
)
//Output of the above query
 
{"_id":1,"productOfSum":48}
{"_id":2,"productOfSum":0}
{"_id":3,"productOfSum":1728}
 

Current method of computing SumOfProduct:

//Sample collection
db.grades.insertMany([{ _id: 1, quizzes: [ 1, 2, 3 ] },
{ _id: 2, quizzes: [ ] },
{ _id: 3, quizzes: [ 3, 8, 9 ] }])
 
Need a query to compute sum of products on quizzes.
	  
// Sum Of Product
db.grades.aggregate([
	{$project :
		{ "product":
			{
				$map : 
				{ 
					input : '$quizzes',
					as : "sum1",
					in : {$multiply :['$$sum1','$$sum1']}
				}
			}
		}
	},
	{$project :
		{ "SumOfProduct":
			{ $reduce :
				{ input : '$product',
					initialValue: 0,
				in : {$add :["$$value","$$this"]}
				}
			}
		}
	}
]);
 
//Output 
 
{"_id":1,"SumOfProduct":14}
{"_id":2,"SumOfProduct":0}
{"_id":3,"SumOfProduct":154}

Comment by Vijaya Bhaskar [ 03/Jul/19 ]

Yes, this is a new feature which can be added.

Generated at Thu Feb 08 04:59:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.