[SERVER-71428] Expose the array index in a $map operation Created: 17/Nov/22  Updated: 18/Jan/23

Status: Open
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Peter Williamson Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 1
Labels: expression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

Currently the only way to know the element index in a $map is to create a "for loop" using $range:

{$map: {
  input: {$range: [0, {$size: "$arr"}]},
  as: "x",
  in: {index: "$$x", value: {$arrayElemAt: ["$arr", "$$x"]}}
}}

 
A simpler and more understandable option would be a meta variable such as $$index that is automatically set to the current array index. It would have no impact on existing aggregation pipelines but would simplify situations where the array index is needed, changing the example above to:

{$map: {input: "$arr", as: "x", in: {index: "$$index", value: "$$x"}}}
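
To make the intended semantics concrete, here is a minimal sketch in plain JavaScript (a model of the two expressions above, not MongoDB code). It assumes the proposed $$index would behave like the second argument of Array.prototype.map; the sample array is made up for illustration.

```javascript
// Plain JavaScript model of the two $map forms above; not MongoDB code.
const arr = ["a", "b", "c"]; // hypothetical sample data

// Workaround: iterate over the index range and look each element up,
// as the $range / $arrayElemAt version does.
const range = Array.from({ length: arr.length }, (_, i) => i);
const viaRange = range.map((x) => ({ index: x, value: arr[x] }));

// Proposed: the index is supplied alongside each element,
// as the $$index version would do.
const viaIndex = arr.map((x, index) => ({ index, value: x }));

console.log(JSON.stringify(viaRange) === JSON.stringify(viaIndex)); // prints true
```

Both forms yield the same documents; the second simply never has to mention the array twice.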



 Comments   
Comment by Peter Williamson [ 05/Jan/23 ]

I still think includeArrayIndex: <variable name>  provides consistency with $unwind

Comment by Asya Kamsky [ 05/Jan/23 ]

We don't disallow this to be a user defined variable ... not sure what precedent that sets...

 

Comment by Charlie Swanson [ 12/Dec/22 ]

peter.williamson@mongodb.com how about something like this, emulating "destructured bindings/assignments" concepts of many languages (including the famed functional languages):

{$map: {
  input: {$zipWithIndex: "$myArr"},
  as: ["this", "idx"],
  in: /* whatever */
}}

docs from JS (first google result for me, a JS user)

Comment by Peter Williamson [ 11/Dec/22 ]

If aggregation had a more compact syntax for referencing array indexes this might be acceptable, but my client is working with two-dimensional arrays, so I don't see zipWithIndex as being any more convenient than the $range syntax described in the problem description.

Comment by Jacob Evans [ 11/Dec/22 ]

In some functional languages this is handled in a more general way by having a "zipWithIndex" function, rather than adding complexity to map. So for an array

["a", "b", "c"]

running {$zipWithIndex: ["a", "b", "c"]} produces

[["a", 0], ["b", 1], ["c", 2]]

which can then be mapped over without needing to bind variables. This also produces useful input for other higher-order functions such as $reduce. Just thought I'd throw out a more composable alternative. I think this also obviates Charlie's concerns, as long as folks find the syntax pleasant enough.
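
A minimal sketch of that idea in plain JavaScript, to show how the pairs compose with map and reduce. The zipWithIndex helper here is hypothetical and only mirrors the operator proposed above; it is not an existing MongoDB API.

```javascript
// Hypothetical helper mirroring the proposed $zipWithIndex; not a MongoDB API.
function zipWithIndex(arr) {
  return arr.map((value, index) => [value, index]);
}

const pairs = zipWithIndex(["a", "b", "c"]);
console.log(JSON.stringify(pairs)); // prints [["a",0],["b",1],["c",2]]

// Mapping over the pairs: destructure each [value, index] element.
const labelled = pairs.map(([value, index]) => `${index}:${value}`);

// Reducing over the pairs: build a lookup from value to position.
const positions = pairs.reduce(
  (acc, [value, index]) => ({ ...acc, [value]: index }),
  {}
);
```

With operators that exist today, the same pairs can be produced by {$zip: {inputs: ["$arr", {$range: [0, {$size: "$arr"}]}]}}, at the cost of naming the array twice.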

Comment by Peter Williamson [ 05/Dec/22 ]

In $unwind, includeArrayIndex is a string which is the field name. I'd vote for option #1, with includeArrayIndex: <variable name> working in the same way as as: 

Comment by Charlie Swanson [ 05/Dec/22 ]

My opinion is that this should be pretty straightforward to achieve, but we'd need to be careful not to be backwards-breaking if someone was already using '$$index' as a declared variable in scope. I see three options, where I'd personally prefer the first:
1) Introduce this feature via an opt-in mechanism. The most obvious would be a parameter to $map like 'includeArrayIndex', or we could introduce a new $mapWithIndex or some other name if that felt more fluent/natural. We have a similar opt-in ability in $unwind with 'includeArrayIndex': docs.
2) Use a reserved variable name like '$$INDEX'. This would avoid name collisions since user variables are not allowed to start with capital letters, but it would introduce some strange asymmetry since we have the variable '$$this' in scope as the default name for 'as'.
3) Introduce this in a future API version, where we can document the danger of colliding with a pre-defined "$$index" variable. Maybe we could add an 'arrayIndexAs' option to easily give it a different name rather than forcing a programmer to update all their references to 'index'.

If we go the first route, it should be relatively straightforward to implement. It looks like we already have the index available in scope in the classic engine's implementation; we'd just need to do the variable accounting like we do for the "as" variable. It looks like $map isn't supported in the new SBE engine yet, but I can't imagine this would be hard to do. At the very least we could translate it with a $range, as demonstrated in the description.

So my vote is that we could schedule this as a quick win and take option 1 above, re-using the naming precedent from $unwind and calling it 'includeArrayIndex' as an optional boolean flag. It is also worth considering allowing the user to name the variable via another parameter, but that could definitely be future work.
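
The $range translation mentioned above could look roughly like the sketch below. It is a plain JavaScript rewrite of the expression document, not server code. It assumes the variant in which 'includeArrayIndex' carries a variable name (the boolean-flag variant would need a fixed name instead), and the function name desugarMapWithIndex is made up for illustration.

```javascript
// Hypothetical rewrite: takes a $map spec with a proposed `includeArrayIndex`
// variable name and expresses it with operators that exist today.
function desugarMapWithIndex({ input, as = "this", includeArrayIndex, in: body }) {
  if (includeArrayIndex === undefined) {
    // No index requested: plain $map, unchanged.
    return { $map: { input, as, in: body } };
  }
  return {
    $map: {
      // Iterate over 0..size-1 instead of over the elements themselves.
      input: { $range: [0, { $size: input }] },
      as: includeArrayIndex,
      // Rebind the element variable so the original body runs unchanged.
      in: {
        $let: {
          vars: { [as]: { $arrayElemAt: [input, "$$" + includeArrayIndex] } },
          in: body,
        },
      },
    },
  };
}

const rewritten = desugarMapWithIndex({
  input: "$arr",
  as: "x",
  includeArrayIndex: "idx",
  in: { index: "$$idx", value: "$$x" },
});
console.log(JSON.stringify(rewritten, null, 2));
```

Two caveats with this rewrite: $size raises an error when the input is null or missing, whereas $map returns null, so a real translation would need a guard; and the input expression is evaluated more than once.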

Generated at Thu Feb 08 06:18:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.