[SERVER-39057] Add distance expressions for image feature comparison Created: 16/Jan/19 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Kelsey Schubert | Assignee: | Backlog - Query Optimization |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | pull-request | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: | Query Optimization |
||||
| Participants: | |||||
| Description |
|
This ticket tracks the work contained in Pull Request #1291.
|
| Comments |
| Comment by Asya Kamsky [ 20/Sep/19 ] |
|
Marc, I'm sorry about the long delay in letting you know that, unfortunately, we will not be able to accept this pull request. I'd like to outline a few reasons why it can't be merged:

The PR proposes several generic operations for computing the Euclidean, squared Euclidean, cosine similarity, Chi-squared and Manhattan distances between two N-dimensional vectors. Adding these particular vector operations would invariably produce subsequent requests to backfill more basic operations (vector addition, scalar times vector, dot product, etc.). Just considering distance measures, why those four in particular? SciPy provides a couple dozen, and OpenCV provides 4, albeit a different 4.

These types of functions might be a great addition to our analytics capabilities, but we feel this should only be done as part of a broader effort to add more computational operations. It should probably be spec'ed out as part of a full slate of related functions, e.g. numeric vector and matrix operations.

Another related concern is the implementation of the new expressions: it is somewhat non-standard relative to existing expressions. For instance, the vectors themselves are passed in as raw float* with a separate parameter to indicate their length. In fact, it is likely that we would want to add something, even for simple vectors, that considers the best storage format, possibly a new type of array that contains a single element type, which is tracked in SERVER-9380.

Again, I apologize that it took me so long to get back to you on this, and thank you for your interest in contributing to MongoDB!

Asya Kamsky |
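
For readers unfamiliar with the measures named in the comment above, the following is a minimal Python sketch of the five computations. It is not the code from the pull request and not a MongoDB API. All function names are hypothetical, and the Chi-squared form used here (half the sum of squared differences over sums, skipping components whose sum is zero) is an assumption, since several variants of that measure exist.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    # All measures below are defined only for vectors of equal dimension.
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_length(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle between a zero vector and anything else is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def chi_squared(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed variant: 0.5 * sum((x - y)^2 / (x + y)), skipping zero sums.
    _check_same_length(a, b)
    total = 0.0
    for x, y in zip(a, b):
        denom = x + y
        if denom != 0:
            total += (x - y) ** 2 / denom
    return 0.5 * total
```

Here the vectors are plain Python sequences that carry their own length, in contrast to the raw pointer plus separate length parameter that the comment describes.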