[SERVER-61814] How should 'numericOrdering: true' handle decimals and negatives? Created: 30/Nov/21  Updated: 06/Dec/22  Resolved: 28/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: David Percy Assignee: Backlog - Query Execution
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Query Execution
Participants:

 Description   

'numericOrdering: true' lets you sort strings numerically rather than lexicographically. A typical use would be something like:

> db.chapters.find({}, {_id: 0}).sort({title: 1}).collation({locale: 'en_US', numericOrdering: true})
{ "title" : "Book 1 Chapter 1" }
{ "title" : "Book 1 Chapter 2" }
...
{ "title" : "Book 1 Chapter 9" }
{ "title" : "Book 1 Chapter 10" }
{ "title" : "Book 1 Chapter 11" }
{ "title" : "Book 2 Chapter 1" }
...
{ "title" : "Book 9 Chapter 1" }
...
{ "title" : "Book 10 Chapter 1" }

Lexicographically, "10" < "9", but with 'numericOrdering: true' it treats any numeric portion of the string as a number. There can be several numbers in one string.

This leads to some confusing behavior if you want to sort decimals or negative numbers:

> db.c.find({}, {_id: 0}).sort({n: 1}).collation({locale: 'en_US', numericOrdering: true})
{ "n" : "-1" }
{ "n" : "-2" }
{ "n" : "2.1" }
{ "n" : "2.2" }
{ "n" : "2.10" }
{ "n" : "2.20" }

The dot and minus sign are not interpreted as part of the number. So:

  • "-1" < "-2" because "1" < "2"
  • "2.2" < "2.10" because "2" < "10"

This behavior is useful for sorting version strings, where the dot is really a separator and not a decimal point. You would want "4.4.9" < "4.4.10" and "4.9" < "4.10".

We should maybe document this more clearly. We could also consider adding another kind of collation, that interprets more kinds of strings numerically.



 Comments   
Comment by Ana Meza [ 28/Jan/22 ]

As agreed during the Quick Wins Triage, we are closing this one as Won't do

Comment by David Percy [ 30/Nov/21 ]

In the docs we could mention that our current behavior is consistent with / inherited from the collation library:

A "digit" in this sense is a code point with General_Category=Nd, which does not include circled numbers, roman numerals, etc. Only a contiguous digit substring is considered, that is, non-negative integers without separators. There is no support for plus/minus signs, decimals, exponents, etc.

https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/ucol_8h.html#a583fbe7fc4a850e2fcc692e766d2826ca2efca83794416797ef04abda570c6f5b

Comment by David Percy [ 30/Nov/21 ]

Accepting a wider variety of number-like strings opens up more questions, like:

  • Does it accept floating-point literals? "1e10"
  • Does it accept fractions: "1/3", and does it round them before sorting?
  • Does it accept other bases? "0xba5eba11"
  • Does it behave the same as $convert?
Generated at Thu Feb 08 05:53:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.