In the JSON editor, values inside arrays might lose "unnecessary" types

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Developer Tools

      This is a follow-up to https://jira.mongodb.org/browse/COMPASS-8337

       

      Background:

      All numbers in JavaScript are stored as doubles under the hood, so a number in JSON is just a double. Ordinarily a "round number" (i.e. an integer) would therefore be stored in JavaScript as a double. Our EJSON parsing plus HadronDocument storage goes out of its way to look at a number like `1` and decide whether it should be stored in MongoDB as a Double, an Int32 or a Long, and it prefers Int32. Efficiency, I guess.

      If a document contains a field that is stored as a Double but happens to hold a round number, it will be inferred to be an Int32. Similarly, a Long whose value fits into an Int32 will be inferred to be an Int32. I call these numbers "unnecessary" doubles or longs.

      Because we don't include this type information when we serialize to JSON, we re-infer the types when the user saves. We might then end up with an Int32 and therefore lose the "unnecessary" type.
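
      To make the round trip concrete, here is a minimal sketch of how the type gets lost. It does not use the real bson or HadronDocument code: `TypedNumber`, `inferType`, `toJson` and `fromJson` are hypothetical stand-ins, and the inference rule is a simplified assumption (prefer Int32, then Long, else Double). Longs are modelled as plain numbers here, so values beyond the safe integer range are out of scope.

```typescript
// Simplified stand-ins for BSON numeric types; NOT the real bson classes.
type NumType = 'Int32' | 'Long' | 'Double';
interface TypedNumber {
  type: NumType;
  value: number;
}

const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// Hypothetical inference rule (an assumption): prefer Int32, then Long, else Double.
function inferType(value: number): NumType {
  if (Number.isInteger(value)) {
    if (value >= INT32_MIN && value <= INT32_MAX) return 'Int32';
    if (Number.isSafeInteger(value)) return 'Long';
  }
  return 'Double';
}

// Serializing to plain JSON drops the type tag; only the bare number survives.
function toJson(n: TypedNumber): string {
  return JSON.stringify(n.value);
}

// Parsing has to re-infer the type from the bare number.
function fromJson(text: string): TypedNumber {
  const value = JSON.parse(text) as number;
  return { type: inferType(value), value };
}

const original: TypedNumber = { type: 'Double', value: 1 };
const roundTripped = fromJson(toJson(original));
console.log(original.type, '->', roundTripped.type); // Double -> Int32
```

      A Double like 1.5 survives the same round trip, because it can only be inferred as a Double. Only the "unnecessary" ones are affected.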

       

      Values in arrays can still lose their "unnecessary" types:

      In https://github.com/mongodb-js/compass/pull/7119 I made changes so that for properties (either at the top level or inside a nested object) we try to preserve Double or Long if the value was one of those before and is now inferred as an Int32. (There is lots of context on that PR. Please read it, as I'm not copy/pasting it all here.) BUT I left Int32 values inside arrays alone.
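
      As a rough illustration of that behaviour (not the actual code from the PR), the sketch below walks the old and new documents side by side. `Num`, `Doc`, `Value` and `preserveTypes` are hypothetical names, and matching old and new values by property key is an assumption made for the example.

```typescript
// Simplified stand-ins for BSON numeric types; NOT the real bson classes.
type NumType = 'Int32' | 'Long' | 'Double';

class Num {
  type: NumType;
  value: number;
  constructor(type: NumType, value: number) {
    this.type = type;
    this.value = value;
  }
}

type Doc = { [key: string]: Value };
type Value = Num | Value[] | Doc;

// Hypothetical helper: restore Double/Long on properties, leave arrays untouched.
function preserveTypes(before: Value | undefined, after: Value): Value {
  if (after instanceof Num) {
    // Only restore when the new value was inferred as Int32 and the old
    // value at the same path was a Double or a Long.
    if (after.type === 'Int32' && before instanceof Num && before.type !== 'Int32') {
      return new Num(before.type, after.value);
    }
    return after;
  }
  if (Array.isArray(after)) {
    // Arrays are left alone, including everything nested inside them.
    return after;
  }
  // Nested object: recurse per key, pairing each key with the old value (if any).
  const prev: Doc =
    before !== undefined && !(before instanceof Num) && !Array.isArray(before)
      ? before
      : {};
  const result: Doc = {};
  for (const key of Object.keys(after)) {
    result[key] = preserveTypes(prev[key], after[key]);
  }
  return result;
}

const beforeDoc: Doc = {
  price: new Num('Double', 10),
  tags: [new Num('Double', 1), new Num('Double', 2.5)],
};
// What we get after the JSON round trip: round numbers were re-inferred as Int32.
const afterDoc: Doc = {
  price: new Num('Int32', 10),
  tags: [new Num('Int32', 1), new Num('Double', 2.5)],
};

const merged = preserveTypes(beforeDoc, afterDoc);
// price is restored to Double; tags[0] stays Int32 because arrays are skipped.
console.log(JSON.stringify(merged));
```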

      This is because the array order could have changed, values could have been added or removed, and values could have been edited. How would we even detect whether the user intends to have homogeneous types in the array, or whether they care if it is a mix? And what happens with edge cases, like the original having a mix of "unnecessary" Doubles and Longs? Which type should we change the Int32s to? What if it had all three number types before? What do we do then? I ended up just leaving arrays alone.

       

      The one use case I can think of is embedding or vector data, where it is super unlikely that the values would happen to be round numbers, but it could probably happen in test data. It might then be weird if there are ints mixed in with doubles. There was a ticket about that early on in vector search's life which I cannot find right now. I think they just ended up dealing with the fact that numbers might become integers.

       

      Idea:

      • if the entire array was doubles before, just turn all int32s into doubles in the new one
      • if the entire array was longs before, just turn all int32s into longs in the new one

      This won't work for heterogeneous arrays. It also means the old type gets applied even to newly added values. This is probably the simplest, safest case I can think of, and probably the most common use case where people would care deeply about the type too.
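
      A minimal sketch of the idea, assuming arrays that contain only numbers. `TypedNumber`, `homogeneousType` and `preserveArrayTypes` are hypothetical names made up for the example, not existing Compass functions.

```typescript
// Simplified stand-ins for BSON numeric types; NOT the real bson classes.
type NumType = 'Int32' | 'Long' | 'Double';
interface TypedNumber {
  type: NumType;
  value: number;
}

// Returns the single non-Int32 type shared by every element, or null when the
// array is empty, mixed, or starts with an Int32.
function homogeneousType(arr: TypedNumber[]): 'Double' | 'Long' | null {
  if (arr.length === 0) return null;
  const first = arr[0].type;
  if (first === 'Int32') return null;
  return arr.every((n) => n.type === first) ? first : null;
}

// If the old array was all Doubles (or all Longs), turn every Int32 in the new
// array into that type. Otherwise leave the new array alone.
function preserveArrayTypes(before: TypedNumber[], after: TypedNumber[]): TypedNumber[] {
  const target = homogeneousType(before);
  if (target === null) return after;
  return after.map((n) => (n.type === 'Int32' ? { type: target, value: n.value } : n));
}

const oldVector: TypedNumber[] = [
  { type: 'Double', value: 0.5 },
  { type: 'Double', value: 1 },
  { type: 'Double', value: 2.25 },
];
// After the JSON round trip the round number became Int32, and the user appended a 3.
const newVector: TypedNumber[] = [
  { type: 'Double', value: 0.5 },
  { type: 'Int32', value: 1 },
  { type: 'Double', value: 2.25 },
  { type: 'Int32', value: 3 },
];

const fixedVector = preserveArrayTypes(oldVector, newVector);
console.log(fixedVector.map((n) => n.type).join(', ')); // Double, Double, Double, Double
```

      Because the check only looks at the old array as a whole, reordering, adding or removing elements does not matter. The appended `3` also becomes a Double, which is the "new values too" behaviour described above.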

      We could just do this; it is probably 2 points. But it does send us further down this rabbit hole, and it is harder to understand than "arrays are left alone". So I wouldn't make the change without buy-in and without being sure it will actually help.

            Assignee:
            Unassigned
            Reporter:
            Le Roux Bodenstein