Extended JSON encoders should reject un-encodable documents (e.g., $-prefixed strings)


    • Type: Spec Change
    • Resolution: Unresolved
    • Priority: Major - P3
    • Component/s: Extended JSON

      Summary

      Extended JSON cannot reliably encode documents with $-prefixed field names. Encoders should therefore fail rather than return an incorrect document.

      Motivation

      Consider:

      > EJSON.parse(EJSON.stringify({ "x": { "$date": "2020-01-01T00:00:00Z" } }))
      { x: ISODate('2020-01-01T00:00:00.000Z') }
      

      Here a document with a string value at /x/$date became, after an extJSON round-trip, a document with a BSON date at /x.

      This happened because extJSON cannot represent the original document. Our encoder instead emits output that decodes to a BSON date, which corrupts the given data.
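
      The collision can be reproduced without any driver. The following is a minimal sketch, not the real EJSON implementation: toyEncode is a hypothetical stand-in that wraps Date values in the extJSON { "$date": ... } form and passes everything else through. It shows that a document holding a string under a $date key and a document holding an actual date serialize to identical text, so no decoder can tell them apart.

```javascript
// Toy encoder (hypothetical, not the bson library): emits Date values using
// the extJSON {"$date": ...} wrapper and passes everything else through.
function toyEncode(value) {
  if (value instanceof Date) {
    return { $date: value.toISOString() };
  }
  if (Array.isArray(value)) {
    return value.map(toyEncode);
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = toyEncode(child);
    }
    return out;
  }
  return value;
}

// One document stores a string under the key "$date"; the other stores a date.
const withString = { x: { $date: "2020-01-01T00:00:00.000Z" } };
const withDate = { x: new Date("2020-01-01T00:00:00.000Z") };

const encodedString = JSON.stringify(toyEncode(withString));
const encodedDate = JSON.stringify(toyEncode(withDate));

// Both documents produce the same text, so the original types are lost.
console.log(encodedString === encodedDate); // prints true
```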

      There is no corruption if the $-prefixed string doesn’t correspond to a reserved extJSON key. For example, this works:

      > EJSON.parse(EJSON.stringify({ "x": { "$foo": "2020-01-01T00:00:00Z" } }))
      { x: { '$foo': '2020-01-01T00:00:00Z' } }
      

      This inconsistency, however, is hard for end users to reason about. It’s saner & more consistent just to forbid $-prefixed strings categorically when encoding/marshaling extJSON, in the same way that extJSON already rejects field names with NUL bytes.

      Who is the affected end user?

      MongoDB internal teams, Atlas backups, end users

      How does this affect the end user?

      Views, validators, & partial index filters can be corrupted on dump/restore, since dump/restore uses extJSON. (cf. TOOLS-3611)

      Any other downstream users who use extJSON will face similar frustrations.

      How likely is it that this problem or use case will occur?

      It’s not at all unreasonable. MongoDB 5.0 allows $-prefixed keys, and some users do use export/import to have easily-introspectable backups. See also: HELP-92217

      If the problem does occur, what are the consequences and how severe are they?

      Once TOOLS-4184 merges, any user with a field name like $date in an embedded document will lose data if they export then import their data.

      Likewise, any downstream users who round-trip data through extJSON can lose data in the same way.

      Is this issue urgent?

      Not urgent; this is a longstanding problem.

      Is this ticket required by a downstream team?

      It’s the best solution for problems like TOOLS-3611 and TOOLS-4184. (See also: GODRIVER-3273)

      Is this ticket only for tests?

      No.

      Acceptance Criteria

      Extended JSON encoders should refuse, at least by default, to try to marshal a document that extJSON cannot express.
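
      One possible shape for this behavior is sketched below. Everything here is an assumption made for illustration: findDollarKey, strictStringify, and the allowDollarKeys option are hypothetical names, not part of any driver API, and JSON.stringify stands in for a real extJSON encoder. The sketch refuses by default and lets a caller opt out explicitly.

```javascript
// Hypothetical helper: returns the path of the first $-prefixed field name
// found in the value, or null if there is none.
function findDollarKey(value, path = "") {
  if (value === null || typeof value !== "object" || value instanceof Date) {
    return null;
  }
  for (const [key, child] of Object.entries(value)) {
    const childPath = `${path}/${key}`;
    if (key.startsWith("$")) {
      return childPath;
    }
    const nested = findDollarKey(child, childPath);
    if (nested !== null) {
      return nested;
    }
  }
  return null;
}

// Hypothetical wrapper: refuses un-encodable documents unless the caller
// opts out. `encode` stands in for a real extJSON encoder.
function strictStringify(doc, { allowDollarKeys = false, encode = JSON.stringify } = {}) {
  if (!allowDollarKeys) {
    const offending = findDollarKey(doc);
    if (offending !== null) {
      throw new TypeError(`cannot encode as extJSON: $-prefixed field name at ${offending}`);
    }
  }
  return encode(doc);
}

// Usage: the default call fails loudly instead of corrupting the data.
try {
  strictStringify({ x: { $date: "2020-01-01T00:00:00Z" } });
} catch (err) {
  console.log(err.message);
}
```

      Reporting the offending path (here /x/$date) gives the user something actionable, and keeping the opt-out explicit preserves today's behavior for callers who knowingly accept the ambiguity.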

            Assignee:
            Unassigned
            Reporter:
            Felipe Gasper
            Votes:
            0
            Watchers:
            2