  1. Core Server
  2. SERVER-53024

Evaluate simdjson for reading JSON files in MQL queries

Details

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Querying
    • Query Execution

    Description

      Atlas DataLake currently uses external data access agents (written in Go) that parse data in various formats, convert it to BSON, and pass it to the query-processing process (written in C++) over STDIN. For performance reasons, we are considering implementing parsing directly in C++ for the most common formats.

      JSON is one of the most popular formats used by our customers. At the moment, we use an external parser based on xdg-go/jibby. The point of this investigation is to measure the performance of parsing such files with simdjson.

      We will model scanning files directly in the query processor with a new input MQL stage:

      {$collection: {path: <local path>, format: <format>}}

      For this investigation, we consider only format: 'json'.

          People

            backlog-query-execution Backlog - Query Execution
            pawel.terlecki@mongodb.com Pawel Terlecki
            Votes: 0
            Watchers: 5