- Type: Task
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Querying
- Query Execution
Atlas DataLake currently uses external data access agents (written in Go) that parse data in various formats, convert it to BSON, and pass it to the query processor (written in C++) over stdin. For performance reasons, we are considering implementing parsing directly in C++ for the most common formats.
JSON is one of the most popular formats used by our customers. At the moment, we parse it with an external parser based on xdg-go/jibby. The goal of this investigation is to measure the performance of parsing JSON files with simdjson.
We will model scanning files directly in the query processor with a new input MQL stage:
{$collection: {path: <local path>, format: <format>}}
For this investigation, we only consider format: 'json'.
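As a rough illustration of the stage's arguments, the sketch below models the {path, format} pair as a C++ struct and rejects any format other than 'json', matching the scope stated above. CollectionStageSpec, FileFormat, and makeCollectionStageSpec are hypothetical names for this example, not existing query processor types.

```cpp
#include <optional>
#include <string>

// Formats the stage understands; only JSON is in scope for now.
enum class FileFormat { kJson };

// Hypothetical in-memory form of {$collection: {path: ..., format: ...}}.
struct CollectionStageSpec {
    std::string path;
    FileFormat format;
};

// Builds a spec from the two stage arguments. Returns std::nullopt when
// the path is empty or the format is anything other than "json".
std::optional<CollectionStageSpec> makeCollectionStageSpec(const std::string& path,
                                                           const std::string& format) {
    if (path.empty() || format != "json") {
        return std::nullopt;
    }
    return CollectionStageSpec{path, FileFormat::kJson};
}
```

Using an enum for the format leaves room to add further formats later without changing the shape of the spec.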