[SERVER-78439] Investigate IDL generated parser performance Created: 26/Jun/23  Updated: 28/Nov/23  Resolved: 24/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Mark Benvenuto Assignee: Mark Benvenuto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-81876 Improve IDL code generation for comma... Closed
related to SERVER-83671 Implement lookup tries for IDL parse ... In Progress
related to SERVER-82940 Use tries for IDL enumeration lookups Closed
Backwards Compatibility: Fully Compatible
Sprint: Security 2023-07-10, Security 2023-07-24, Security 2023-08-07
Participants:

 Description   

IDL-generated parsers cost O(N x M), where N is the number of fields in the document and M is the number of fields defined in the IDL. This is because each field in a document is compared against each IDL field in turn until a match is found. Over successive releases, M has grown, and the IDL-generated parsers have become a source of release-over-release slowdown.
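A minimal sketch of the linear scan described above, counting string comparisons to make the O(N x M) cost visible. This is illustrative Python, not the generated C++ parser; the function name and the field names are hypothetical.

```python
def parse_linear(doc_fields, idl_fields):
    """Match each document field against the IDL fields by linear scan.

    Returns the matched field names and the number of string comparisons made.
    """
    comparisons = 0
    matched = []
    for name in doc_fields:          # N document fields
        for known in idl_fields:     # up to M IDL fields per document field
            comparisons += 1
            if name == known:
                matched.append(name)
                break
    return matched, comparisons


# Hypothetical IDL with M = 20 fields; the document uses the last three (N = 3).
idl_fields = [f"field{i}" for i in range(20)]
doc_fields = ["field19", "field18", "field17"]

matched, comparisons = parse_linear(doc_fields, idl_fields)
print(comparisons)  # 20 + 19 + 18 = 57
```

Fields that appear late in the IDL order pay close to M comparisons each, so adding fields to an IDL struct slows down every document that uses the later ones.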

I have briefly investigated using a generated trie instead of a linear scan, but the results were inconclusive, so more investigation is needed. Options to consider include a full trie and a hybrid trie (say, one level deep).
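The two trie shapes mentioned above could look roughly like the following. This is a sketch under assumptions: a real generated parser would emit C++ (for example nested switch statements) rather than build dictionaries at runtime, and all names here are hypothetical.

```python
def build_trie(idl_fields):
    """Full trie: one nested dict level per character; "" marks the end of a name."""
    root = {}
    for index, name in enumerate(idl_fields):
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[""] = index
    return root


def trie_lookup(root, name):
    """Return the IDL field index for name, or None if it is not an IDL field."""
    node = root
    for ch in name:
        node = node.get(ch)
        if node is None:
            return None
    return node.get("")


def build_buckets(idl_fields):
    """Hybrid, one level deep: branch on the first character, then scan linearly."""
    buckets = {}
    for index, name in enumerate(idl_fields):
        buckets.setdefault(name[:1], []).append((name, index))
    return buckets


def bucket_lookup(buckets, name):
    for known, index in buckets.get(name[:1], ()):
        if known == name:
            return index
    return None


# Hypothetical field list for illustration only.
idl_fields = ["find", "filter", "sort", "skip", "limit", "batchSize"]
trie = build_trie(idl_fields)
buckets = build_buckets(idl_fields)
```

The full trie bounds the work by the length of the field name, while the hybrid keeps the generated code small and only shortens the linear scan to the fields sharing a first character.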

I should microbenchmark different sizes of M against documents with different values of N. The most important cases are findOne, small agg, and insertOne. Slower commands are unlikely to be noticeably affected, since parsing is a small share of their total cost.
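A rough sketch of what such a microbenchmark grid could look like, sweeping M and N and timing a linear scan against an indexed lookup. It uses Python's timeit purely to show the shape of the experiment; absolute numbers from Python say nothing about the generated C++ parsers, and the function names and grid values are assumptions.

```python
import timeit


def parse_linear(doc_fields, idl_fields):
    """Count document fields found by scanning the IDL field list."""
    found = 0
    for name in doc_fields:
        for known in idl_fields:
            if name == known:
                found += 1
                break
    return found


def parse_indexed(doc_fields, index):
    """Count document fields found through a prebuilt lookup table."""
    return sum(1 for name in doc_fields if name in index)


def bench(m, n, repeats=200):
    """Time both strategies for an IDL with m fields and a document with n fields."""
    idl_fields = [f"field{i}" for i in range(m)]
    doc_fields = idl_fields[-min(n, m):]  # worst case for the scan: last fields
    index = {name: i for i, name in enumerate(idl_fields)}
    linear = timeit.timeit(lambda: parse_linear(doc_fields, idl_fields), number=repeats)
    indexed = timeit.timeit(lambda: parse_indexed(doc_fields, index), number=repeats)
    return linear, indexed


if __name__ == "__main__":
    for m in (8, 32, 128):
        for n in (1, 4, 16):
            linear, indexed = bench(m, n)
            print(f"M={m:4d} N={n:3d} linear={linear:.6f}s indexed={indexed:.6f}s")
```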

Finally, one other issue in the generated IDL parsers for commands is how generic args are handled. Generic args are checked inefficiently today: they are only checked after all M IDL fields have been checked. Now that generic args are part of IDL, we should evaluate merging them into the command parsers as ignored fields.
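One way to picture merging generic args into the command parser as ignored fields is a single lookup table in which generic args map to an "ignore" marker. This is a hypothetical sketch, not the approach taken in the commit; the generic arg names shown are an illustrative subset and the helper names are invented.

```python
IGNORED = object()  # marker: field is recognized but skipped by the command parser


def build_table(command_fields, generic_args):
    """Build one table covering both the command's own fields and generic args."""
    table = {name: index for index, name in enumerate(command_fields)}
    for name in generic_args:
        table.setdefault(name, IGNORED)  # a command's own field takes precedence
    return table


def parse(doc, table):
    """Single pass over the document: one lookup per field, no second generic check."""
    parsed = {}
    for name, value in doc.items():
        action = table.get(name)
        if action is IGNORED:
            continue
        if action is None:
            raise ValueError(f"unknown field: {name}")
        parsed[name] = value
    return parsed


# Illustrative inputs only.
table = build_table(["find", "filter", "limit"], ["$db", "lsid", "maxTimeMS"])
result = parse({"find": "coll", "$db": "test", "limit": 1}, table)
print(result)  # {'find': 'coll', 'limit': 1}
```

With this layout a generic arg costs the same single lookup as any other field, instead of being reached only after every command field has been tried.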



 Comments   
Comment by Githook User [ 24/Jul/23 ]

Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (username: markbenvenuto)

Message: SERVER-78439 Improve IDL generated parser performance
Branch: master
https://github.com/mongodb/mongo/commit/003346994ba24b12de67c4883263ceca393d5c32

Generated at Thu Feb 08 06:38:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.