- Type: Investigation
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: None
- Component/s: GAI
Test using GPT-4 Turbo for query and aggregation generation (instead of GPT-3.5 Turbo). We know that it is both more expensive and slower. However, it performs better on a number of benchmarks, which suggests it can create more elaborate pipelines with greater accuracy.
https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
`gpt-4-turbo-preview` is trained on data up to December 2023, while our current `gpt-3.5-turbo` only has data up to September 2021.
To gauge the feasibility of switching models, we will run the generative AI accuracy tests a number of times with the new model and assess the following:
Cost
How much does running the new model with our usual token usage cost in comparison? We can mostly quantify this without actually running the model, since the tokenizer is publicly available. However, we will not know fully until we get the token counts from the generations.
Accuracy
What percentage improvement does the newer model show on the accuracy tests? Does it support newer aggregation syntax?
Speed
The new model should be less than 2x slower than the current one. To be measured alongside the updated accuracy test results.
Availability
In which regions is the model available? Can we still set up a failover region as we planned for GPT-3.5?
Fine-tuning capability (milestone 4 support)
This may be less necessary if the training data (up to December 2023) already covers newer MongoDB syntax out of the box.
Once we’ve looked at the results for these indicators, we will decide whether to use the newer model in our backend.
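
The cost, accuracy, and speed comparisons above could be summarized from the accuracy test output with something like the following minimal sketch. It assumes each generation is recorded with its token counts, latency, and pass/fail result; the `RunResult` and `Pricing` shapes and all function names are hypothetical, and per-token prices must be supplied from the provider's current pricing page rather than taken from this ticket.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunResult:
    """One accuracy-test generation (hypothetical record shape)."""
    prompt_tokens: int
    completion_tokens: int
    latency_s: float
    passed: bool


@dataclass
class Pricing:
    """USD per 1M tokens. Fill in from the provider's published pricing."""
    input_per_million: float
    output_per_million: float


def run_cost(run: RunResult, pricing: Pricing) -> float:
    """Cost in USD of a single generation."""
    return (
        run.prompt_tokens * pricing.input_per_million
        + run.completion_tokens * pricing.output_per_million
    ) / 1_000_000


def summarize(runs: list[RunResult], pricing: Pricing) -> dict[str, float]:
    """Average cost, pass rate, and average latency for one model."""
    return {
        "mean_cost_usd": mean(run_cost(r, pricing) for r in runs),
        "accuracy": sum(r.passed for r in runs) / len(runs),
        "mean_latency_s": mean(r.latency_s for r in runs),
    }


def compare(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Candidate relative to baseline: ratios above 1 mean costlier or slower."""
    return {
        "cost_ratio": candidate["mean_cost_usd"] / baseline["mean_cost_usd"],
        "accuracy_gain_points": (candidate["accuracy"] - baseline["accuracy"]) * 100,
        "latency_ratio": candidate["mean_latency_s"] / baseline["mean_latency_s"],
    }


def meets_speed_target(comparison: dict[str, float], max_ratio: float = 2.0) -> bool:
    """True if the candidate is less than `max_ratio` times slower than baseline."""
    return comparison["latency_ratio"] < max_ratio
```

Calling `summarize` once per model and passing both summaries to `compare` yields the cost ratio, the accuracy gain in percentage points, and the latency ratio that the speed criterion refers to. Availability and fine-tuning support are not covered by this sketch, since they are not derivable from test output.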