  1. Compass
  2. COMPASS-7130

Update schema format from mongodb-schema to pass highest probability

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • 1.41.0
    • Affects Version/s: None
    • Component/s: GAI, Schema
    • Labels:
      None
    • 5
    • Not Needed
    • Iteration Minmi, Iteration Nodosaurus

      We now pass only one type in our prompt. Let's update `getSimplifiedSchema` in `mongodb-schema` so that the highest-probability type comes first in the `types` array.
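A minimal sketch of the ordering described above, assuming each detected type carries a `probability` between 0 and 1. The `TypeWithProbability` interface and both helper names are hypothetical and are not taken from `mongodb-schema`; the real change would live inside `getSimplifiedSchema`.

```typescript
// Hypothetical shape: a detected type plus the fraction of sampled values that had it.
interface TypeWithProbability {
  bsonType: string;
  probability: number;
}

// Returns a new array ordered from most to least probable; the input is not mutated.
function sortTypesByProbability(types: TypeWithProbability[]): TypeWithProbability[] {
  return types.slice().sort((a, b) => b.probability - a.probability);
}

// Picks the single type to pass in the prompt, or undefined when no types were detected.
function mostProbableType(types: TypeWithProbability[]): string | undefined {
  const sorted = sortTypesByProbability(types);
  return sorted.length > 0 ? sorted[0].bsonType : undefined;
}

const exampleTypes: TypeWithProbability[] = [
  { bsonType: 'Null', probability: 0.1 },
  { bsonType: 'String', probability: 0.7 },
  { bsonType: 'Int32', probability: 0.2 },
];

console.log(sortTypesByProbability(exampleTypes).map((t) => t.bsonType));
console.log(mostProbableType(exampleTypes));
```

With the ordering done once at the source, a consumer that only wants one type can read index 0 of the `types` array without re-sorting.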

      Old description:
      This ticket involves a bit of an investigation.
      We're currently returning a simplified schema generated in `mongodb-schema`: https://github.com/mongodb-js/mongodb-schema/blob/c15e4b70972163182a1189ccb64533d5fee154b4/src/schema-analyzer.ts#L580 
      This schema representation uses a lot of characters because it represents everything in an object-like format: each field has an array of types, and each type can contain its own subfields.
      We'd like the format to use fewer tokens if possible. Additionally, if possible, we'd like the AI to better understand the schema.
      This ticket involves checking whether the AI model interprets a condensed format better, and using that format if so.

      For reference, Maurizio did some work on a more condensed schema representation in this branch, which is on the receiving end of the schema: https://github.com/10gen/compass-mongodb-com/compare/mql-only-poc. It probably makes sense to keep sending the schema to the server with some detail and then condense it into prompt shape on the backend, so that we have more freedom to change the prompt on the fly.
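To illustrate the idea of condensing on the backend, here is a rough sketch that flattens a nested, object-like schema into one `path: Type` line per field, using only the first entry of each `types` array. The `SchemaType`, `SchemaField`, and `Schema` shapes and the `condenseSchema` helper are assumptions made for this example; they are not the format used in the `mql-only-poc` branch, and whether such a format is actually better understood by the model is the open question of this ticket.

```typescript
// Hypothetical types, loosely modelled on a nested simplified schema.
interface SchemaType {
  bsonType: string;
  fields?: Schema; // expected when bsonType is 'Document'
  types?: SchemaType[]; // expected when bsonType is 'Array'
}
interface SchemaField {
  types: SchemaType[];
}
type Schema = { [fieldName: string]: SchemaField };

// Renders one type as a short label; arrays show their first element type.
function describeType(t: SchemaType): string {
  if (t.bsonType === 'Array') {
    const first = t.types && t.types[0];
    return first ? `Array<${describeType(first)}>` : 'Array';
  }
  return t.bsonType;
}

// Flattens the schema into "dotted.path: Type" lines, keeping only the
// first type of each field (assumed to be the most probable one).
function condenseSchema(schema: Schema, prefix = ''): string[] {
  const lines: string[] = [];
  for (const name of Object.keys(schema)) {
    const first = schema[name].types[0];
    if (!first) continue; // skip fields with no detected type
    const path = prefix + name;
    if (first.bsonType === 'Document' && first.fields) {
      // Recurse into subdocuments instead of emitting a line for the parent.
      lines.push(...condenseSchema(first.fields, path + '.'));
    } else {
      lines.push(`${path}: ${describeType(first)}`);
    }
  }
  return lines;
}

const sampleSchema: Schema = {
  name: { types: [{ bsonType: 'String' }] },
  address: {
    types: [
      {
        bsonType: 'Document',
        fields: { city: { types: [{ bsonType: 'String' }, { bsonType: 'Null' }] } },
      },
    ],
  },
  tags: { types: [{ bsonType: 'Array', types: [{ bsonType: 'String' }] }] },
};

console.log(condenseSchema(sampleSchema).join('\n'));
```

Keeping this step on the server means the client can keep sending the detailed shape while the prompt format changes independently.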

            Assignee:
            rhys.howell@mongodb.com Rhys Howell
            Reporter:
            rhys.howell@mongodb.com Rhys Howell
            Votes:
            0
            Watchers:
            1
