[Spike] Updating vector search integrations to use BSON BinaryVector


    • Type: Task
    • Resolution: Unresolved
    • Priority: Unknown
    • Affects Version/s: None
    • Component/s: AI/ML
    • Python Drivers

      Context

      Describe the background behind the problem.

      All of our existing integrations store embeddings as lists of Python floats. These lists are not contiguous in memory or on disk. Python floats are also doubles (float64), whereas Lucene uses only float32, so every value in the list is downcast each time we perform a search.
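
      To make the size and precision difference concrete, here is a minimal sketch using only the standard library. The embedding values are made up for illustration, and the sketch only compares raw packed widths; it does not reproduce how BSON or Lucene actually lay out the data.

```python
import struct

# Hypothetical 4-dimensional embedding; real models emit many more dimensions.
embedding = [0.1, -0.25, 0.5, 0.75]

# Packed as little-endian float64 (the width of a Python float): 8 bytes per value.
as_float64 = struct.pack(f"<{len(embedding)}d", *embedding)

# Packed as little-endian float32 (the width Lucene uses): 4 bytes per value.
as_float32 = struct.pack(f"<{len(embedding)}f", *embedding)

# Reading the float32 bytes back shows the effect of the downcast.
round_tripped = list(struct.unpack(f"<{len(embedding)}f", as_float32))

print(len(as_float64), len(as_float32))
print(round_tripped)
```

      Values such as 0.5 survive the downcast exactly, while 0.1 comes back slightly different, which is the same loss that already happens at search time.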

      Definition of done

      What must be done to consider the task complete?

      The target library is LangChain.

      Search for and inventory every place where our integrations store embedding vectors. Implement encoding of embedding model outputs as BSON binary vectors. Benchmark the binary vectors against the current lists of floats.
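
      As a rough sketch of what the encoding step involves, the snippet below packs a list of floats into a float32 binary vector payload by hand. The helper names are hypothetical, and the layout (a dtype byte of 0x27 for float32, a padding byte, then little-endian float32 data) is my reading of the BSON binary vector format, so treat it as an assumption to verify. The actual implementation would presumably call the driver instead, for example PyMongo's `Binary.from_vector` with `BinaryVectorDtype.FLOAT32`, rather than pack bytes itself.

```python
import struct

# Assumption: dtype byte that marks float32 in the BSON binary vector layout.
FLOAT32_DTYPE = 0x27


def encode_float32_vector(values):
    """Hypothetical helper: dtype byte, padding byte, then float32 data."""
    header = struct.pack("<BB", FLOAT32_DTYPE, 0)  # padding is 0 for float32
    return header + struct.pack(f"<{len(values)}f", *values)


def decode_float32_vector(payload):
    """Hypothetical helper: inverse of encode_float32_vector."""
    dtype, padding = struct.unpack_from("<BB", payload, 0)
    if dtype != FLOAT32_DTYPE or padding != 0:
        raise ValueError("not a float32 vector payload")
    if (len(payload) - 2) % 4 != 0:
        raise ValueError("payload length is not a whole number of float32 values")
    count = (len(payload) - 2) // 4
    return list(struct.unpack_from(f"<{count}f", payload, 2))


payload = encode_float32_vector([0.5, -1.0, 2.0])
print(len(payload), decode_float32_vector(payload))
```

      The payload is one contiguous block of 2 header bytes plus 4 bytes per dimension, which is the property the Context section is after.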

      Pitfalls

      Python is slow. The benefits described above may be offset by the cost of encoding the floats in Python.
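
      One way to put a number on this pitfall is to time the encoding step in isolation. The sketch below uses `timeit` with a stand-in `encode` function based on `struct.pack`; the 1536-dimension size and the run count are arbitrary assumptions, and a real benchmark would time the driver's own encoder on real model outputs.

```python
import random
import struct
import timeit

DIMENSIONS = 1536  # assumed embedding size, for illustration only
embedding = [random.random() for _ in range(DIMENSIONS)]


def encode(values):
    """Stand-in encoder: pack the floats as contiguous little-endian float32."""
    return struct.pack(f"<{len(values)}f", *values)


runs = 1000
total_seconds = timeit.timeit(lambda: encode(embedding), number=runs)
per_vector_us = total_seconds / runs * 1e6

print(f"{per_vector_us:.1f} microseconds per {DIMENSIONS}-dimension vector")
```

      Comparing this per-vector cost with the time saved on insert and search would show whether the encoding overhead matters in practice.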

              Assignee:
              Unassigned
              Reporter:
              Casey Clements
              Votes:
              0
              Watchers:
              2

                Created:
                Updated: