Add support for the new Auto Embedding index and queries



      This ticket was split from DRIVERS-3315, please see that ticket for a detailed description.


      Overview

      The server is rolling out a new Auto Embedding Index feature. This allows developers to automate vector generation for text fields, removing the need for external embedding pipelines.

      Goal: Enable drivers to support the new autoEmbed field type and the updated $vectorSearch query syntax.

      Usage Example

      // Index Definition: Create an auto-embedding index
      await collection.createSearchIndex({
        name: 'product_auto_embed_idx',
        type: 'vectorSearch',
        definition: {
          fields: [
            {
              type: 'autoEmbed',  // New field type (the index type is still 'vectorSearch')
              modality: 'text',
              path: 'description',
              model: 'voyage-4'
            },
            {
              type: 'filter',
              path: 'author'
            }
          ]
        }
      });
      // Insert documents (no manual embeddings needed)
      await collection.insertOne({
        description: 'Wireless headphones with noise canceling',
        author: 'TechCorp'
      });
      
      // ============================================
      // Manual embedding generation (No longer needed!!)
      // ============================================
      // Previously, users had to:
      //
      // 1. Set up external embedding service
      // const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
      //
      // 2. Generate embeddings for documents
      // const docEmbedding = await openai.embeddings.create({
      //   model: 'text-embedding-ada-002',
      //   input: 'Wireless headphones with noise canceling'
      // });
      //
      // 3. Store embeddings in documents
      // await collection.updateOne(
      //   { description: 'Wireless headphones with noise canceling' },
      //   { $set: { embedding: docEmbedding.data[0].embedding } }
      // );
      //
      // 4. Generate embeddings for queries
      // const queryEmbedding = await openai.embeddings.create({
      //   model: 'text-embedding-ada-002',
      //   input: 'audio equipment'
      // });
      //
      // 5. Use vector in query
      // queryVector: queryEmbedding.data[0].embedding
      // ============================================
      
      // Query: Use text query instead of vector
      const results = await collection.aggregate([
        {
          $vectorSearch: {
            index: 'product_auto_embed_idx',
            path: 'description',
            query: { text: 'audio equipment' },
            numCandidates: 100,
            limit: 5,
          }
        },
        {
          $project: {
            description: 1,
          }
        }
      ]).toArray();
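      // Results: the matching documents. Only description is projected here
      // (_id is omitted below for brevity); add
      // score: { $meta: 'vectorSearchScore' } to $project to return similarity scores.
      // [
      //   {
      //     description: 'Wireless headphones with noise canceling',
      //   }
      // ]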
      
      
      

      Task Description

      1. Add TypeScript interfaces for search index field definitions (autoEmbed, filter, vector) and their union type. See specs.
      2. Extend the $vectorSearch stage interface to support the query and model parameters, and make queryVector optional. See specs.
      3. Add test cases for auto-embedding index creation and query syntax
      4. Update documentation with auto-embedding examples
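
As a rough illustration of tasks 1 and 2, the types could look something like the sketch below. Every type and function name here (AutoEmbedFieldDefinition, VectorSearchStageSketch, autoEmbedPaths, and so on) is hypothetical and not taken from the driver or the spec. The autoEmbed and filter shapes mirror the usage example in this ticket; the vector field shape and the string type of model are assumptions that should be checked against the linked specs.

```typescript
// Hypothetical sketch only: names and optionality are assumptions, not the driver's API.

/** Field whose embeddings are generated by the server (shape taken from the usage example). */
interface AutoEmbedFieldDefinition {
  type: 'autoEmbed';
  modality: 'text';
  path: string;
  model: string;
}

/** Field that can be used for pre-filtering (shape taken from the usage example). */
interface FilterFieldDefinition {
  type: 'filter';
  path: string;
}

/** Field holding user-supplied embeddings (assumed shape). */
interface VectorFieldDefinition {
  type: 'vector';
  path: string;
  numDimensions: number;
  similarity: 'euclidean' | 'cosine' | 'dotProduct';
}

/** Union type discriminated on the `type` property. */
type VectorSearchFieldDefinition =
  | AutoEmbedFieldDefinition
  | FilterFieldDefinition
  | VectorFieldDefinition;

/** $vectorSearch stage with queryVector made optional and query/model added. */
interface VectorSearchStageSketch {
  index: string;
  path: string;
  limit: number;
  numCandidates?: number;
  queryVector?: number[];
  query?: { text: string };
  model?: string; // assumed to be a model name string
}

/** Returns the paths of all autoEmbed fields, relying on union narrowing. */
function autoEmbedPaths(fields: VectorSearchFieldDefinition[]): string[] {
  return fields
    .filter((f): f is AutoEmbedFieldDefinition => f.type === 'autoEmbed')
    .map((f) => f.path);
}

// The field list from the usage example type-checks against the union.
const exampleFields: VectorSearchFieldDefinition[] = [
  { type: 'autoEmbed', modality: 'text', path: 'description', model: 'voyage-4' },
  { type: 'filter', path: 'author' },
];

// The text query from the usage example type-checks without a queryVector.
const exampleStage: VectorSearchStageSketch = {
  index: 'product_auto_embed_idx',
  path: 'description',
  query: { text: 'audio equipment' },
  numCandidates: 100,
  limit: 5,
};
```

Discriminating on `type` lets the compiler reject, for example, an autoEmbed field that is missing `model`, while leaving room for additional field types later.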

      Acceptance Criteria

      • Driver exposes TypeScript interface for the autoEmbed field type in search index definitions.
      • Driver successfully creates search indexes containing autoEmbed field definitions on supported server versions.
      • Driver exposes TypeScript interface for $vectorSearch queries using query and model parameters.
      • Driver successfully executes $vectorSearch queries with the new query and model syntax on supported server versions.
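
The query-syntax criteria could be partly exercised without a live server by a small shape check like the sketch below. The helper name checkVectorSearchStage and the VectorSearchStageLike type are hypothetical. The two rules it enforces (exactly one of query or queryVector, and model only alongside query) are assumptions made for illustration; the ticket does not state them, so they should be confirmed against the specs before being used in real tests.

```typescript
// Hypothetical test helper; the validation rules below are assumptions, not spec text.

interface VectorSearchStageLike {
  index: string;
  path: string;
  limit: number;
  numCandidates?: number;
  queryVector?: number[];
  query?: { text: string };
  model?: string;
}

/** Returns a list of human-readable problems; an empty list means the shape looks fine. */
function checkVectorSearchStage(stage: VectorSearchStageLike): string[] {
  const problems: string[] = [];
  const hasVector = stage.queryVector !== undefined;
  const hasText = stage.query !== undefined;

  // Assumption: a stage carries either a vector or a text query, never both or neither.
  if (hasVector === hasText) {
    problems.push('exactly one of queryVector or query is expected');
  }
  // Assumption: model is only meaningful when the server embeds a text query.
  if (stage.model !== undefined && !hasText) {
    problems.push('model is only expected alongside query');
  }
  return problems;
}

// New syntax from the usage example: text query, no vector.
const textStage: VectorSearchStageLike = {
  index: 'product_auto_embed_idx',
  path: 'description',
  query: { text: 'audio equipment' },
  numCandidates: 100,
  limit: 5,
};

// Existing syntax: caller-supplied vector (values are placeholders).
const vectorStage: VectorSearchStageLike = {
  index: 'product_auto_embed_idx',
  path: 'description',
  queryVector: [0.1, 0.2, 0.3],
  limit: 5,
};

console.log(checkVectorSearchStage(textStage));
console.log(checkVectorSearchStage(vectorStage));
```

A check like this only covers the driver-side shape; whether the server accepts the stage still needs an integration test on a supported server version.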

      References

            Assignee:
            Unassigned
            Reporter:
            TPM Jira Automations Bot
            None