Type: New Feature
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
April 15th is the desired "soft" date to align with the Public Preview launch of Auto Embedding. Not a blocker.
Context
MongoDB Atlas is rolling out a new Auto Embedding Index feature that automates vector generation for text fields, eliminating the need for external embedding pipelines. See NODE-7245.
LangChain.js is a modular SDK that abstracts the complexity of LLM orchestration. It integrates with MongoDB as a Vector Store, letting you run semantic searches inside your existing Atlas clusters.
We want LangChain.js to support the new Auto Embedding feature to give developers an even smoother vector search experience.
Sample Usage
Current user flow:
```typescript
import { MongoClient } from "mongodb";
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { VoyageEmbeddings } from "@langchain/community/embeddings/voyage";

// 0. Assuming a Vector Search index is already created in MongoDB Atlas

// 1. Configure your MongoDB connection
const client = new MongoClient(process.env.MONGODB_URI!);
const collection = client.db("my_ecommerce_db").collection("products");

// 2. Instantiate a VectorStore
const vectorStore = new MongoDBAtlasVectorSearch(
  new VoyageEmbeddings({
    modelName: "voyage-4-large",
    apiKey: process.env.VOYAGEAI_API_KEY!,
  }),
  { collection },
);

// 3. Add documents to the VectorStore
await vectorStore.addDocuments([
  /* Your documents here */
]);

// 4. Perform a similarity search
const results = await vectorStore.similaritySearch("Your product text search query");
console.log(results);

// Release the connection pool when done
await client.close();
```
After:
```typescript
// The other steps stay the same...

// 2. Instantiate a VectorStore.
// The first parameter (`embeddings`) is now optional.
const vectorStore = new MongoDBAtlasVectorSearch({ collection });
```
Expected User Experience
We want to update the methods inside the class MongoDBAtlasVectorSearch to reflect the changes from the new Auto Embedding feature. See the annotated class definition below:
```typescript
declare class MongoDBAtlasVectorSearch extends VectorStore {
  // -----------------------------------------------
  // The following methods/constructors should be updated.
  // The `embeddings` parameter will be optional and
  // Auto embeddings will be used when not provided.
  // -----------------------------------------------
  constructor(
    embeddings: EmbeddingsInterface, // Optional
    args: MongoDBAtlasVectorSearchLibArgs,
  );

  static fromTexts(
    texts: string[],
    metadatas: object[] | object,
    embeddings: EmbeddingsInterface, // Optional
    dbConfig: MongoDBAtlasVectorSearchLibArgs & { ids?: string[] },
  ): Promise<MongoDBAtlasVectorSearch>;

  static fromDocuments(
    docs: Document[],
    embeddings: EmbeddingsInterface, // Optional
    dbConfig: MongoDBAtlasVectorSearchLibArgs & { ids?: string[] },
  ): Promise<MongoDBAtlasVectorSearch>;

  // ------------------------------------------------------
  // The following methods should fail when auto embeddings
  // are used since they would be redundant/conflicting
  // but are still required by the `VectorStore` interface.
  // ------------------------------------------------------
  addVectors(
    vectors: number[][],
    documents: Document[],
    options?: { ids?: string[] },
  ): Promise<any[]>;

  similaritySearchVectorWithScore(
    query: number[],
    k: number,
    filter?: MongoDBAtlasFilter,
  ): Promise<[Document, number][]>;

  // --------------------------------------------------------
  // The following methods should keep working as expected.
  // They might be using auto embeddings under the hood now.
  // --------------------------------------------------------
  addDocuments(
    documents: Document[],
    options?: { ids?: string[] },
  ): Promise<any[]>;

  delete(params: { ids: any[] }): Promise<void>;

  static fixArrayPrecision(array: number[]): number[];

  similaritySearch(
    query: string,
    k?: number,
    filter?: this["FilterType"] | undefined,
    _callbacks?: Callbacks | undefined,
  ): Promise<DocumentInterface[]>;

  similaritySearchWithScore(
    query: string,
    k?: number,
    filter?: this["FilterType"] | undefined,
    _callbacks?: Callbacks | undefined,
  ): Promise<[DocumentInterface, number][]>;

  asRetriever(
    kOrFields?: number | Partial<VectorStoreRetrieverInput<this>>,
    filter?: this["FilterType"],
    callbacks?: Callbacks,
    tags?: string[],
    metadata?: Record<string, unknown>,
    verbose?: boolean,
  ): VectorStoreRetriever<this>;

  maxMarginalRelevanceSearch(
    query: string,
    options: MaxMarginalRelevanceSearchOptions<this["FilterType"]>,
  ): Promise<Document[]>;
}
```
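The annotations on the class mark `embeddings` as optional, but TypeScript does not allow an optional parameter to precede a required one, and the "After" sample calls the constructor with only `{ collection }`. One possible way to accept both call shapes is a pair of constructor overloads plus a runtime check on the first argument. The sketch below is hypothetical: `EmbeddingsInterface` and `LibArgsSketch` are minimal stand-ins, and `VectorSearchCtorSketch` and `isEmbeddings` are illustrative names, not the LangChain.js implementation. `fromTexts` and `fromDocuments` could use the same approach, or simply type `embeddings` as `EmbeddingsInterface | undefined` so callers pass `undefined` explicitly.

```typescript
// Hypothetical stand-ins for the real LangChain.js/MongoDB types.
interface EmbeddingsInterface {
  embedDocuments(texts: string[]): Promise<number[][]>;
  embedQuery(text: string): Promise<number[]>;
}

interface LibArgsSketch {
  collection: { collectionName: string };
  indexName?: string;
}

// Duck-typed check so the first argument can be either embeddings or args.
function isEmbeddings(value: unknown): value is EmbeddingsInterface {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Partial<EmbeddingsInterface>;
  return (
    typeof candidate.embedQuery === "function" &&
    typeof candidate.embedDocuments === "function"
  );
}

class VectorSearchCtorSketch {
  readonly embeddings?: EmbeddingsInterface;
  readonly args: LibArgsSketch;

  // Existing call shape: client-side embeddings plus args.
  constructor(embeddings: EmbeddingsInterface, args: LibArgsSketch);
  // New call shape: args only, so the server's Auto Embedding index is used.
  constructor(args: LibArgsSketch);
  constructor(
    first: EmbeddingsInterface | LibArgsSketch,
    second?: LibArgsSketch,
  ) {
    if (isEmbeddings(first)) {
      if (second === undefined) {
        throw new Error("args are required when embeddings are provided");
      }
      this.embeddings = first;
      this.args = second;
    } else {
      // No embeddings supplied: leave `embeddings` undefined.
      this.args = first;
    }
  }

  // True when no client-side embeddings were supplied.
  get usesAutoEmbedding(): boolean {
    return this.embeddings === undefined;
  }
}
```

Detecting embeddings by their methods rather than with `instanceof` keeps the check working even if more than one copy of a package ends up in `node_modules`.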
References
- Public Docs:
- Internal Docs:
Use Case
As a... Node Driver and LangChainJs user
I want... to use the Auto Embedding Index feature that automates vector generation for text fields
So that... I can eliminate the need for external embedding pipelines
User Experience
- Passing an embeddings instance such as `VoyageEmbeddings` is optional; the user no longer has to construct one manually
- If the user constructs their own embeddings and passes them in, that experience does not change
- If the user does not pass in embeddings, the vector store still works and relies on Auto Embedding
- If the user does not pass in embeddings but calls `addVectors` or `similaritySearchVectorWithScore`, those methods throw an error
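A minimal sketch of the guard described in the last bullet, assuming the store records at construction time whether an embeddings instance was supplied. `AutoEmbeddingGuardSketch`, `EmbeddingsLike`, the placeholder return values, and the error wording are all hypothetical; a real implementation would perform the MongoDB writes and vector search queries after the check.

```typescript
// Hypothetical stand-in for EmbeddingsInterface; only what this sketch needs.
interface EmbeddingsLike {
  embedQuery(text: string): Promise<number[]>;
}

class AutoEmbeddingGuardSketch {
  // `embeddings` is undefined when the caller relies on Auto Embedding.
  constructor(private readonly embeddings?: EmbeddingsLike) {}

  get usesAutoEmbedding(): boolean {
    return this.embeddings === undefined;
  }

  // Rejects raw-vector operations when no client-side embeddings exist,
  // since they would bypass or conflict with the server-generated vectors.
  private assertClientSideEmbeddings(method: string): void {
    if (this.usesAutoEmbedding) {
      throw new Error(
        `${method} is not supported with Auto Embedding; pass an embeddings instance to the constructor to work with raw vectors.`,
      );
    }
  }

  async addVectors(vectors: number[][], documents: string[]): Promise<string[]> {
    this.assertClientSideEmbeddings("addVectors");
    if (vectors.length !== documents.length) {
      throw new Error("vectors and documents must have the same length");
    }
    // Placeholder ids; the real method would insert documents and vectors.
    return documents.map((_, index) => `placeholder-id-${index}`);
  }

  async similaritySearchVectorWithScore(
    _query: number[],
    _k: number,
  ): Promise<Array<[string, number]>> {
    this.assertClientSideEmbeddings("similaritySearchVectorWithScore");
    // Placeholder; the real method would run a vector search with the query vector.
    return [];
  }
}
```

Because both methods are `async`, the thrown error surfaces as a rejected promise, and it happens before any database call is attempted.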
Dependencies
- MongoDB must be version 8.2+ Community Edition
- Unsure if we need to make an update to Node Driver, verify as part of work
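If the new tests need to be skipped on servers older than 8.2, a small version gate could be used. `meetsMinimumVersion` is a hypothetical helper, not part of the Node driver or LangChain.js; the version string could come, for example, from the `version` field of the Node driver's `db.admin().buildInfo()` result. It checks only the server version, not the edition or whether Auto Embedding is enabled.

```typescript
// Hypothetical helper: true when `version` (e.g. "8.2.1") is at least minMajor.minMinor.
function meetsMinimumVersion(
  version: string,
  minMajor: number,
  minMinor: number,
): boolean {
  const parts = version.split(".").map((part) => Number.parseInt(part, 10));
  if (parts.length < 2 || parts.slice(0, 2).some(Number.isNaN)) {
    throw new Error(`Unrecognized server version: ${version}`);
  }
  const [major, minor] = parts;
  // Compare numerically; string comparison would treat "8.10" as older than "8.2".
  return major > minMajor || (major === minMajor && minor >= minMinor);
}
```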
Risks/Unknowns
- This is a cross-driver alignment ticket
- A linked ticket is going to update our documentation on this subject [TICKET-TBD]
Acceptance Criteria
Implementation Requirements
- Make embeddings optional
- Verify that existing tests pass
- When embeddings are omitted (Auto Embedding is used), verify that `addVectors` and `similaritySearchVectorWithScore` throw an appropriate error
- Add new tests that don't create embeddings but perform all the same actions as the existing tests
- Open a PR against https://github.com/langchain-ai/langchainjs/ to make embeddings optional
- Example update we made in this repo in the past: https://github.com/langchain-ai/langchainjs/commit/ac07cc746d480382526757921b1b7a4d68e754e1
Testing Requirements
- Existing LangChain.js tests all pass
- New LangChain.js tests verify that embeddings are optional and that behavior without them matches the existing tests
- Add new tests to verify that `addVectors` and `similaritySearchVectorWithScore` throw when no embeddings are provided (Auto Embedding is used)
Documentation Requirements
- Update MongoDB tutorials/examples that talk about embeddings: DOCSP-56730
- Update LangChainJs tutorials for same: NODE-7507
Follow Up Requirements
- None