Update LangChain Vector Search Initialization for better tracking


    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Python Drivers

      Context

      Currently, MongoDB Atlas Vector Search is initialized by passing an already-initialized collection:

      *https://github.com/langchain-ai/langchain-mongodb/blob/main/libs/langchain-mongodb/langchain_mongodb/vectorstores.py#L207*

      vector_store = MongoDBAtlasVectorSearch(
       collection=MONGODB_COLLECTION,
       embedding=embeddings,
       index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
       relevance_score_fn="cosine",
      )

      This requires the user to set up the collection beforehand, which involves steps like:

      # Initialize the MongoDB Python client
      from pymongo import MongoClient

      client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)
      
      DB_NAME = "langchain_test_db"
      COLLECTION_NAME = "langchain_test_vectorstores"
      ATLAS_VECTOR_SEARCH_INDEX_NAME = "langchain-test-index-vectorstores"
      
      MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

      If we modify MongoDBAtlasVectorSearch to also accept the required parameters (collection_name, db_name, and the cluster URI as connection_string), we can initialize the client ourselves:

      vector_store = MongoDBAtlasVectorSearch(
       collection=MONGODB_COLLECTION,  # becomes OPTIONAL
       collection_name=XXX,
       db_name=XXX,
       connection_string=XXX,
       embedding=embeddings,
       index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
       relevance_score_fn="cosine",
      )
      

      We want to maintain backward compatibility, and don't want to create a breaking change.

      Options to consider:

      • 'collection' has so far been a required field. Can it be made optional? The new parameters can be optional as well.
      • Add another init method that takes the new parameters (and does not take the "collection" parameter)?
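
      The two options above could be sketched roughly as follows. This is only an illustration under stated assumptions: VectorStoreSketch is a hypothetical stand-in for MongoDBAtlasVectorSearch, and from_params and client_factory are names invented here (client_factory exists only so the sketch can be exercised without a live cluster). The new parameters are keyword-only so that existing positional calls keep working.

```python
from typing import Any, Callable, Optional


def _default_client_factory(connection_string: str) -> Any:
    # Imported lazily so the sketch can be tried without pymongo installed.
    from pymongo import MongoClient

    return MongoClient(connection_string)


class VectorStoreSketch:
    """Hypothetical stand-in for MongoDBAtlasVectorSearch (not the real class)."""

    def __init__(
        self,
        collection: Optional[Any] = None,  # option 1: required -> optional
        embedding: Any = None,
        index_name: str = "default",
        relevance_score_fn: str = "cosine",
        *,
        # New keyword-only parameters; existing positional calls are unaffected.
        connection_string: Optional[str] = None,
        db_name: Optional[str] = None,
        collection_name: Optional[str] = None,
        client_factory: Callable[[str], Any] = _default_client_factory,
    ) -> None:
        new_params = (connection_string, db_name, collection_name)
        if collection is not None:
            # Existing call style: reject ambiguous mixes of old and new inputs.
            if any(p is not None for p in new_params):
                raise ValueError(
                    "Pass either 'collection' or the connection parameters, not both."
                )
        else:
            if any(p is None for p in new_params):
                raise ValueError(
                    "Without 'collection', connection_string, db_name and "
                    "collection_name are all required."
                )
            # The library creates the client itself instead of the caller.
            client = client_factory(connection_string)
            collection = client[db_name][collection_name]

        self.collection = collection
        self.embedding = embedding
        self.index_name = index_name
        self.relevance_score_fn = relevance_score_fn

    @classmethod
    def from_params(
        cls,
        connection_string: str,
        db_name: str,
        collection_name: str,
        embedding: Any,
        **kwargs: Any,
    ) -> "VectorStoreSketch":
        # Option 2: a separate constructor that never takes 'collection'.
        return cls(
            None,
            embedding,
            connection_string=connection_string,
            db_name=db_name,
            collection_name=collection_name,
            **kwargs,
        )
```

      With either option, the existing call style (passing collection first) behaves as before, so no caller has to change. Whether mixing 'collection' with the new parameters should raise an error or let 'collection' take precedence is an open design choice; the sketch raises.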

       

      Definition of done

      What must be done to consider the task complete?

      Pitfalls

      What should the implementer watch out for? What are the risks?

            Assignee:
            Unassigned
            Reporter:
            Prakul Agarwal