Update LangChain Vector Search Initialization for better tracking


    • Type: Task
    • Resolution: Won't Do
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: AI/ML, LangChain
    • Python Drivers

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?


      Context

      Currently, MongoDB Atlas Vector Search can only be initialized by passing an already-initialized collection:

      *https://github.com/langchain-ai/langchain-mongodb/blob/main/libs/langchain-mongodb/langchain_mongodb/vectorstores.py#L207*

      vector_store = MongoDBAtlasVectorSearch(
       collection=MONGODB_COLLECTION,
       embedding=embeddings,
       index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
       relevance_score_fn="cosine",
      )

      This requires the user to do the pre-work of initializing the collection themselves, which involves steps like:

      # initialize MongoDB python client
      from pymongo import MongoClient

      client = MongoClient(MONGODB_ATLAS_CLUSTER_URI)
      
      DB_NAME = "langchain_test_db"
      COLLECTION_NAME = "langchain_test_vectorstores"
      ATLAS_VECTOR_SEARCH_INDEX_NAME = "langchain-test-index-vectorstores"
      
      MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

      If we modify MongoDBAtlasVectorSearch to also accept the required parameters (collection_name, db_name, and the cluster URI as a connection string), it can initialize the client itself:

      vector_store = MongoDBAtlasVectorSearch(
       collection=MONGODB_COLLECTION,  # would become OPTIONAL
       collection_name=XXX,
       db_name=XXX,
       connection_string=XXX,
       embedding=embeddings,
       index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
       relevance_score_fn="cosine",
      )
      

      We want to maintain backward compatibility and avoid introducing a breaking change.

      Options to consider:

      • 'collection' was previously a required field; can it be made optional? The new parameters would be optional as well.
      • Add another init method that takes the desired parameters (and does not have the "collection" param)?
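
The two options above can be sketched side by side. This is only an illustration, not the library's implementation: `VectorSearchSketch`, `from_connection_params`, and `client_factory` are hypothetical names, the `index_name` default is arbitrary, and rejecting calls that pass both `collection` and the connection parameters is an assumed policy that the ticket does not specify.

```python
from typing import Any, Callable, Optional


def _default_client_factory(connection_string: str) -> Any:
    # Imported lazily so the sketch can be exercised without pymongo installed.
    from pymongo import MongoClient

    return MongoClient(connection_string)


class VectorSearchSketch:
    """Hypothetical stand-in for MongoDBAtlasVectorSearch (constructor only)."""

    def __init__(
        self,
        collection: Optional[Any] = None,  # previously required; now optional
        embedding: Any = None,
        *,
        collection_name: Optional[str] = None,
        db_name: Optional[str] = None,
        connection_string: Optional[str] = None,
        index_name: str = "default",  # arbitrary default for the sketch
        relevance_score_fn: str = "cosine",
        client_factory: Callable[[str], Any] = _default_client_factory,
    ) -> None:
        new_params = {
            "collection_name": collection_name,
            "db_name": db_name,
            "connection_string": connection_string,
        }
        if collection is None:
            # Option 1: no collection given, so build it from the new parameters.
            missing = [name for name, value in new_params.items() if value is None]
            if missing:
                raise ValueError(
                    "Pass `collection`, or all of collection_name, db_name and "
                    f"connection_string (missing: {', '.join(missing)})"
                )
            client = client_factory(connection_string)
            collection = client[db_name][collection_name]
        elif any(value is not None for value in new_params.values()):
            # Assumed policy: refuse ambiguous calls instead of silently picking one.
            raise ValueError(
                "Pass either `collection` or the connection parameters, not both"
            )
        self.collection = collection
        self.embedding = embedding
        self.index_name = index_name
        self.relevance_score_fn = relevance_score_fn

    @classmethod
    def from_connection_params(
        cls,
        connection_string: str,
        db_name: str,
        collection_name: str,
        embedding: Any = None,
        **kwargs: Any,
    ) -> "VectorSearchSketch":
        # Option 2: a separate entry point that never takes `collection`.
        return cls(
            None,
            embedding,
            collection_name=collection_name,
            db_name=db_name,
            connection_string=connection_string,
            **kwargs,
        )
```

Existing callers that pass `collection=...` keep working unchanged in this sketch, which is the backward-compatibility requirement stated above. Option 2 leaves the original constructor signature untouched at the cost of a second entry point to document.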

       

      Definition of done

      What must be done to consider the task complete?

      Pitfalls

      What should the implementer watch out for? What are the risks?

              Assignee:
              Unassigned
              Reporter:
              Prakul Agarwal
              Votes:
              0
              Watchers:
              4

                Created:
                Updated:
                Resolved: