Loading...

Type: Bug
Resolution: Unresolved
Priority: Unknown
Fix Version/s: None
Affects Version/s: None
Component/s: AI/ML, LangChain
Labels:
None

Quarter:
- FY27Q2-candidate
Confidence Status:
None

Assigned Teams:

Python Drivers

Documentation Changes Summary:

Hide

1. What would you like to communicate to the user about this feature?
2. Would you like the user to see examples of the syntax and/or executable code and its output?
3. Which versions of the driver/connector does this apply to?

Show
1. What would you like to communicate to the user about this feature? 2. Would you like the user to see examples of the syntax and/or executable code and its output? 3. Which versions of the driver/connector does this apply to?

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

Context

apoorva.joshi@mongodb.com writes the following:

Hi folks! I have been building a text-to-query agents course with Datacamp (launching soon!) so got a chance to extensively test our database toolkit in LangChain. I have seen this behavior a few times that I think we can improve:During query generation, field names are sometimes not wrapped in quotes. The query_checker tool does not fix this either and as a result, the query execution ultimately fails. See an example here:

================================== Ai Message ==================================
Tool Calls: mongodb_query_checker (call_eSVEmcuccLdQaQvuy1srGRLU) Call ID: call_eSVEmcuccLdQaQvuy1srGRLU Args: query: db.theaters.aggregate([ { "$group": { _id: "$location.address.state", count:
{ "$sum": 1 }
} }, { "$sort":
{ count: -1 }
}, { "$limit": 5 } ])
================================= Tool Message =================================
content='```javascript\ndb.theaters.aggregate([\n { $group: { _id: "$location.address.state", count:
{ $sum: 1 }
} },\n { $sort:
{ count: -1 }
},\n { $limit: 5 }\n])\n```' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 56, 'prompt_tokens': 144, 'total_tokens': 200, 'completion_tokens_details':
{'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}
, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-11-20', 'system_fingerprint': 'fp_ee1d74bde0', 'id': 'chatcmpl-CDYzuhpHrOspfJIv4Ejk0Lix0Rqmc', 'service_tier': None, 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate':
{'filtered': False, 'severity': 'safe'}
, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate':
{'filtered': False, 'severity': 'safe'}
, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}} id='run--e6f77828-e495-47fa-8884-f51ed9709a22-0' usage_metadata={'input_tokens': 144, 'output_tokens': 56, 'total_tokens': 200, 'input_token_details':
{'audio': 0, 'cache_read': 0}
, 'output_token_details': {'audio': 0, 'reasoning': 0}} ================================== Ai Message ================================== Tool Calls: mongodb_query (call_rPA0Sch1Kl51q5epAfjzcsQL) Call ID: call_rPA0Sch1Kl51q5epAfjzcsQL Args: query: db.theaters.aggregate([ { "$group": { _id: "$location.address.state", count:
{ "$sum": 1 }
} }, { "$sort":
{ count: -1 }
}, { "$limit": 5 } ]) --------------------------------------------------------------------------- NameError Traceback (most recent call last) /usr/local/lib/python3.12/dist-packages/langchain_mongodb/agent_toolkit/database.py in _parse_command(self, command) 233 } --> 234 agg_pipeline = eval(agg_str, eval_globals) 235 if not isinstance(agg_pipeline, list): <string> in <module> NameError: name '_id' is not defined The above exception was the direct cause of the following exception: ValueError Traceback (most recent call last) 14 frames /usr/local/lib/python3.12/dist-packages/langchain_mongodb/agent_toolkit/database.py in _parse_command(self, command) 237 return agg_pipeline 238 except Exception as e: --> 239 raise ValueError(f"Failed to parse aggregation pipeline: {e}") from e 240 241 def run(self, command: str) -> Union[str, Cursor]: ValueError: Failed to parse aggregation pipeline: name '_id' is not defined

I think there's two places to improve this: * Include a line about adding quotes to field and operator names in the query generation system prompt

Add code to check and fix this in the mongodb_checker tool

Definition of done

Implement her two suggested improvements, one in the LLM Prompt, and the other in
QueryMongoDBCheckerTool.

Pitfalls

What should the implementer watch out for? What are the risks?

Details

Description

Context

Definition of done

Pitfalls

Attachments

Activity

People

Dates