Context
apoorva.joshi@mongodb.com writes the following:
Hi folks! I have been building a text-to-query agents course with Datacamp (launching soon!) so got a chance to extensively test our database toolkit in LangChain. I have seen this behavior a few times that I think we can improve:During query generation, field names are sometimes not wrapped in quotes. The query_checker tool does not fix this either and as a result, the query execution ultimately fails. See an example here:
================================== Ai Message ================================== Tool Calls: mongodb_query_checker (call_eSVEmcuccLdQaQvuy1srGRLU) Call ID: call_eSVEmcuccLdQaQvuy1srGRLU Args: query: db.theaters.aggregate([ { "$group": { _id: "$location.address.state", count: { "$sum": 1 } } }, { "$sort": { count: -1 } }, { "$limit": 5 } ]) ================================= Tool Message ================================= content='```javascript\ndb.theaters.aggregate([\n { $group: { _id: "$location.address.state", count: { $sum: 1 } } },\n { $sort: { count: -1 } },\n { $limit: 5 }\n])\n```' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 56, 'prompt_tokens': 144, 'total_tokens': 200, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0} , 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-11-20', 'system_fingerprint': 'fp_ee1d74bde0', 'id': 'chatcmpl-CDYzuhpHrOspfJIv4Ejk0Lix0Rqmc', 'service_tier': None, 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'} , 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'} , 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}} id='run--e6f77828-e495-47fa-8884-f51ed9709a22-0' usage_metadata={'input_tokens': 144, 'output_tokens': 56, 'total_tokens': 200, 'input_token_details': {'audio': 0, 'cache_read': 0} , 'output_token_details': {'audio': 0, 'reasoning': 0}} ================================== Ai Message ================================== Tool Calls: mongodb_query (call_rPA0Sch1Kl51q5epAfjzcsQL) Call ID: call_rPA0Sch1Kl51q5epAfjzcsQL Args: query: db.theaters.aggregate([ { "$group": { _id: "$location.address.state", count: { "$sum": 1 } } }, { "$sort": { count: -1 } }, { "$limit": 5 } ]) --------------------------------------------------------------------------- NameError Traceback (most recent call last) /usr/local/lib/python3.12/dist-packages/langchain_mongodb/agent_toolkit/database.py in _parse_command(self, command) 233 } --> 234 agg_pipeline = eval(agg_str, eval_globals) 235 if not isinstance(agg_pipeline, list): <string> in <module> NameError: name '_id' is not defined The above exception was the direct cause of the following exception: ValueError Traceback (most recent call last) 14 frames /usr/local/lib/python3.12/dist-packages/langchain_mongodb/agent_toolkit/database.py in _parse_command(self, command) 237 return agg_pipeline 238 except Exception as e: --> 239 raise ValueError(f"Failed to parse aggregation pipeline: {e}") from e 240 241 def run(self, command: str) -> Union[str, Cursor]: ValueError: Failed to parse aggregation pipeline: name '_id' is not defined
I think there's two places to improve this: * Include a line about adding quotes to field and operator names in the query generation system prompt
- Add code to check and fix this in the mongodb_checker tool
Definition of done
Implement her two suggested improvements, one in the LLM Prompt, and the other in
QueryMongoDBCheckerTool.
Pitfalls
What should the implementer watch out for? What are the risks?