[SERVER-82866] Some vector values are getting rounded to the nearest int, causing a BSON type error at $vectorSearch query time
Created: 30/Oct/23  Updated: 22/Nov/23

Status: Needs Verification
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Henry Weller Assignee: Henry Weller
Resolution: Unresolved Votes: 0
Labels: internal-user
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot from 2023-11-21 08-46-56.png    
Assigned Teams:
Server Triage
Participants:

 Description   

What problem are you facing?

Floating-point numbers within a vector embedding are being converted to integers, which causes a BSON type error at query time when executing $vectorSearch.

$vectorSearch:MongoServerError: BSON field '$vectorSearch.queryVector.117' is the wrong type 'int', expected type 'double'


Manually replacing each instance of -1, 0, or 1 with a floating-point number of similar value resolves the problem, which suggests that some implicit type conversion is happening on the Node.js side.
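
A minimal sketch of why this can happen, assuming the type is chosen from the runtime value: JavaScript has a single Number type, so a 1.0 returned by the embedding model is indistinguishable from 1 once it has been parsed. The helper name findIntegerValuedIndexes is hypothetical and not part of any library; it only lists the positions that are likely to trip the error above.

```javascript
// Hypothetical helper: returns the indexes of elements that are whole numbers.
// These are the elements that would look like integers rather than doubles.
function findIntegerValuedIndexes(vector) {
  const indexes = [];
  vector.forEach((value, i) => {
    if (Number.isInteger(value)) indexes.push(i);
  });
  return indexes;
}

// JSON.parse cannot preserve the ".0": 1.0 and 1 are the same Number.
const sample = JSON.parse('[0.0123, 1.0, -0.5, 0, -1]');
console.log(findIntegerValuedIndexes(sample));
```

Running this against a real embedding before calling $vectorSearch should show whether index 117 (from the error message) holds a whole number.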

What driver and relevant dependency versions are you using?


node --version
v20.5.1

cat package.json
{
  "dependencies": {
    "dotenv": "^16.1.3",
    "mongodb": "^5.7.0",
    "graphviz": "^0.0.9",
    "canvas": "^2.11.2",
    "@aws-sdk/client-bedrock-runtime": "^3.433.0"
  }
}

MongoDB 7.0.2 (on Atlas M10)
AWS Bedrock Embeddings Model: amazon.titan-embed-text-v1

Steps to reproduce?

Execute $vectorSearch on embeddings produced from the linked embedding model.



 Comments   
Comment by Paul Done [ 22/Nov/23 ]

Reproducer provided 

Comment by Henry Weller [ 20/Nov/23 ]

Reassigning to Paul Done to provide more details on how he was able to produce this for the sake of those following this ticket

Comment by Paul Done [ 15/Nov/23 ]

henry.weller@mongodb.com thoughts on this? ^

Comment by Paul Done [ 02/Nov/23 ]

henry.weller@mongodb.com this will be a problem for our users of $vectorSearch, because when they receive the embeddings from the LLM for their "query string" they will have to write code that walks all 1536 elements in the array and converts each number to a BSON Double. This seems like a BIG problem for our Vector Search story.

Comment by Durran Jordan [ 31/Oct/23 ]

Hi henry.weller@mongodb.com. The Node driver does not perform any special conversion of values for specific aggregation stages, so the application must force the correct types. -1, 0, 1 will always be assumed to be int32 so the app must force doubles by making them floating point numbers or specifically using the Double constructor. Example:

import { Double } from 'bson';
const value = new Double(1);
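
Applied to a whole embedding, the same idea might look like the sketch below. The helper name toBsonDoubles and its DoubleCtor parameter are hypothetical; the constructor is passed in only so the snippet stands alone, and in an application it would be the Double class imported as shown above.

```javascript
// Hypothetical helper: wraps every element of the vector so that each one is
// an explicit Double instead of a plain JavaScript number.
// DoubleCtor is assumed to be the Double class from 'bson'.
function toBsonDoubles(vector, DoubleCtor) {
  return vector.map((value) => new DoubleCtor(value));
}

// Usage (assumes the 'bson' package is installed and `embedding` is the
// number[] returned by the embedding model):
//   import { Double } from 'bson';
//   const queryVector = toBsonDoubles(embedding, Double);
```

Wrapping every element, rather than only the whole numbers, keeps the helper simple at the cost of one small object per element.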

Comment by PM Bot [ 30/Oct/23 ]

Hi henry.weller@mongodb.com, thank you for submitting this ticket! The team is going to investigate and reply back with more info after the investigation is completed.

Generated at Thu Feb 08 06:50:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.