[SERVER-42133] $regex does not match ObjectIds Created: 10/Jul/19  Updated: 27/Oct/23  Resolved: 16/Jul/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Grégory NEUT Assignee: Eric Sedor
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

I have a collection having an ObjectId field named idConnectedobject and I want to find every document matching the required regex.

The collection format :
 

{{ "_id" : ObjectId("5beac9c68bc93e15056021a9"), "idConnectedobject" : ObjectId("5beac9bd8bc93e1505601c4b"), "idParameter" : ObjectId("e30020000000000000000061"), "value" : "RER", "entryDate" : ISODate("2018-11-13T12:55:30.357Z"), "__v" : 0}

The non-working request :
 

db.ConnectedobjectValue.aggregate([{ $match: { idConnectedobject: RegExp("5d133.*a7") } }])

I've also tried :

db.ConnectedobjectValue.find({ idConnectedobject: RegExp("5d133.*a7") })

 
The working request :

db.ConnectedobjectValue.aggregate([{ $project: { idConnectedobject: { $toString: '$idConnectedobject' } } }, { $match: { idConnectedobject: RegExp("5d133.*a7") } }])

 
I have to cast the field i*dConnectedobject* to a String before I can apply the regex.

My proposal to fix this is either that mongodb returns a warning or an error about using a regex on an ObjectId, or make the regex to work along ObjectId fields.

PS: thank you for all your awesome work

PS2: I've put this into Bogue but maybe it's a proposal Idk

PS3: I've tried this in node.js using mongoose and also in mongo-cli on centos in mongodb version 4.0.4

 

 



 Comments   
Comment by Eric Sedor [ 16/Jul/19 ]

Thanks for clarifying orelsanpls

Packing information into an ObjectId can be useful but somewhat defeats the purpose of rich JSON documents. Another option is to store the separate information in a subdocument which is more JSONic in that it presents rich information in a legible way rather than packing it into a single length of binary data. Storing a subdocument would allow indexes for specific queries on specific fields.

That said, if you can force the information you will query on into the prefix of the ObjectId (rather than splitting it into two parts), you could use a range strategy like the one I mentioned above.

I am going to close this ticket as Working as Designed. For further discussion on how to best represent, index, and query data for your use-case, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.

Comment by Grégory NEUT [ 11/Jul/19 ]

Hi, thank you very much for your reply and help!

As you noticed, regexes only match strings. Unfortunately, we can't depend on the same field holding the same data type in any two given documents, so it's unlikely we would want to warn on or error out on a query when a field with an ObjectId is examined against a regex filter.

Your point makes perfect sense. Because mongoose is schema oriented it should be mongoose to display the warning then, I'll open a ticket on their GitHub 

 

About my use case, here is the real context  (I did simplified it before in order to make it focused about the $regex issue and not my personal case) :

I have three collection :

  • Connectedobject 
  • ConnectedobjectType
  • ConnectedobjectValue

 

The collection ConnectedobjectType is holding the list of parameters that defines a Connectedobject, in the following way :

 

{
  _id: ObjectId('eeeee0000000000000000001'),
 
  params: [{
     _id: ObjectId('b30010000000000000000201'),
     name: 'temperature',
  }, {      
   _id: ObjectId('b30010000000000000000201'),
    name: 'client identifier',
  } ... ]
}

 

Every parameter have an ObjectId that match a special pattern :

  > b30010000000000000000201

Category of the parameter

Type of the connected object

Parameter number

 

 

We used the ObjectId to be able to identify the parameter. This ObjectId is used when we communicate the parameter to other systems (from our node.js/mongoDb platform to our Java/SQL other program, or to the front end).

The collection ConnectedobjectValue holds the values we get from the connected object, like :

{
  _id: ObjectId('5beac9c68bc93e15056021a9'),
  idConnectedobject: ObjectId('5beac9bd8bc93e1505601c4b'),
  idParameter: ObjectId('b30010000000000000000201'),
  value: '17',
  entryDate: '2018-11-13T12:55:30.357+00:00',
}

 

There is a direct link from the value to the parameter it represents. In the given example, it is a temperature value.

 

 

 

Here is my need :

 

 

I need to extract from database the values corresponding to a specific range of connectedobject and parameters.

 

Example : I want the parameters temperature, screenshot, luminosity and serial number which have the following parameters values :

 

ObjectId Parameter name
bX0010000000000000000201 Temperature
aX0000000000000000000007 Screenshot
cX0020000000000000000002 Luminosity
cX0020000000000000000003 Serial number

 

I'll do :

 

db.ConnectedobjectValue.aggregate([{ 
   $project: { 
      idParameter: { 
          $toString: '$idParameter',
      },
   },
}, {
   $match: {
     $or: [{
       idParameter: RegExp("^b.*201$"),
     }, {
       idParameter: RegExp("^a.*007$"),
     }, { 
       idParameter: RegExp("^c.*(002|003)$"),
     }],
   },
}])

 

 

Unfortunately I don't think that I could use of :

 

{$match: {idParameter: {$gt: ObjectId("5d1330000000000000000000"), $lt: ObjectId("5d1340000000000000000000")}}}

 

 

Because I have an API function taking the regex from the front end about which parameters to get, example :

 

const requestParameters: any = {
 idsLanguage: [
    this.languageId,
 ],
 doWeWantParametersInformationsReduce: [
   // Disconnection
   // idCodeCustomer
   // idCodeJIRA
   // Location
   // Location precision
   '^e.*(25|68|67|11|12)$',
   // screenshot
   '^a.*007$',
  ],
  doWeWantToIgnoreNonActiveBreakdown: true,
  doWeWantTypeInformations: true,
  doWeWantStatusInformations: true,
  doWeWantCompaniesInformations: false,
  doWeWantParametersInformations: true,
  doWeWantDisponibilityInformations: false,
  doWeWantLocationStructureInformations: true,
  doWeWantConnectedobjectGroupInformations: false,
};

 

 

Thank you for your energy 

 

 

Comment by Eric Sedor [ 10/Jul/19 ]

And thank you for your kind words!

Comment by Eric Sedor [ 10/Jul/19 ]

Hi orelsanpls. We're glad you found a method to make this possible. I wanted to suggest that you could improve the resource use of the aggregation you've devised by adding an initial match clause like:

{$match: {idConnectedobject: {$gt: ObjectId("5d1330000000000000000000"), $lt: ObjectId("5d1340000000000000000000")}}}

That is: construct ObjectIds using the prefix of the regex, to narrow the number of documents brought into the aggregation pipeline.

As you noticed, regexes only match strings. Unfortunately, we can't depend on the same field holding the same data type in any two given documents, so it's unlikely we would want to warn on or error out on a query when a field with an ObjectId is examined against a regex filter.

That said, we'd like to understand more about your use-case to reason about other ways we might support operations like this. Can you tell us what meaning that matches on a regex like "5d133.*a7" have in your system?

Generated at Thu Feb 08 04:59:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.