[SERVER-63654] The $regex operation accepts invalid expressions when used inside an aggregate command Created: 15/Feb/22  Updated: 24/Feb/22  Resolved: 24/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.23, 4.2.17
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Riccardo Cardin Assignee: Eric Sedor
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-39698 Make $regexFind, $regexFindAll, and $... Closed
Operating System: ALL
Participants:

 Description   

TheĀ $regex operation accepts invalid expressions when used inside an aggregate command. For example, the regular expression ^+somestring is not valid. However, Mongo executes the following command without any error:

{
  "aggregate": "PayeeReputationByPayeeAndLegalEntity",
  "pipeline": [
    {
      "$match": {
        "_id": {
          "$regex": "^+somestring",
          "$options": ""
        }
      }
    }
  ],
  "explain": true
}

The problem is that the query plan for the above aggregation is the following:

{
  "waitedMS": NumberLong(0),
  "stages": [
    {
      "$cursor": {
        "query": {
          "_id": {
            "$regex": "^+somestring",
            "$options": ""
          }
        },
        "queryPlanner": {
          "plannerVersion": 1,
          "namespace": "db.some-collection"
          "indexFilterSet": false,
          "parsedQuery": {
            "_id": /^+somestring/
          },
          "winningPlan": {
            "stage": "FETCH",
            "inputStage": {
              "stage": "IXSCAN",
              "filter": {
                "_id": /^+somestring/
              },
              "keyPattern": {
                "_id": 1
              },
              "indexName": "_id_",
              "isMultiKey": false,
              "isUnique": true,
              "isSparse": false,
              "isPartial": false,
              "indexVersion": 1,
              "direction": "forward",
              "indexBounds": {
                "_id": [
                  "[\"\", {})",
                  "[/^+somestring/, /^+somestring/]"
                ]
              }
            }
          },
          "rejectedPlans": []
        }
      }
    }
  ],
  "ok": 1.0
}

As we can notice, the indexBounds information is completely nonsense:

"indexBounds": {
  "_id": [
    "[\"\", {})",
    "[/^+somestring/, /^+somestring/]"
  ]
}

The above bound makes the database perform a full index-scan. Adding an escape character in front of the + character doesn't change the situation. So, the expression ^+somestring produces the same effects.

The real problem is that the above command is the command generated by Spring Data MongoDB. In this case, there is no workaround

I didn't try to reproduce it on version 5.x of Mongo, but I'm quite sure that the bug persists.



 Comments   
Comment by Eric Sedor [ 24/Feb/22 ]

Hi rcardin, I'm glad you found a version where this is fixed. I'm not fully certain, but it could be this work was done in SERVER-39695 (per the deduplication of SERVER-39698)

Comment by Riccardo Cardin [ 16/Feb/22 ]

I continued the analysis, and it seems that the issue was resolved somewhere between version 4.0.4 and version 4.2.x.

Comment by Riccardo Cardin [ 15/Feb/22 ]

Maybe, the use cases I've mentioned fall under the same circumstances of the issue https://jira.mongodb.org/browse/SERVER-16622 ?

Comment by Riccardo Cardin [ 15/Feb/22 ]

I add that every malformed regex such as some+string produces the same wrong index bounds.

Comment by Riccardo Cardin [ 15/Feb/22 ]

I've tried to escape the expression using two backslash characters. In this case, Mongo starts treating the special charater has escaped. In all the other cases, the problem remains.

Generated at Thu Feb 08 05:58:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.