Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-43989

Regex query not matching correctly on compound index

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.6.14
    • Component/s: Index Maintenance, Querying
    • Labels:
      None
    • ALL
    • Hide

      *We have a *Measurements collection that consists of documents with the following form:
       

      {
       source: <number>,
       sourceId: <string>,
       timestamp: <date>,
       data: { readingType: <string>, value: <number> }
      }

      Also we have a compound index with the following fields:

      {
       "sourceId" : 1,
       "source" : 1,
       "timestamp" : 1,
       "data.readingType" : 1
      }

       
      Issuing a query with a direct match on the sourceId filter with the value 0013A200418A03F7, yields the expected results and the query completes in ms.

      However if the query is run using a regex expression:

      db.getCollection('Measurement').find({sourceId: /0013A200418A03F7/});

      it takes about 50 seconds to complete on a ~6 million document database.
      The query executionStats results file are attached as compound_regex.json

      Running the query while restricting the start of the regular expression as this

      db.getCollection('Measurement').find({sourceId: /^0013A200418A03F7/});

      filters out the documents properly using the index and the query executes quickly.
      The executionStats results are attached as compound_regex_restrict.json

      Finally just for proof of concept, creating a single field index on the sourceId field, and running the same problematic query again

      db.getCollection('Measurement').find({sourceId: /0013A200418A03F7/});

      now properly executs as it should.
      The executionStats results are attached as single_field.json

      Show
      * We have a *Measurements collection that consists of documents with the following form:   { source: <number>, sourceId: <string>, timestamp: <date>, data: { readingType: <string>, value: <number> } } Also we have a  compound index with the following fields: { "sourceId" : 1, "source" : 1, "timestamp" : 1, "data.readingType" : 1 }   Issuing a query with a direct match on the sourceId  filter with the value  0013A200418A03F7,  yields the expected results and the query completes in ms. However if the query is run using a regex expression: db.getCollection( 'Measurement' ).find({sourceId: /0013A200418A03F7/}); it takes about 50 seconds to complete on a ~6 million document database. The query executionStats results file are attached as compound_regex.json Running the query while restricting the start of the regular expression as this db.getCollection( 'Measurement' ).find({sourceId: /^0013A200418A03F7/}); filters out the documents properly using the index and the query executes quickly. The executionStats results are attached as compound_regex_restrict.json Finally just for proof of concept, creating a single field index on the sourceId field, and running the same problematic query again db.getCollection( 'Measurement' ).find({sourceId: /0013A200418A03F7/}); now properly executs as it should. The executionStats results are attached as single_field.json

      Hi  there,

      The problem is that when running a reqex query on a field indexed by a compound index, the IXSCAN stage incorrectly passes all the documents in the collection onto the next FETCH stage, which has to filter them all over again, resulting in a really slow execution of a what had to be a very fast query. 

      The strange this is that if we alter the regex query with a ^ or & Regular expression restricting character it works correctly.

      Also, if you have a single index using the queried field, issuing the same regex query , works as expected.

      Please note that the field is first on the compound index, so it should behave as a single one.

      Example details given below as steps to reproduce

       

        1. compound_regex_restrict.json
          4 kB
        2. compound_regex.json
          5 kB
        3. non-multikey.json
          4 kB
        4. single_field.json
          5 kB

            Assignee:
            eric.sedor@mongodb.com Eric Sedor
            Reporter:
            d.halatsis@centaur.ag Dimitris Xalatsis
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: