[SERVER-27557] $or and $and in aggregate's $match seem not to work correctly Created: 30/Dec/16  Updated: 10/Mar/23  Resolved: 02/Jan/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.4.0, 3.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Michael Rommel Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File get_sessions_for_id.js     File import.sh     File test.json    
Operating System: ALL
Steps To Reproduce:

sh ./import.sh
mongo test get_sessions_for_id.js

Sprint: Query 2017-01-23
Participants:

 Description   

The use of four filters in a $match clause include documents, which should be omitted.

Unless I have made a stupid mistake, all four match filters should be implicitly "$and"'ed together, however the two "$or" clauses seem to cause a conflict, although they are on different fields. I reproduced this on Ubuntu Linux 16.4 with 3.4.1 and on Mac OS X El Capitan with 3.4.0

const a_types = [
    "Service A1",
    "Service A2"
];
 
const b_types = [
    "Service B1",
    "Service B2"
];
 
db.test_in.aggregate([
  { "$match" :
    {
      "REMOTEID" : 123456,
      "$or" : [
          { "SERVICE" : { "$in" : a_types } },
          { "SERVICE" : { "$in" : b_types } }
      ],
      "$and" : [
          { "BEGINTIME" : { "$gte" : ISODate("2016-05-20T00:00:00+01:00") } },
          { "BEGINTIME" : { "$lt" : ISODate("2016-07-10T00:00:00+01:00") } }
      ],
      "$or" : [
          { "DATAVOLUMEIN"  : { "$gt" : 0 } },
          { "DATAVOLUMEOUT" : { "$gt" : 0 } }
      ]
    }
  },
  { "$out" : "test_out" }
]);
 
db.test_out.find().forEach(printjson);

There are two arrays of values, which define interesting documents that should be included and are given as criteria for the first {{ $or }} filter step.
The next {{ $and }} filter steps reduces the matching documents for a time range.
The last {{ $or }} step should include only documents where either incoming or outgoing data volume has been recorded.

Test data:

{"_id":{"$oid":"58517e5c6222df30f9c588e4"},"SERVICE":"Service A1","BEGINTIME":{"$date":"2016-06-01T21:47:17.000Z"},"ENDTIME":{"$date":"2016-06-01T21:47:52.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":40135,"DATAVOLUMEOUT":26655}
{"_id":{"$oid":"58517f5e6222df30f9d294f5"},"SERVICE":"Service A1","BEGINTIME":{"$date":"2016-06-02T15:15:05.000Z"},"ENDTIME":{"$date":"2016-06-02T15:21:01.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":157422,"DATAVOLUMEOUT":40293}
{"_id":{"$oid":"585184036222df30f9232c58"},"SERVICE":"Service A2","BEGINTIME":{"$date":"2016-06-07T14:03:54.000Z"},"ENDTIME":{"$date":"2016-06-07T14:06:48.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":79908,"DATAVOLUMEOUT":53567}
{"_id":{"$oid":"585184036222df30f9232cb7"},"SERVICE":"Service A1","BEGINTIME":{"$date":"2016-06-07T14:03:58.000Z"},"ENDTIME":{"$date":"2016-06-07T14:04:00.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":513,"DATAVOLUMEOUT":1415}
{"_id":{"$oid":"58518c006222df30f9d9d23e"},"SERVICE":"Service B3","BEGINTIME":{"$date":"2016-06-17T23:00:00.000Z"},"ENDTIME":{"$date":"2016-06-18T16:27:28.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":3028368,"DATAVOLUMEOUT":2998838}
{"_id":{"$oid":"58518e9f6222df30f911d51b"},"SERVICE":"Service A1","BEGINTIME":{"$date":"2016-06-21T14:04:05.000Z"},"ENDTIME":{"$date":"2016-06-21T14:09:22.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":88841,"DATAVOLUMEOUT":47682}
{"_id":{"$oid":"58518f8f6222df30f92c5513"},"SERVICE":"Service B3","BEGINTIME":{"$date":"2016-06-22T23:00:00.000Z"},"ENDTIME":{"$date":"2016-06-23T13:55:11.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":745738,"DATAVOLUMEOUT":742730}
{"_id":{"$oid":"585190036222df30f939268a"},"SERVICE":"Service A2","BEGINTIME":{"$date":"2016-06-23T13:58:16.000Z"},"ENDTIME":{"$date":"2016-06-23T14:04:13.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":80679,"DATAVOLUMEOUT":32105}
{"_id":{"$oid":"58516d4c1e92ee2b277445da"},"SERVICE":"Service A1","BEGINTIME":{"$date":"2016-07-05T14:04:33.000Z"},"ENDTIME":{"$date":"2016-07-05T14:05:23.000Z"},"REMOTEID":123456,"DATAVOLUMEIN":46317,"DATAVOLUMEOUT":23222}

The output shows that "Service B3" documents are included but they are not desired and not included in the b_types array:

{
	"_id" : ObjectId("58518c006222df30f9d9d23e"),
	"SERVICE" : "Service B3",
	"BEGINTIME" : ISODate("2016-06-17T23:00:00Z"),
	"ENDTIME" : ISODate("2016-06-18T16:27:28Z"),
	"REMOTEID" : 123456,
	"DATAVOLUMEIN" : 3028368,
	"DATAVOLUMEOUT" : 2998838
}



 Comments   
Comment by David Storch [ 03/Jan/17 ]

Hi michaelrommel, you're welcome. We've closed the ticket.

Best,
Dave

Comment by Michael Rommel [ 31/Dec/16 ]

Hi @david.storch - yes you are completely right! I did not view the argument as a JS object, but that makes sense. It also makes sense, that my last comment works, because then you can encapsulate the $or into sub-documents. Thank you, you can close the issue, I think I cannot do that myself.

Thank you for the quick reaction, this issue was hard for me to track down yesterday. Once I narrowed it down to the $or, the workaround (=the actual solution) was then easier to come up with, than to debug this on a collection with 175Mio entries...

Happy New Year!

Comment by David Storch [ 30/Dec/16 ]

Hi michaelrommel, I haven't given this a careful look yet, but I did notice that the repro script has a JavaScript object with two attribute names both called $or. JS does not allow duplicate attribute names, so the second will overwrite the first. The server is probably not receiving the first $or clause:

      "$or" : [
          { "SERVICE" : { "$in" : a_types } },
          { "SERVICE" : { "$in" : b_types } }
      ],

Does this explain the behavior you're observing?

Comment by Michael Rommel [ 30/Dec/16 ]

This also works, but not the implicit "$and"ing that should occur.

db.test_in.aggregate([
  { "$match" :
    { "$and" : [
          { "REMOTEID" : 123456 },
          { "$or" :
            [
              { "SERVICE" : { "$in" : a_types } },
              { "SERVICE" : { "$in" : b_types } }
            ]
          },
          { "$and" :
            [
              { "BEGINTIME" : { "$gte" : ISODate("2016-05-20T00:00:00+01:00") } },
              { "BEGINTIME" : { "$lt" : ISODate("2016-07-10T00:00:00+01:00") } }
            ]
          },
          { "$or" :
            [
              { "DATAVOLUMEIN"  : { "$gt" : 0 } },
              { "DATAVOLUMEOUT" : { "$gt" : 0 } }
            ]
          }
      ]
    }
  },
  { "$out" : "test_out" }
]);

Comment by Michael Rommel [ 30/Dec/16 ]

Rewriting the filter this way solves the problem:

db.test_in.aggregate([
  { "$match" :
    {
      "REMOTEID" : 123456,
      "$and" : [
          { "BEGINTIME" : { "$gte" : ISODate("2016-05-20T00:00:00+01:00") } },
          { "BEGINTIME" : { "$lt" : ISODate("2016-07-10T00:00:00+01:00") } },
          { "$or" : [
              { "SERVICE" : { "$in" : a_types } },
              { "SERVICE" : { "$in" : b_types } }
          ] },
          { "$or" : [
              { "DATAVOLUMEIN"  : { "$gt" : 0 } },
              { "DATAVOLUMEOUT" : { "$gt" : 0 } }
          ] }
      ]
    }
  },
  { "$out" : "test_out" }
]);

Generated at Thu Feb 08 04:15:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.