-
Type:
Bug
-
Resolution: Fixed
-
Priority:
Critical - P2
-
Affects Version/s: None
-
Component/s: None
-
None
-
Atlas SQL
-
Needed
-
Fixes the incorrect semantics of OR conditions in WHERE predicates which was filtering out documents for which one of the operand of the OR condition was NULL or MISSING.
ISSUE DESCRIPTION AND IMPACT
We have identified an issue in the SQL interface that changes the expected three-valued SQL semantics for OR and can lead to incorrect results for SQL queries with the following characteristics:
- The datasource queried contains a nullable field. It can be a simple datasource (collection, array or derived table) or a compound datasource (Join, Unwind or Flatten)
- The datasource queried contains NULL or missing values for the nullable field
- The SQL query, either hand-written or generated by a BI tool:
- Contains a WHERE clause with an OR condition
- One predicate of the OR condition evaluates to TRUE
- The other predicate resolves to UNKNOWN when the nullable field from the datasource evaluates to UNKNOWN (NULL or MISSING)
Rows for which all the conditions are met are evaluated as UNKNOWN instead of TRUE and are missing from the resultset.
DIAGNOSIS AND AFFECTED VERSIONS
The following versions of the SQL interface components are affected:
- JDBC 3.0.4 and earlier versions
- ODBC 2.0.3 and earlier versions
- MongoSQL 1.8.3 and earlier versions (deployed on Atlas Data Federation)
Only customers with business intelligence dashboards or native queries meeting ALL the criteria for the bug to occur are impacted.
To evaluate if a query is impacted, users can:
- Verify that there is a WHERE clause containing an OR condition in the SQL query
- Break the query around the OR condition into 2 queries. For example, break
SELECT {selectClause} from … WHERE condA OR condBinto
select count(*) from (SELECT {selectClause}, (condA) as condAField from … WHERE condB) as condAQuery where condAField is NULL; select count(*) from (SELECT {selectClause}, (condB) as condBField from … WHERE condA) as condBQuery where condBField is NULL;
If the count returned by either of the two queries is greater than zero, the original query is affected. The total number of missing rows is determined by summing the counts from both queries.
For example:
Break
SELECT _id, rated, metacritic FROM movies WHERE rated = 'UNRATED' OR metacritic >= 98
into
select count(*) from (select `_id`, rated, metacritic, (rated = 'UNRATED') as condAField from movies where metacritic >= 98) as condAQuery where condAField is NULL; select count(*) from (select `_id`, rated, metacritic, (metacritic >= 98) as condBField from movies where rated = 'UNRATED') as condBQuery where condBField is NULL;
REMEDIATION AND WORKAROUNDS
For EA SQL users, the first step is to upgrade their ODBC or JDBC driver(s) to the latest version available on the download center.
All users then need to refresh impacted dashboards and update native SQL query results by re-executing impacted SQL queries.
For custom SQL queries, users can use UNION instead to rewrite OR condition.
For example,
Rewrite
SELECT _id, rated, metacritic
FROM movies
WHERE rated = 'UNRATED' OR metacritic >= 98
to
SELECT _id, rated, metacritic
FROM movies
WHERE rated = 'UNRATED'
UNION
SELECT _id, rated, metacritic
FROM movies
WHERE metacritic >= 98
OTHER AFFECTED TOOLS
We are not aware of any other affected tool.
—-----------------------------------------------------
Original description
Problem Statement/Rationale
When translating SQL queries containing an OR condition, MongoSQL adds null-filtering predicates for each operand (e.g. $gt: [ "$a", null ], $gt: [ "$b", null ]).
This effectively filters out documents where either operand is null, which changes the expected three-valued SQL semantics for OR.
In SQL 3VL, TRUE OR NULL should evaluate to TRUE, but with these additional null filters, the row is removed before the $or is evaluated, resulting in unexpected behavior.
This behaviour appears to originate from match_null_filtering optimizer, which applies the same null-filtering strategy to all divergent operators. While this works correctly for comparisons and AND operations, it alters the semantics of OR. In the mql_null_semantics_diverge function (lines 316-327), OR is currently grouped with comparison operators and AND.
Steps to Reproduce
Run in mongosql-cli:
./target/debug/mongosql-cli "SELECT foo._id, foo.b, foo.a FROM foo WHERE b > 10 or a = 'B'"
Observe that the generated aggregation pipeline includes (simplified query):
{
$match: {
$expr: {
$and: [
{
$or: [
{ $gt: ["$b", { $literal: 10 }] },
{ $eq: ["$a", { $literal: "B" }] }
]
},
{ $gt: ["$a", { $literal: null }] },
{ $gt: ["$b", { $literal: null }] }
]
}
}
}
which filter out null values before evaluating $or.
Full example repro with the data and execution enviroment: Mongo playground
Data:
[
{ "a": "B", "b": 11, "name": "1" },
{ "a": null, "b": 11, "name": "2" },
{ "a": "B", "b": null, "name": "3" },
{ "a": null, "b": null, "name": "4" }
]
Expected Results
Documents where either operand of the OR condition is true should be included, even if the other operand is NULL:
[
{ "": { "a": "B", "b": 11, "name": "1" } },
{ "": { "a": null, "b": 11, "name": "2" } },
{ "": { "a": "B", "b": null, "name": "3" } }
]
Actual Results
[
{ "": { "a": "B", "b": 11, "name": "1" } }
]