Three-valued SQL semantics for OR and can lead to incorrect results

XMLWordPrintableJSON

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • libv-1.0.2, mongosql-rs-1.8.4
    • Affects Version/s: None
    • Component/s: None
    • None
    • Atlas SQL
    • Needed
    • Fixes the incorrect semantics of OR conditions in WHERE predicates which was filtering out documents for which one of the operand of the OR condition was NULL or MISSING.

      Issue Status as of February 12

      ISSUE DESCRIPTION AND IMPACT
      We have identified an issue in the SQL interface that changes the expected three-valued SQL semantics for OR and can lead to incorrect results for SQL queries with the following characteristics:

      • The datasource queried contains a nullable field. It can be a simple datasource (collection, array or derived table) or a compound datasource (Join, Unwind or Flatten)
      • The datasource queried contains NULL or missing values for the nullable field
      • The SQL query, either hand-written or generated by a BI tool:
        • Contains a WHERE clause with an OR condition
        • One predicate of the OR condition evaluates to TRUE
        • The other predicate resolves to UNKNOWN when the nullable field from the datasource evaluates to UNKNOWN (NULL or MISSING)

      Rows for which all the conditions are met are evaluated as UNKNOWN instead of TRUE and are missing from the resultset.

      DIAGNOSIS AND AFFECTED VERSIONS

      The following versions of the SQL interface components are affected:

      • JDBC 3.0.4 and earlier versions
      • ODBC 2.0.3 and earlier versions
      • MongoSQL 1.8.3 and earlier versions (deployed on Atlas Data Federation)

      Only customers with business intelligence dashboards or native queries meeting ALL the criteria for the bug to occur are impacted.

      To evaluate if a query is impacted, users can:

      1. Verify that there is a WHERE clause containing an OR condition in the SQL query
      2. Break the query around the OR condition into 2 queries. For example, break
        SELECT {selectClause} from … WHERE condA OR condB
        

        into

        select count(*) from (SELECT {selectClause}, (condA) as condAField from … WHERE condB) as condAQuery where condAField is NULL;
        select count(*) from (SELECT {selectClause}, (condB) as condBField from … WHERE condA) as condBQuery where condBField is NULL;
        

      If the count returned by either of the two queries is greater than zero, the original query is affected. The total number of missing rows is determined by summing the counts from both queries.

      For example:
      Break

      SELECT _id, rated, metacritic FROM movies WHERE rated = 'UNRATED' OR metacritic >= 98 
      

      into

      select count(*) from (select `_id`, rated, metacritic, (rated = 'UNRATED') as condAField from movies where metacritic >= 98) as condAQuery where condAField is NULL;
      select count(*) from (select `_id`, rated, metacritic, (metacritic >= 98) as condBField from movies where rated = 'UNRATED') as condBQuery where condBField is NULL;
      

      REMEDIATION AND WORKAROUNDS

      For EA SQL users, the first step is to upgrade their ODBC or JDBC driver(s) to the latest version available on the download center.
      All users then need to refresh impacted dashboards and update native SQL query results by re-executing impacted SQL queries.

      For custom SQL queries, users can use UNION instead to rewrite OR condition.
      For example,
      Rewrite

      SELECT _id, rated, metacritic
      FROM movies
      WHERE rated = 'UNRATED' OR metacritic >= 98
      

      to

      SELECT _id, rated, metacritic
      FROM movies
      WHERE rated = 'UNRATED'
      UNION
      SELECT _id, rated, metacritic
      FROM movies
      WHERE metacritic >= 98
      

      OTHER AFFECTED TOOLS
      We are not aware of any other affected tool.

      —-----------------------------------------------------

      Original description

      Problem Statement/Rationale
      When translating SQL queries containing an OR condition, MongoSQL adds null-filtering predicates for each operand (e.g. $gt: [ "$a", null ], $gt: [ "$b", null ]).
      This effectively filters out documents where either operand is null, which changes the expected three-valued SQL semantics for OR.

      In SQL 3VL, TRUE OR NULL should evaluate to TRUE, but with these additional null filters, the row is removed before the $or is evaluated, resulting in unexpected behavior.

      This behaviour appears to originate from match_null_filtering optimizer, which applies the same null-filtering strategy to all divergent operators. While this works correctly for comparisons and AND operations, it alters the semantics of OR. In the mql_null_semantics_diverge function (lines 316-327), OR is currently grouped with comparison operators and AND.

      Steps to Reproduce
      Run in mongosql-cli:

      ./target/debug/mongosql-cli "SELECT foo._id, foo.b, foo.a  FROM foo WHERE b > 10 or a = 'B'" 

      Observe that the generated aggregation pipeline includes (simplified query):

        {
          $match: {
            $expr: {
              $and: [
                {
                  $or: [
                    { $gt: ["$b", { $literal: 10 }] },
                    { $eq: ["$a", { $literal: "B" }] }
                  ]
                },
                { $gt: ["$a", { $literal: null }] },
                { $gt: ["$b", { $literal: null }] }
              ]
            }
          }
        }
      

      which filter out null values before evaluating $or.

      Full example repro with the data and execution enviroment: Mongo playground

      Data:

      [
        { "a": "B", "b": 11, "name": "1" },
        { "a": null, "b": 11, "name": "2" },
        { "a": "B", "b": null, "name": "3" },
        { "a": null, "b": null, "name": "4" }
      ]

      Expected Results

      Documents where either operand of the OR condition is true should be included, even if the other operand is NULL:

      [
        { "": { "a": "B", "b": 11, "name": "1" } },
        { "": { "a": null, "b": 11, "name": "2" } },
        { "": { "a": "B", "b": null, "name": "3" } }
      ] 

      Actual Results

       

      [
        { "": { "a": "B", "b": 11, "name": "1" } }
      ]

       

            Assignee:
            Jonathan Powell
            Reporter:
            Slav Babanin
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: