  1. Core Server
  2. SERVER-68014

Eliminate the 70% performance regression when passing "UTC" to $dateTrunc

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • QE 2022-08-08

      Consider running the following query over a collection of 1,100,000 documents, each with a single field "a" that holds a date value:

      {
        "$group" : {
          "_id" : {
            "$dateTrunc" : {
              "date" : "$a",
              "unit" : "month",
              "binSize" : 5
            }
          },
          "total" : { "$sum" : 1 }
        }
      } 

      Note that since we did not pass the timezone parameter, $dateTrunc will default to UTC.

      This query runs at 1.53 sec/op. If we instead pass the timezone parameter explicitly as the string "UTC" (same semantics, since that is the default), the query runs at 2.58 sec/op, which is 69% slower.
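      For reference, the explicit-timezone variant of the stage and the slowdown arithmetic can be written out as below. This is a minimal sketch in Python with the stage expressed as a plain dict; the variable names are illustrative and not taken from the ticket.

```python
# The same $group stage, with the default timezone passed explicitly.
stage_explicit_utc = {
    "$group": {
        "_id": {
            "$dateTrunc": {
                "date": "$a",
                "unit": "month",
                "binSize": 5,
                "timezone": "UTC",
            }
        },
        "total": {"$sum": 1},
    }
}

# Timings quoted in this ticket, in seconds per operation.
implicit_utc_sec = 1.53
explicit_utc_sec = 2.58

# Relative slowdown of the explicit-"UTC" variant.
slowdown_pct = (explicit_utc_sec - implicit_utc_sec) / implicit_utc_sec * 100
print(f"{slowdown_pct:.1f}% slower")
```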

      A combination of the following behaviors causes the regression:

      1. The code that parses the timezone has a special branch. If no timezone parameter is passed, mongo::TimeZoneDatabase::utcZone() is returned. If a timezone parameter is passed, we look up the timezone in the timezone database.

      Later, at runtime, the following code path is exercised repeatedly:

      1. Timezones from the timezone database (including the UTC one) have TimeZone::isTimeZoneIDZone() set to true, which results in a call to timelib_set_timezone when TimeZone::adjustTimeZone() is called.
      2. timelib_set_timezone sets the timelib_time::tz_abbr field to a small string representing the timezone and sets timelib_time::zone_type to TIMELIB_ZONETYPE_ID.
      3. The functions timelib_unixtime2local and timelib_update_from_sse, which are used quite often in our code, check for TIMELIB_ZONETYPE_ID.
      4. If TIMELIB_ZONETYPE_ID is set, timelib_get_time_zone_info is called, which copies the timelib_time::tz_abbr field.
      5. These small allocations quickly add up and hurt performance.

      If the timezone is constructed using mongo::TimeZoneDatabase::utcZone() (which is the case for the absent timezone parameter for $dateTrunc), TIMELIB_ZONETYPE_ID is not set and no additional allocations are performed. 
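      The difference between the two paths can be illustrated with a toy model. This is a sketch in Python, not the server's C++ code: the class, the dictionary, and the function names are hypothetical stand-ins for the identifiers named above, and the model assumes exactly one tz_abbr copy per conversion for an ID-style zone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeZone:
    name: str
    is_id_zone: bool  # models TimeZone::isTimeZoneIDZone()


# Models mongo::TimeZoneDatabase::utcZone(): a UTC zone that is not ID-style.
UTC_ZONE = TimeZone("UTC", is_id_zone=False)

# Models the timezone database: every entry, UTC included, is ID-style.
TZ_DATABASE = {"UTC": TimeZone("UTC", is_id_zone=True)}


def parse_timezone(timezone_arg):
    """Absent parameter takes the special branch; anything else is looked up."""
    if timezone_arg is None:
        return UTC_ZONE
    return TZ_DATABASE[timezone_arg]


def count_abbr_copies(timezone_arg, num_conversions):
    """Count the tz_abbr copies made over num_conversions date conversions."""
    zone = parse_timezone(timezone_arg)
    copies = 0
    for _ in range(num_conversions):
        if zone.is_id_zone:
            # Models the TIMELIB_ZONETYPE_ID check followed by
            # timelib_get_time_zone_info copying tz_abbr (assumed: one copy).
            copies += 1
    return copies


print(count_abbr_copies(None, 1_100_000))   # absent timezone parameter
print(count_abbr_copies("UTC", 1_100_000))  # explicit "UTC"
```

      In this model the absent-parameter case performs no copies, while the explicit "UTC" case performs one per converted date, which is the asymmetry the ticket describes.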

      During our investigation, we observed as much as a 2x performance regression caused by these allocations when executing $dateTrunc in our time-series prototype for SBE.

      The change proposed in SERVER-66521 reduces how often the timezone is parsed across repeated calls to a date-time expression with the same timezone specification. However, if an ID-style timezone is used, the regression seen with the "UTC" timezone from runtime path 1-5 is still present. This ticket will attempt to address the latter.

      There may also be opportunities to optimize for timezones other than UTC, but care must be taken with regard to DST. UTC has no DST (its offset is permanently 0), so it does not need to be looked up again when the time value passed into the expression changes.
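      One possible shape for a UTC-only optimization is to map an ID-style "UTC" lookup onto the fixed UTC zone at parse time, leaving DST-observing zones untouched. The sketch below is a hypothetical illustration in Python and not necessarily what the fix for this ticket does; Zone, FIXED_UTC, DATABASE, and resolve are invented names, and "Europe/Paris" is only an example of a zone that observes DST.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    is_id_zone: bool  # True: each conversion consults zone info again
    has_dst: bool     # True: the offset depends on the time value


# Hypothetical fixed UTC zone: offset is always 0, no per-call lookup needed.
FIXED_UTC = Zone("UTC", is_id_zone=False, has_dst=False)

# Hypothetical stand-in for the timezone database.
DATABASE = {
    "UTC": Zone("UTC", is_id_zone=True, has_dst=False),
    "Europe/Paris": Zone("Europe/Paris", is_id_zone=True, has_dst=True),
}


def resolve(timezone_arg):
    """Return the zone to use, collapsing ID-style UTC onto the fixed zone."""
    if timezone_arg is None:
        return FIXED_UTC
    zone = DATABASE[timezone_arg]
    if zone.name == "UTC":
        # Safe because UTC's offset never changes with the time value.
        return FIXED_UTC
    # Zones with DST keep the ID-style path: their offset must be
    # re-evaluated for each time value.
    return zone


print(resolve("UTC") is FIXED_UTC)
print(resolve("Europe/Paris").is_id_zone)
```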

            Assignee:
            alberto.massari@mongodb.com Alberto Massari
            Reporter:
            nikita.lapkov@mongodb.com Nikita Lapkov (Inactive)
