[SERVER-68014] Eliminate the 70% performance regression when passing "UTC" to $dateTrunc
Created: 13/Jul/22  Updated: 29/Oct/23  Resolved: 02/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Improvement
Priority: Major - P3
Reporter: Nikita Lapkov (Inactive)
Assignee: Alberto Massari
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: QE 2022-08-08
Participants:

 Description   

Consider running the following $group stage over a collection of 1,100,000 documents, each with a single date-valued field "a":

{
  "$group" : {
    "_id" : {
      "$dateTrunc" : {
        "date" : "$a",
        "unit" : "month",
        "binSize" : 5
      }
    },
    "total" : { "$sum" : 1 }
  }
} 

Because no timezone parameter is passed, $dateTrunc defaults to UTC.

This query runs at 1.53 sec/op. Passing the timezone parameter explicitly as the string "UTC" (identical semantics, since UTC is the default) raises this to 2.58 sec/op, which is 69% slower (2.58 / 1.53 ≈ 1.69).

The regression is caused by a combination of the following behaviors:

  1. Our timezone-parsing code has a special branch: if no timezone parameter is passed, it returns mongo::TimeZoneDatabase::utcZone(); if a timezone parameter is passed, we look up the timezone in the timezone database.

Later, at runtime, the following code path is exercised repeatedly:

  1. For timezones from the timezone database (including UTC), TimeZone::isTimeZoneIDZone() returns true, so calling TimeZone::adjustTimeZone() results in a call to timelib_set_timezone.
  2. timelib_set_timezone sets the timelib_time::tz_abbr field to a small string representing the timezone, and sets timelib_time::zone_type to TIMELIB_ZONETYPE_ID.
  3. The functions timelib_unixtime2local and timelib_update_from_sse, which are called frequently in our code, check for TIMELIB_ZONETYPE_ID.
  4. If TIMELIB_ZONETYPE_ID is set, they call timelib_get_time_zone_info, which copies the timelib_time::tz_abbr field.
  5. These small per-call allocations quickly add up and severely degrade performance.

If the timezone is constructed with mongo::TimeZoneDatabase::utcZone() (as happens when $dateTrunc receives no timezone parameter), TIMELIB_ZONETYPE_ID is not set and none of these additional allocations occur.

During our investigation, we observed up to a 2x performance regression caused by these allocations when executing $dateTrunc in our SBE time-series prototype.

The change proposed in SERVER-66521 reduces how often the timezone is parsed across repeated calls to a date/time expression with the same timezone specification. However, when an ID-style timezone is used, the runtime cost of steps 1-5 above, which causes the regression seen with the "UTC" timezone, remains. This ticket attempts to address the latter.

Timezones other than UTC might also be optimizable, but DST requires care, because the offset of a DST-observing zone can change with the date. UTC has no DST (its offset is permanently 0), so it does not need to be looked up again when the time value passed to the expression changes.



 Comments   
Comment by Githook User [ 02/Aug/22 ]

Author:

{'name': 'Alberto Massari', 'email': 'alberto.massari@mongodb.com', 'username': 'albymassari'}

Message: SERVER-68014 treat 'UTC' timezone the same as the default timezone
Branch: master
https://github.com/mongodb/mongo/commit/09e278d706b4062b53d5b614cfd6b63ec62661ee

Comment by Ana Meza [ 19/Jul/22 ]

rushan.chen@mongodb.com assigning you to find an assignee

Generated at Thu Feb 08 06:09:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.