Ensure we can always build 2dsphere index keys after platform changes


    • Type: Task
    • Resolution: Unresolved
    • Priority: Critical - P2
    • Affects Version/s: None
    • Component/s: None
    • Query Integration

      This covers cases where we cannot build index keys due to geo key extraction failures as a result of platform changes (see SERVER-117104 as the umbrella ticket, and AF-16732 as a specific example). This is similar to SERVER-74983 and SERVER-75392 (platform-sensitive S2 trig math), but a different failure mode. Those tickets cover the case where keys can still be generated and just come out slightly different across platforms, which can be fixed by an index rebuild. In AF-16732's case, keys cannot be generated at all on the cluster's current platform, so a rebuild or logical resync wouldn't help. This ticket tracks ensuring that we can always build keys if the customer could build them initially.

      The divergence is in S2LatLng::ToPoint, which converts a lat/lng pair into a point on the unit sphere using sin/cos. glibc's sin/cos results differ by about 1 ULP (unit in the last place) between versions (AL2 uses glibc 2.26, whereas AL2023 uses glibc 2.34), and for a polygon ring with nearly identical vertices, that one bit determines whether two vertices are distinct or the same point. This has surfaced in several clusters recently due to the Atlas AL2-to-AL2023 data-plane migration (CLOUDP-220663 on AWS). For non-NVMe clusters the migration is an in-place OS swap with no resync, so keys built under AL2 are carried over unchanged onto an AL2023 binary that can no longer reproduce them (the offline validation binary runs on the same platform as the customer cluster's current platform). I was able to reproduce the divergence on mongod 8.0.10 on the same aarch64 architecture but on different OSes:

        hole ring:  A -> B -> B' -> A
          B  = [ 7.123456789012345, 48.5 ]
          B' = [ 7.123456789012346, 48.5 ]   // one ULP greater in longitude (~1e-15 deg, sub-nanometer)
      
        AL2    (glibc 2.26):  sin/cos keep B != B'  -> 3 distinct corners -> indexed
        AL2023 (glibc 2.34):  sin/cos round B == B' -> 2 distinct corners -> rejected:
                              "Loop must have at least 3 different vertices"
      

      The document is the same in both cases, but the computed S2Point differs because the two platforms' sin/cos implementations return slightly different results.
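
The effect can be modelled outside the server. The following is a minimal Python sketch, not MongoDB's or S2's actual code: `to_point` applies the standard lat/lng-to-unit-vector formula that S2LatLng::ToPoint is based on, and `distinct_vertices` is a hypothetical stand-in for the loop validity check. The coordinates of vertex A are made up for illustration, since the ticket does not give them. CPython's `math.sin`/`math.cos` call into the platform's C math library, so the count for the hole ring may differ between hosts.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def to_point(lng_deg: float, lat_deg: float) -> Point:
    """Convert degrees to a unit-sphere point: (cos(lat)cos(lng), cos(lat)sin(lng), sin(lat))."""
    lat = math.radians(lat_deg)
    lng = math.radians(lng_deg)
    cos_lat = math.cos(lat)
    return (cos_lat * math.cos(lng), cos_lat * math.sin(lng), math.sin(lat))


def distinct_vertices(ring: List[Tuple[float, float]]) -> int:
    """Count bit-for-bit distinct points in a ring of (lng, lat) pairs.

    Hypothetical model of the check behind
    "Loop must have at least 3 different vertices".
    """
    points = [to_point(lng, lat) for lng, lat in ring]
    # GeoJSON rings repeat the first vertex at the end; ignore that copy.
    if len(points) > 1 and points[0] == points[-1]:
        points.pop()
    return len(set(points))


# A is an assumed coordinate; B and B' are the values from the ticket.
A = (7.0, 48.0)
B = (7.123456789012345, 48.5)
B_PRIME = (7.123456789012346, 48.5)
hole_ring = [A, B, B_PRIME, A]

# Platform dependent: 3 if sin/cos keep B and B' apart, 2 if they coincide.
count = distinct_vertices(hole_ring)
print(count, "indexable" if count >= 3 else "rejected")
```

Running the same script on hosts with different libm versions is a cheap way to check whether a given ring is sensitive to the platform, without starting a mongod.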

      One potential solution is to normalize the points of a loop when it fails this validity check: snap each point to a fixed grid that is coarser than the ~1 ULP noise but far finer than any real geometry, merge the vertices that become identical, and drop rings that collapse to zero area. This only changes keys for the originally affected documents, so those documents will need a reindex. Note that this approach would need to be backported to the earlier binaries that customers are on so that they can rebuild the keys.
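
A minimal Python sketch of that normalization, under several assumptions that the ticket does not specify: snapping happens in degree space before any trig, the grid step `GRID_DEG` is 1e-9 degrees (roughly 0.1 mm on the ground), and "zero area" is tested with a planar shoelace sum over integer grid cells. All names here are hypothetical.

```python
from typing import List, Optional, Tuple

GRID_DEG = 1e-9  # assumed grid step: far above ~1e-15 deg noise, far below real geometry

Cell = Tuple[int, int]  # integer (lng, lat) grid indices


def snap(lng: float, lat: float, grid: float = GRID_DEG) -> Cell:
    """Snap a coordinate to the nearest grid cell, returned as integer indices."""
    return (round(lng / grid), round(lat / grid))


def normalize_ring(
    ring: List[Tuple[float, float]], grid: float = GRID_DEG
) -> Optional[List[Cell]]:
    """Snap a ring, merge adjacent duplicates, and return None if it collapses."""
    cells: List[Cell] = []
    for lng, lat in ring:
        cell = snap(lng, lat, grid)
        if not cells or cells[-1] != cell:  # merge vertices that became identical
            cells.append(cell)
    # Drop the closing vertex that repeats the first one.
    if len(cells) > 1 and cells[0] == cells[-1]:
        cells.pop()
    if len(set(cells)) < 3:
        return None  # fewer than 3 distinct corners: the ring collapsed
    # Twice the signed planar area, exact because the indices are integers.
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(cells, cells[1:] + cells[:1])
    )
    return cells if twice_area != 0 else None


# Vertex A is an assumed coordinate; B and B' are the values from the ticket.
hole = [(7.0, 48.0), (7.123456789012345, 48.5), (7.123456789012346, 48.5), (7.0, 48.0)]
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]

print(normalize_ring(hole))      # None: B and B' merge, so the hole is dropped
print(normalize_ring(triangle))  # three grid cells: ordinary geometry is kept
```

Working with integer grid indices keeps the duplicate and area tests free of floating-point trig, so the outcome no longer depends on the libm version. Multiplying the indices by the grid step recovers coordinates in degrees for key generation.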

            Assignee:
            Jinfeng Ni
            Reporter:
            Lynne Wang
