[SERVER-59631] Values of value::TypeTags::ArraySet type might compare incorrectly for equality Created: 26/Aug/21 Updated: 29/Oct/23 Resolved: 15/Sep/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Irina Yatsenko (Inactive) | Assignee: | Mohammad Dashti (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Sprint: | QE 2021-09-20 | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 135 | ||||||||||||
| Description |
|
Add a unit test similar to this one:
The test creates two arraySets with a single item in each. The items would compare equal on their own, but when the containing sets are checked for equality, it would fail for pointer-based types of values such as bsonObject or long strings. The problem seems to be with the implementation of raw_hash_set::has_element() (see https://github.com/10gen/mongo/blob/defbe4582778e3da3abdc23c73ef7639543ab380/src/third_party/abseil-cpp-master/abseil-cpp/absl/container/internal/raw_hash_set.h#L1693 as, after matching the hashes, it compares the stored pairs literally, rather than using value::compareValue(). |
| Comments |
| Comment by Githook User [ 15/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Mohammad Dashti', 'email': 'mdashti@gmail.com', 'username': 'mdashti'}Message: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 14/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Uladzimir Makouski', 'email': 'uladzimir.makouski@mongodb.com', 'username': 'umakouski'}Message: Revert " This reverts commit 013ea42551ce36cc0b134e0ac52e221a681f0ecd. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 13/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Mohammad Dashti', 'email': 'mdashti@gmail.com', 'username': 'mdashti'}Message: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yoon Soo Kim [ 02/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Currently two fixes have been proposed. 1. Yoonsoo's proposal: As a workaround for SBE, we can modify the compareValue() for ArraySet comparison as follows.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Billy Donahue [ 02/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kyle Suarez [ 02/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Per our discussion in SBE standup, it sounds like the next steps are to implement a proper value::TypeTags::ArraySet::operator==() that actually does the right thing. Assigning to mohammad.dashti to take a look for the next sprint. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Billy Donahue [ 01/Sep/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So MSVC's stdlib is doing something that seems "sensible" IMO by saying that if I use unordered_set's equal_range (NOT unordered_multiset, just regular unordered_set), I KNOW I'm going to get a 1-element range. If I look up the contents of that element in the rhs container with equal_range, I'm going to get another 0-element or 1-element range. I can see why we wouldn't bother to check whether these 1 element ranges are permutations of each other. It looks like kind of a waste of time! It's already specified that key_eq has to have the same behavior on both containers. But MSVC's stdlib is wrong and libstdc++ is right. The equivalence of the keys under key_eq is not enough to determine container equality. The elements have to be equal to each other, not just equivalent. That's the spec. This is the same as the spec for std::set. The elements at corresponding positions have to be ==, not just equivalent under key_comp. Even the std::set relops like < ignore the key_comp and use lexicographical comparison using the key's own < operator. PS: Also I might be the Abseil contributor Ian was referring to earlier. PPS: I had the culprits swapped in the first version of this comment. Whoever is doing the == to compare elements is right. That's libstdc++ and I had it backwards at first. I updated the text of this comment. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yoon Soo Kim [ 29/Aug/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sorry but I could not help myself not write a reply. Thanks for finding the document. I'm not a language lawyer too but tried my best to interpret the document. First of all, there's no mentioning that the behavior is undefined if Key is not EqualityComparable. The second:
is_permutation() has two different versions.
The second version uses pred and we should assume that is_permutation() here means the second version since there's no is_permutation() taking 4 iterators and it certainly makes sense to use key_eq() for pred param for is_permutation() if a user specifies key_eq(). The third:
This complexity analysis implies that key_eq() is used along the way, not operator==. Lastly:
Here I believe that Pred means hash_function() and "the equality comparison function for key" is key_eq(), not operator==. Again, I could not think of any reason why the C++ standard defines such a confusing behavior that operator== for unordered container uses operator== for key equality comparison though a user specifies key_eq(), which basically overrides user's intention. If we still can't agree on analysis, we can get a help from language lawyers on Monday. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Boros [ 28/Aug/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Well if MSVC disagrees then we probably are in a bit of a murky area...on Monday we'll have to summon the C++ language lawyers to help us out In the document you linked to I found the following on page 817:
std::is_permutation does comparison via operator==, so, while I agree that the behavior is confusing, I believe it is what they intended.
Also:
If I understand right (an I am no C++ lawyer | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yoon Soo Kim [ 28/Aug/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This is a fun. Also I tested your code and it passed on MSVC compiler and library implementation and TestIntEq is used for ASSERT_TRUE(setA == setB). So, my conclusion is that the document you pointed out is probably about Linux C++ library implementation (g++ or clang++ or both?) and absl hash set's behavior is wrong. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Boros [ 28/Aug/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I'm hesitant to call this an absl issue. From what I could tell the documented and intended behavior is for the absl set's equality operator to rely on the equality operator of the underlying type rather than the ValueEq predicate. This is why we see this "shallow comparison" behavior. To me, the problem is our code, as compareValue() does not take this behavior into account. The solution of simply not using the hash set's equality operator and writing our own logic for set equality is a fine workaround, though it doesn't stop other people from running into this issue if they use operator== on this type. Right now I only see one other use of ValueSetType but there will likely be more. I think we should consider investing in a fix that prevents anyone else from spending time on this again. For example:
But maybe there's a better way? Also, I believe one of the authors/contributors to abseil actually works here. Maybe we can get their thoughts. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yoon Soo Kim [ 27/Aug/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The absl hash set's has_element() has the following implementation.
And hash set's operator== is defined as...
It's using has_element(). But find() implementation is using key_equal()
I think that our ValueEq implementation is already correct and is a wrapper for compareValue() but it's an absl hash set issue. As a workaround for SBE, if we modify the compareValue() for ArraySet comparison as follows, Irina's test case passes.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Boros [ 26/Aug/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I tried this again with the standard library and got similar results.
I thought I was losing my mind but then I read the docs for unordered_set operator==:
and:
It seems like unordered_set's equality operator is intentionally built on the underlying type's equality operator and not the key_eq predicate. I assume the abseil implementation just uses the same semantics as they're intended as "near drop-in" replacements? (edited)
My inclination is to get around this by redefining ValueSetType and ValueMapType to use a custom "pair" class that has its own operator== which is a wrapper for compareValue(). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Boros [ 26/Aug/21 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here's a repro that doesn't involve SBE and may help diagnosing this:
|