convert StringData to std::string_view (SERVER-32422)

[SERVER-48429] Remove StringData::ComparatorInterface from StringData Created: 27/May/20  Updated: 02/Nov/23  Resolved: 02/Nov/23

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Sub-task Priority: Minor - P4
Reporter: Billy Donahue Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: servicearch-wfbf-sprint
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-32422 convert StringData to std::string_view Backlog
is related to SERVER-77698 Replace MurmurHash3 with absl::Hash i... Blocked
Assigned Teams:
Service Arch
Backwards Compatibility: Fully Compatible
Participants:

 Description   

StringData::ComparatorInterface and most of its callers can be improved in several respects. I want to open this ticket to give the StringData::ComparatorInterface some attention. It's used in some very important places.

Move it out of the StringData class to be a first-class type. StringData::ComparatorInterface is not part of what StringData is. It's just one particular thing you can do with StringData. We already have a separate file for its definition, `mongo/base/string_data_comparator_interface.h`, so we can essentially rename `StringData::ComparatorInterface` to `StringDataComparator` in that file. We can remove "Interface" from the name. I'm looking at the typedefs:
StringDataUnorderedSet
StringDataUnorderedMap

And they are hard to use correctly because their hasher and key_eq functors are stateful, so a factory function is needed, and the functors provided have lifetime caveats on an underlying StringData::ComparatorInterface everybody just uses the singleton kInstance, so the whole thing can just be a family of matched stateless functors for every use in the codebase. Many can now be migrated to use stdx::unordered_map and stdx::unordered_set and be more efficient.

The StringDataComparatorInterface is missing an equal virtual function. It has a compare and a hash_combine but no equal. This means that doing hash container lookups will have to use compare, which is potentially far more expensive than == on StringData, as equality can short-circuit in the case of unequal string lengths.



 Comments   
Comment by Githook User [ 02/Nov/23 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-48429 StringDataComparator
Branch: master
https://github.com/mongodb/mongo/commit/8505d6092a6680179d27e1b8bf103a4e7577c3e6

Comment by Billy Donahue [ 29/Oct/23 ]

Picking this up for Skunkworks as a minor subtask of the larger SERVER-32422 ticket.

Comment by Billy Donahue [ 10/May/22 ]

Reopening as this is still a performance and modularity win.

Comment by Lauren Lewis (Inactive) [ 24/Feb/22 ]

We haven’t heard back from you for at least one calendar year, so this issue is being closed. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Andrew Morrow (Inactive) [ 27/May/20 ]

billy.donahue -

Yeah, I don't have any real objection to improvements in this space. But I do want to call out two things:

  • The use of Interface in the mongodb codebase has a long tradition, but is supposed to be reserved for things that consist entirely of pure virtual functions, perhaps with a sprinkling of constants or static methods. In any event, the renaming doesn't seem to be the key aspect here.
  • My recollection is that the interface/implementation partition was required to avoid circular dependencies and layer violations. But perhaps those have shaken out in other ways by now, especially since we have enforced the acyclic graph for a good while now.
Comment by Billy Donahue [ 27/May/20 ]

It's a base class. I don't think it's necessary to put the suffix "Interface" on base classes. It's redundant IMO. The BSON Comparators (which are also hashers) are similarly stateful, mandatory-polymorphic, and referential and could also be improved, yes. What I see in BSON and StringData comparator usage is that a caller often knows the concrete type of compare/hash they need, and the polymorphism doesn't actually help in those cases.

The common SimpleBSONObjComparator, SimpleBSONElementComparator, and SimpleStringDataComparator can get a fast non-polymorphic path. They are the most common cases. I don't think functors should point to outside objects with lifetime concerns. It makes them slower and fragile. They can be passed around by value and own their own state, with shared_ptr if necessary. A comparator is called a lot more than it is constructed or passed around, so construction cost should be insignificant. This is an improvement that could spread out to the BSON Comparators.

Comment by Billy Donahue [ 27/May/20 ]

Added benchmark to compare speed of
AbslFlatHashMap vs SimpleStringDataUnorderedMap.

https://gist.github.com/BillyDonahue/69ea95e128e1ebd1986b7689efaf6721

It's not close at all. AbslFlatHashMap is 2.5x faster!
The only difference between AbslFlatHashMap and SimpleStringDataUnorderedMap is the choice of Hash and KeyEq functors. They are instantiations of the same underlying absl::flat_hash_map container class template.

The StringDataComparatorInterface functors are a slow choice.

Generated at Thu Feb 08 05:17:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.