[SERVER-77840] Replace MurmurHash3 with absl::Hash if possible in FTSIndexFormat::_appendIndexKey Created: 06/Jun/23  Updated: 30/Oct/23  Resolved: 30/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Dan Larkin-York Assignee: Ivan Fefer
Resolution: Won't Fix Votes: 0
Labels: quick-tech-debt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:

 Description   

MurmurHash3 is a pretty old hash function that's slow and produces low-quality hashes compared to modern alternatives like absl::Hash (CityHash). We want to replace any uses of MurmurHash3 which are for in-memory use only, as we expect this to yield quick perf wins. Hash values that are persisted to disk across restarts or sent across the network between servers are likely unsafe to change, and should be annotated with a clear explanation for why the usage requires a stable hash computation.
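To illustrate the in-memory vs. persisted distinction above, here is a minimal sketch; it is not the server's code. It assumes `std::hash` as a stand-in for `absl::Hash` so that it compiles without Abseil, and uses 64-bit FNV-1a as a stand-in for a stable hash (it is not MurmurHash3). The names `stableHash`, `InMemoryHasher`, `TermCounts`, and `countTerms` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// STABLE HASH: the output is fully specified by the algorithm (64-bit FNV-1a),
// so it is the same on every build, platform, and restart. A value like this
// may be written to disk or sent between servers. FNV-1a is only a stand-in
// here; it is NOT the hash used by the text index format.
inline std::uint64_t stableHash(const std::string& s) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
    }
    return h;
}

// IN-MEMORY HASH: the value never leaves the process, so the algorithm can be
// swapped freely. std::hash stands in for absl::Hash in this sketch; neither
// guarantees the same output across builds or library versions.
struct InMemoryHasher {
    std::size_t operator()(const std::string& s) const {
        return std::hash<std::string>{}(s);
    }
};

// Hypothetical in-memory structure: hash values are used only for bucketing.
using TermCounts = std::unordered_map<std::string, int, InMemoryHasher>;

inline TermCounts countTerms(const std::vector<std::string>& terms) {
    TermCounts counts;
    for (const auto& t : terms) {
        ++counts[t];
    }
    return counts;
}
```

Changing `InMemoryHasher` affects only bucket placement inside the process, whereas changing `stableHash` would invalidate any value already persisted with it, which is why such call sites should carry an explanatory comment.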



 Comments   
Comment by Githook User [ 30/Oct/23 ]

Author: Ivan Fefer <ivan.fefer@mongodb.com> (Fefer-Ivan)

Message: SERVER-77840 Remove unused murmur3.h includes
Branch: master
https://github.com/mongodb/mongo/commit/c9e7bd4c4af13c9d98a69de485a3a4c1466a7eb9

Comment by Ivan Fefer [ 30/Oct/23 ]

MurmurHash3 is only used in the legacy version 2 of the full-text index: https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/fts_index_format.cpp#L198

I am closing this ticket as Won't Fix.

Comment by Ivan Fefer [ 19/Oct/23 ]

Okay, funny thing. I didn't read the ticket title completely and started to replace murmur in other places. I actually almost completely removed murmur in favor of absl::Hash, but in FTS it must stay for legacy reasons.

However, murmur is used in text index version 2, whereas the current version, text index version 3, uses md5 (and has since 2015).
The main goal of v3 was to support non-ICU unicode: SERVER-19557

We can, in theory, create a text index version 4 that uses CityHash and test its performance.
