Type: Improvement
Resolution: Won't Do
Priority: Major - P3
None
Affects Version/s: 3.0.12, 3.2.10, 3.4.0-rc0
Component/s: Stability
Service Arch
Certain Intel CPU microcode versions have TSX bugs that can lead to unexplained concurrency issues. We should log a warning at server startup or, if possible, refuse to start the server when we detect this situation.
User xiaost provided more information on this in SERVER-26018:
- can only be reproduced on servers with the new CPU (E5-2630 v4)
- can be easily reproduced by modifying the unit tests
- can only be reproduced under a particular code execution sequence
- goes away if we add some debug code to the lock context
After debugging, we started to focus on hardware issues, including memory and CPU.
With the help of Google, we found that the TSX feature, which speeds up execution of multi-threaded software through lock elision, seems to have been the source of trouble since 2014:
[1] [2] [3]
In August 2014, Intel announced a bug in the TSX implementation on current steppings of Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX feature on affected CPUs via a microcode update.
We checked our microcode changelog. The latest release says:
+ Likely fixes a recently identified, critical but low-hitting TSX erratum on Broadwell, Broadwell-E and related Xeons (Broadwell-DE/WS/EP: Xeon-D 1500, E3-v4 and E5-v4)
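The proposed startup check could look roughly like the sketch below. It is a minimal illustration, not the server's implementation: it assumes Linux, where `/proc/cpuinfo` exposes the TSX features as the `hle` (Hardware Lock Elision) and `rtm` (Restricted Transactional Memory) flags, and it warns only when TSX is still enabled on an assumed set of affected models with a microcode revision below a threshold. The model set, the threshold, and the function names `parse_first_cpu`, `tsx_startup_warning` and `check_at_startup` are all hypothetical; the real fixed microcode revision depends on the exact CPU stepping and is not given in this ticket.

```python
from typing import Optional

# ASSUMPTION: family-6 model numbers for Broadwell parts, including
# Broadwell-EP (Xeon E5 v4, model 79) and Broadwell-DE (Xeon D-1500, model 86).
# Verify against Intel's documentation before relying on this set.
AFFECTED_MODELS = {61, 71, 79, 86}


def parse_first_cpu(cpuinfo: str) -> dict:
    """Return the key/value pairs of the first processor block in /proc/cpuinfo text."""
    fields = {}
    for line in cpuinfo.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def tsx_startup_warning(cpuinfo: str, min_fixed_microcode: int) -> Optional[str]:
    """Return a warning if TSX is enabled on an assumed-affected CPU with old microcode."""
    cpu = parse_first_cpu(cpuinfo)
    if cpu.get("vendor_id") != "GenuineIntel" or cpu.get("cpu family") != "6":
        return None
    flags = set(cpu.get("flags", "").split())
    if not flags & {"hle", "rtm"}:
        return None  # TSX absent, or already disabled by a microcode update
    model = int(cpu.get("model", "-1"))
    if model not in AFFECTED_MODELS:
        return None
    microcode = int(cpu.get("microcode", "0"), 0)  # base 0 accepts "0x..." hex
    if microcode >= min_fixed_microcode:
        return None
    return (f"WARNING: CPU model {model} has TSX enabled with microcode "
            f"{microcode:#x}; consider updating to at least {min_fixed_microcode:#x}")


def check_at_startup(cpuinfo: str, min_fixed_microcode: int, refuse: bool = False) -> Optional[str]:
    """Warn by default; with refuse=True, raise instead so startup can abort."""
    warning = tsx_startup_warning(cpuinfo, min_fixed_microcode)
    if warning and refuse:
        raise RuntimeError(warning)
    return warning


# Made-up sample input; the microcode value 0x10 is a placeholder, not a real revision.
SAMPLE = """processor\t: 0
vendor_id\t: GenuineIntel
cpu family\t: 6
model\t\t: 79
model name\t: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
microcode\t: 0x10
flags\t\t: fpu sse2 hle avx2 rtm
"""

if __name__ == "__main__":
    # On a real Linux host one would read open("/proc/cpuinfo").read() instead.
    print(check_at_startup(SAMPLE, min_fixed_microcode=0x20))
```

Reading `/proc/cpuinfo` keeps the sketch portable and easy to test; a native check inside the server would more likely query CPUID directly and would need its own, verified table of affected steppings and fixed microcode revisions.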
Related to:
- SERVER-24283 Invariant failure grantedCounts[mode] >= 1 (Closed)
- SERVER-26018 Inconsistency between the LockManager grantedList and grantedCount (Closed)