Certain versions of the Intel CPU microcode have TSX bugs which might lead to unexplained concurrency issues. The server should print a warning at startup, or if possible even refuse to start, when it detects this situation.
- can only be reproduced on servers with the new CPU (E5-2630 v4)
- can be easily reproduced by modifying the unit tests
- can only be reproduced under a particular code execution sequence
- everything works correctly once we add some debug code inside the lock context
After debugging, we started to suspect a hardware issue in either the memory or the CPU.
In August 2014, Intel announced a bug in the TSX implementation on current steppings of Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX feature on affected CPUs via a microcode update.
We checked our microcode changelog. The latest release includes:
+ Likely fixes a recently identified, critical but low-hitting TSX erratum on Broadwell, Broadwell-E and related Xeons (Broadwell-DE/WS/EP: Xeon-D 1500, E3-v4 and E5-v4)
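The startup check proposed at the top could look roughly like the sketch below. It is only an illustration, not the server's actual code: it assumes a Linux host where /proc/cpuinfo exposes a "microcode" field and the "rtm" / "hle" flags, and the names parse_cpuinfo, tsx_warning, check_host and the min_safe_microcode parameter are all hypothetical. The minimum safe revision is model specific and would have to be taken from the vendor's microcode release notes; no real value is given here.

```python
def parse_cpuinfo(text):
    """Return (model_name, microcode_revision, flags) for the first CPU
    described in /proc/cpuinfo-style text."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    flags = set(info.get("flags", "").split())
    raw = info.get("microcode")
    # Linux prints the revision as hex, e.g. "0x1f"
    microcode = int(raw, 16) if raw else None
    return info.get("model name", ""), microcode, flags


def tsx_warning(text, min_safe_microcode):
    """Return a warning string if TSX is exposed and the microcode revision
    is unknown or older than min_safe_microcode, else None.

    min_safe_microcode is a caller-supplied, hypothetical threshold for the
    CPU model in question."""
    model, microcode, flags = parse_cpuinfo(text)
    tsx = sorted(flags & {"rtm", "hle"})
    if not tsx:
        return None  # TSX not exposed, nothing to warn about
    if microcode is None or microcode < min_safe_microcode:
        shown = "unknown" if microcode is None else hex(microcode)
        return ("WARNING: %s exposes TSX (%s) with microcode %s; "
                "a microcode update may be required"
                % (model, ", ".join(tsx), shown))
    return None


def check_host(min_safe_microcode, path="/proc/cpuinfo"):
    """Read the local cpuinfo file and return the warning, if any."""
    with open(path) as f:
        return tsx_warning(f.read(), min_safe_microcode)
```

A caller would run check_host() during startup and either log the returned message or abort. Whether refusing to start is appropriate is a policy decision that this sketch leaves open.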