Details
-
Bug
-
Resolution: Done
-
Major - P3
-
None
-
2.0.6
-
Linux <host> 2.6.18-238.12.1.el5 #1 SMP Sat May 7 20:18:50 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
stepping : 2
cpu MHz : 1596.000
cache size : 12288 KB
physical id : 1
siblings : 6
core id : 0
cpu cores : 6
apicid : 32
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx pdpe1gb rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips : 4788.14
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: [8]
^^ x12
-
ALL
Description
An RS secondary (member of a 4-shard cluster) segfaulted abruptly during "normal" operation. At the time of the failure, nothing should have been querying the secondary: the live site doesn't do RS reads, and the batch iteration jobs that do run against the secondaries run at a different time. See first backtrace.
Removed the lockfile and restarted; it crashed again. See second backtrace. (Is removing the lockfile no longer the recommended procedure? If so, oops.)
Left the lockfile alone and the secondary returned to normal operation.
Let me know what other information would help. This isn't super high-priority for me since the system returned to normal relatively quickly, but it is somewhat troublesome.