Hardware problems have been bugging me for a while.
For some time, there has been something not quite right with this webserver.
It hasn't been bad enough to cause any major issues (at least, not since
I ran sysctl -w machdep.ddb_on_nmi=0 ; sysctl -w machdep.panic_on_nmi=0),
but dmesg would keep revealing bursts of the following during heavy disk activity:
NMI ISA 3c, EISA ff
NMI ISA 2c, EISA ff
NMI ISA 2c, EISA ff
NMI ISA 3c, EISA ff
Assuming that each set bit flags a separate condition, and that all of them
apply at once, this indicates a memory parity error, an I/O error, and some
undefined error.
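To make "the bits" concrete, here is a small Python sketch that splits the two ISA values from the dmesg output into their individual bits and combines the readings. It assumes each value is a raw 8-bit status byte printed in hex. It deliberately does not say which hardware condition each bit stands for, because that mapping depends on the chipset and should be checked against its documentation.

```python
from functools import reduce
import operator

def set_bits(value: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits set in an 8-bit value."""
    return [bit for bit in range(8) if value & (1 << bit)]

# The "NMI ISA" values copied from the dmesg burst above, in order.
# Assumption: each one is a raw 8-bit status byte printed in hex.
readings = [0x3C, 0x2C, 0x2C, 0x3C]

# Show each distinct reading in binary alongside the bit positions that are set.
for value in sorted(set(readings)):
    print(f"0x{value:02x} = 0b{value:08b} -> bits set: {set_bits(value)}")

# Bits present in every reading (bitwise AND) and in at least one reading (bitwise OR).
common = reduce(operator.and_, readings)
either = reduce(operator.or_, readings)
print(f"AND: 0x{common:02x} -> bits {set_bits(common)}")
print(f"OR:  0x{either:02x} -> bits {set_bits(either)}")

# Bits that differ between the two kinds of reading (bitwise XOR).
print(f"differ: bits {set_bits(0x3C ^ 0x2C)}")
```

Run against these readings, it shows that both values share bits 2, 3 and 5. The only bit that changes between the 3c and 2c lines is bit 4.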
These errors would show up from time to time without causing any major
issues (apart from the occasional segfault of make during the tree cleanup
stage of make buildworld, or during a make index under /usr/ports), so I
felt that they were not worth the trouble of a trip to the co-lo to take a
look at the server.
Now they are becoming more frequent, and they have been joined by a friend:
kernel trap 19 with interrupts disabled
As far as I can tell, this kernel trap indicates that an NMI arrived
while the system was already servicing another NMI.
Not good...
The other day, one of my nightly Memtest runs showed an error.
This also happened a couple of weeks ago, with the same bit being
flipped, at the same memory location.
Even worse...
So, tonight I'm going to visit the co-lo, swap out some memory for
testing, replace the drive cables with known good ones (just in case; the
faults occurring during heavy disk activity make me want to change them),
maybe underclock the processor if things seem too warm, and poke around
the rest of the system.
Hopefully I have better luck fixing this than other
people who have had the same problem...