Choose a template
Friday, January 25. 2013
My 1Tb laptop drive started misbehaving a few weeks ago, just spending a lot of time spinning when it should have been reading frequently changed files (like the browser cache). I was tempted to blame the browser at that point as no other packages appeared affected.
Wednesday night, a routine package upgrade on unstable brought in a bunch of qt4 updates which I wanted and a virtual box update that I didn't see much point in delaying ... 15 minutes later I couldn't work out why the virtualbox dkms task was still running, spotted it spinning in depmod and found some alarming messages in dmesg about short reads and errors reading from the hard drive. Hmm. The hard drive was out of control at this point, it quickly became apparent that no disc access was going to be possible and I couldn't get to a terminal to kill the current tasks, so I killed the power. The fsck which followed reboot showed more of the errors I'd seen in dmesg and fell back to the manual intervention stage. A few hours of confirming attempts to fix the errors, fsck finally finished. A short amount of usage showed that although fsck had finished, the drive was not happy and was starting to give short reads on other parts of the filesystem, resulting in ~40% of the filesystem appearing to be read-only when the rest was read-write. Somehow, I didn't think this was a welcome feature as the areas affected appeared to be quite random.
The drive in question was a replacement for the original 300Gb drive supplied with the ThinkPad, so a quick bit of switching of drives into a caddy and I could rsync my data off the large drive onto the smaller one. The rsync itself took a lot longer than it should have done because it got lots of short reads too. (Principally in /lib/modules/3.2.0-4/ and in the browser cache directories which had been the original symptom as well as most other places where one could have expected processes to have open files when the drive failed).
Now the 1Tb drive was a pig to fit into the Thinkpad originally because it was too big for the bay but I fitted it anyway. Yes, that was probably a mistake. It certainly meant that in order to fit it I had to not put the drive into the useful caddy provided by Lenovo which makes removal of the drive simple. Indeed the drive was wedged into the bay so tightly that it wasn't going to come out with normal levels of persuasion. This probably contributed to the failure of the drive, so live and learn.
With help from Andy Simpkins (it's always handy to have a hardware engineer on hand at times like this), the keyboard was lifted out, the case was dismantled and just enough room was made to get a screwdriver in behind the sata drive and lever it out of the bay. OK, rebuild laptop, put replacement drive into the caddy (because the smaller capacity drive is also a lot smaller in height than the 1Tb and therefore has plenty of clearance between it and the bay) and move on to the software recovery stage.
Hint: if this happens again, before turning off the broken system for the last time, just remember to download a recent Debian ISO to a USB stick - it saves having to ask someone else or find another machine to do the download. (Thanks Andy...)
OK, so after the usual complaints on reboot that there was no operating system, F12 got the boot order menu up and I was in Debian Installer Rescue Mode. Reinstalling grub failed initially for a few reasons:
A few iterations later, I had a working /dev directory inside the /target chroot, bind mounted from the /dev outside the chroot, I was able to mount proc and sys, so grub was finally happy to reinstall itself and then update the initramfs setup for the new drive.
Reboot, another fsck, all appeared well. I was able to login via the terminal but not in X. Hmmm. Stop xdm, startx manually from the terminal, problems with /tmp/ - permission denied. Oops. Yes, it does help to create /tmp with the right permissions....
The final stage was to complete the
So now I'm back on the original drive, albeit temporarily without any swap whatsoever (because I didn't partition the replacement drive to create /dev/sda5 before doing the copy) and now I remember the second reason I wanted to replace the original drive with the 1Tb drive - the original drive is as NOISY as hell. The whole edge of the laptop vibrates constantly, to the point that I can feel the vibration under the keys as I type. It's not that the drive is loose in the bay, it's just a constant vibration.
But, I have kept all my data and I have a usable laptop for the BSP this weekend. I will be looking at an SSD drive to replace this one though and having also found my old Acer laptop with power supply, I can now reference this entry when I transfer the system a second time.
Display comments as (Linear | Threaded)
The author does not allow comments to this entry