Tag: Tritech Service System

Manually copying a RAID-0 striped array to a single drive for data recovery

This question was posed on a forum:

I have a customer who has a computer with 2 SATA disks (striped in a RAID config). Windows won’t load. Diag reports a bad hard drive. When I disconnect one, it kills the stripe and the computer appears to not have a hard drive at all. Seems kind of silly to have it set this way as it increases the risk of failure. Other than putting each hard drive in another computer, I’d like to determine which of the disks is bad.

Also, not quite sure how to attack data recovery as they are a stripe set and plugging in to a SATA-to-USB adapter does not appear to be a valid method. If I put a third hard drive in as a boot drive, do I have to reconfigure the stripe set, and if I do, will it kill the data?

I have reassembled two RAID-0 “striped” drives to a single larger drive by hand before. It’s actually a programmatically simple operation, but you require a lot of low-level knowledge and some guesswork to do it. The specific pair I had appeared to store the metadata somewhere other than the start of the disk, and I was able to discover through a hex editor that the drive was on a 64KB stripe size. I also spotted which drive had a partition table and which didn’t, because that’s only on the first drive which contains the first stripe.
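The "which drive has the partition table" check can also be done without a hex editor. What follows is only a minimal sketch, assuming a classic MBR layout where the two bytes at offset 510 of the first sector are 55 AA; the function names and the device names in the usage comment are made up for illustration. Treat a match as a hint rather than proof, since data on the second disk could contain the same bytes by coincidence.

```shell
#!/bin/sh
# Print the two bytes at offset 510 of a disk or image as lowercase hex.
# A disk that starts with an MBR shows "55aa" here.
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Succeed if the given disk or image appears to start with an MBR.
has_mbr() {
    [ "$(mbr_signature "$1")" = "55aa" ]
}

# Example usage (device names are assumptions; needs read access to the disks):
# for d in /dev/sda /dev/sdb; do
#     if has_mbr "$d"; then echo "$d: MBR signature found"
#     else echo "$d: no MBR signature"; fi
# done
```

The disk that reports the signature is most likely the first member of the stripe set, so it goes first when interleaving.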

At a Linux command prompt, with the two RAID-0 disks (that were NOT being detected properly by the Linux LVM2 “FakeRAID” algorithms, by the way) and a disk of twice their size connected, I wrote a very simple script that looked something like this (sda/sdb as RAID-0, sdc as the destination disk, and this might work under ash or similar as well).

---- cut here ----

#!/bin/bash

# X=sda position, Y=sdb position, Z=sdc position, STRIPE=stripe size in bytes
X=0; Y=0; Z=0; STRIPE=65536

# Retrieve the size of a RAID-0 disk so we can terminate at the end
# (/proc/partitions reports sizes in 1 KiB blocks, so convert to bytes)
SIZE=$(awk '$4 == "sda" {print $3}' /proc/partitions)
SIZE=$(( SIZE * 1024 ))
# Divide size by stripe, rounding up to include any tail blocks
SIZE=$(( (SIZE + STRIPE - 1) / STRIPE ))
# X counts stripes read from each source disk; Z advances twice as fast
while [ "$X" -lt "$SIZE" ]
do
dd if=/dev/sda of=/dev/sdc seek=$Z skip=$X bs=$STRIPE count=1
Z=$(( Z + 1 ))
dd if=/dev/sdb of=/dev/sdc seek=$Z skip=$Y bs=$STRIPE count=1
Z=$(( Z + 1 ))
X=$(( X + 1 ))
Y=$(( Y + 1 ))
done

---- cut here ----

Note that all it does is load 64K at a time from each disk and save it to the third disk in sequential order. This is untested, and requires modification to suit your scenario, and is only being written here as an example. It does not fail if a ‘dd’ command fails, so it will work okay for data recovery; you will lose any stripe that contains a bad block, though, and the algorithm could be improved to use dd_rescue (if you have it) or to copy smaller units in a stripe so that only a partial stripe loss occurs on bad data.
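The "copy smaller units in a stripe" improvement could look something like the sketch below. It is just as untested against real hardware as the script above. The function name, its arguments, and the 4096-byte chunk default are assumptions for illustration. It leans on dd's conv=noerror,sync to keep going after a read error and pad the unreadable chunk with zeros, and on conv=notrunc so that it also works when the destination is an image file rather than a disk.

```shell
#!/bin/sh
# destripe A B OUT [STRIPE] [CHUNK]
# Interleave stripes from members A and B into OUT, reading CHUNK bytes at a
# time so that an unreadable block costs one chunk rather than a whole stripe.
# STRIPE must be a multiple of CHUNK.
destripe() {
    a=$1; b=$2; out=$3
    stripe=${4:-65536}; chunk=${5:-4096}
    per=$(( stripe / chunk ))           # chunks per stripe
    # Member size in bytes: blockdev for block devices, wc for image files
    if [ -b "$a" ]; then
        size=$(blockdev --getsize64 "$a")
    else
        size=$(wc -c < "$a")
    fi
    total=$(( (size + stripe - 1) / stripe ))   # stripes per member, rounded up
    n=0
    while [ "$n" -lt "$total" ]; do
        # Stripe n of A lands at output stripe 2n; stripe n of B at 2n+1
        dd if="$a" of="$out" bs="$chunk" count="$per" skip=$(( n * per )) \
           seek=$(( 2 * n * per )) conv=noerror,sync,notrunc 2>/dev/null
        dd if="$b" of="$out" bs="$chunk" count="$per" skip=$(( n * per )) \
           seek=$(( (2 * n + 1) * per )) conv=noerror,sync,notrunc 2>/dev/null
        n=$(( n + 1 ))
    done
}

# Example usage (device names are assumptions):
# destripe /dev/sda /dev/sdb /dev/sdc 65536 4096
```

Working on image files first (made with dd or dd_rescue from each member) is the safer route, since a wrong guess at the stripe size or disk order then costs nothing but time.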

Linux PowerPC yaboot + initramfs/initrd woes solved; no more “unable to mount root fs” problem!

This one had me ripping my hair out for two days straight. Anyone who has tried to create a Linux bootable CD for a PowerPC system has either run into this problem, followed some kind of magic set of directions that doesn’t explain the details that could cause this problem, or done something crummy like using the CD as the root filesystem.

PowerPC systems are very different from i386/i686/x86_64 systems in how they boot, and because they are much less common, they garner less interest and also have less available documentation and Internet forum assistance. The specific problem that I ran into is this: using the Tritech Service System’s construction for x86 as a template, and gleaning information from other PPC bootable CD images, I was able to create a CD that would properly boot the iBook G3 I used for testing into yaboot, the PowerPC Linux loader. The process of figuring out how to pull this off took many hours of reading and dissection, and I could easily chalk a full day’s work up as wasted on this process due to the fact that it’s not well-documented. From there, yaboot was configured to load my kernel and initrd (in this case, an initramfs, not an initrd, but the loading process is the same.) However, I was greeted every single time with kernel output that showed no indication of any initrd/initramfs being loaded and handed off to the kernel. I was stumped. It seemed as if I had done everything that the others do, yet it didn’t work. I tried these things to resolve the problem, to no avail:

  • Copying the map.hfs file from another Linux distribution that seemed more complete
  • Editing yaboot.conf to add and remove things like ramdisk_size=16384 or device=cd: to the options
  • Recompiling the PPC32 kernel with initrd turned on (shouldn’t be needed for initramfs, but I was quite annoyed and desperate)
  • Playing with the ofboot.b text file to see if anything inside could make a difference (CHRP is becoming a dirty word in my book)
  • Booting the G3 to Open Firmware and typing excessively cryptic and obnoxious commands that make learning “sed” look like a cakewalk
  • Pondering the consumption of potent alcoholic beverages while at work to defer blame for not figuring this nonsense out

So, after two days of trying to go from a collection of packages and a kernel to a real-world bootable Tritech Service System 2.7.6 ISO for PowerPC Macs, and nearly losing my sanity in the process, I finally hit upon an obscure, nasty, rarely discussed, extremely STUPID, yet horribly important fact:

yaboot doesn’t load initrd or initramfs if the kernel image is compressed.

Yes, that’s it. That’s the source of my ills. The godforsaken bootloader will detect a compressed kernel and simply and quietly ignore the “initrd=/boot/initrd1.gz” parameter. The even simpler solution? Instead of using the compiled kernel at arch/powerpc/boot/zImage.pmac, one must use the compiled kernel at…well, you might not believe this…just plain vmlinux. The uncompressed raw kernel image produced immediately under the Linux kernel folder you build in. That’s all that I had to do, and I have never been so pissed off over such a small detail in my life.
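For anyone scripting their ISO build, the fix boils down to staging a different file. Here is a rough sketch, not my actual build script: the function name, the "tss" label, the /boot/vmlinux path, and the location of yaboot.conf inside the CD tree are all assumptions for illustration, so adjust them to wherever your ofboot.b and yaboot expect to find things. Only the device=cd: and initrd= lines come from what I described above.

```shell
#!/bin/sh
# stage_yaboot KERNEL_TREE ISO_ROOT
# Copy the uncompressed kernel into the CD tree and write a yaboot.conf
# that points at it. The paths and the "tss" label are hypothetical.
stage_yaboot() {
    src=$1; iso=$2
    mkdir -p "$iso/boot" "$iso/etc" || return 1
    # Use the raw vmlinux from the top of the kernel tree,
    # not arch/powerpc/boot/zImage.pmac
    cp "$src/vmlinux" "$iso/boot/vmlinux" || return 1
    # The config location below is an assumption, not a yaboot requirement
    cat > "$iso/etc/yaboot.conf" <<'EOF'
device=cd:
default=tss
image=/boot/vmlinux
    label=tss
    initrd=/boot/initrd1.gz
EOF
}

# Example usage (directory names are assumptions):
# stage_yaboot /usr/src/linux /tmp/tss-iso
```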

All too often in the computer world, I see the “user” aspects of things documented repeatedly and done to death; entire volumes have been written just to explain how to perform basic functions or configure a program to the liking of the user. Even the process of compiling a Linux kernel is so thoroughly documented and explained that it’s fairly hard to fail to do it if you use a decent guide. Why is it, though, that these crucial points involving low-level details and bootloader quirks are overlooked and go largely undocumented? If I type “yaboot initramfs” into Google or Yahoo or Bing, why doesn’t the very first page that appears scream at me in bold text “YABOOT WILL NOT LOAD INITRD IMAGES IF THE KERNEL IMAGE IS COMPRESSED!” I know of approximately ZERO bootloaders that have this obnoxiously non-standard behavior. I’ve messed with LILO, SILO, GRUB, SYSLINUX, ISOLINUX, PXELINUX, BootX, and U-Boot on a $99 WM8650 ARM-based netbook, and not once have I run into this problem with ANY of those bootloaders AT ALL.

I hope that this information helps anyone trying to master a Linux on PowerPC bootable ISO to not waste two days and use their CD burner to create ten useless shiny silver coasters in the process. Also, could someone explain to me WHY the yaboot bootloader can’t load both images as compressed images?

Kernel panic - not syncing: Attempted to kill init! (glibc problem)

While working on the Tritech Service System, I made the mistake of using a glibc package compiled for i686 in the initramfs for an i586 kernel. I did some searching to figure out the source of the problem, since all of my kernels would crash with this message in the exact same place, and thought I’d share it with everyone. This could easily frustrate custom Linux distro attempts.

In short: if the kernel panics because something “attempted to kill init!” then make sure your C library (glibc, eglibc, uClibc, dietlibc, whatever) is not compiled for a newer CPU than the one you’re trying to run on.

What’s happening is the system is attempting to execute “init” which immediately terminates due to the fact that the library uses invalid CPU instructions (the older processor doesn’t know about the newer instructions compiled into the library). The message “attempted to kill init!” is technically correct: init was killed because it tried to do something bad, but init is required to run anything else, so once init immediately crashes out, there’s nothing left for the system to do, and the kernel hangs itself up.
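One rough way to check a library before it goes into the initramfs is to look for instructions the older CPU lacks. The sketch below only covers the i686-on-i586 case: it counts CMOV and FCMOV instructions, which first appeared with the Pentium Pro (i686) and do not exist on i586-class chips. The function name and the library path in the usage comment are assumptions, and a count of zero does not prove the library is safe, since other newer instructions could still be present.

```shell
#!/bin/sh
# Count CMOV/FCMOV instructions in objdump-style disassembly read from stdin.
# These instructions arrived with the i686 (Pentium Pro); an i586 faults on them.
count_cmov() {
    grep -cE '[[:space:]]f?cmov[a-z]+[[:space:]]'
}

# Example usage (library path is an assumption):
# objdump -d /lib/libc.so.6 | count_cmov
```

A nonzero count on a library destined for an i586 system is a strong hint that it was built with the wrong -march setting.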

Tritech Service System 2.1 progress

So far, only a few bugs remain in TSS 2.1, which is currently at version 2.1-alpha5.  There are some problems currently being worked out with KMS support, which is the biggest issue so far.  The entire “init” system has been rewritten to replace traditional “init” with the runit-init tools provided by BusyBox.  The move to a partially modular kernel has been done, and we’re testing that out on machines to make sure it behaves as expected.  (Modular support is necessary for drivers that must load firmware, like most wireless network adapters and some of the KMS video drivers in the kernel.)

We haven’t set up the “beacon server” features yet, mainly because some security issues need to be addressed.  Persistent home support has been thrown out for now, since we will be replacing it with something more robust in the future.  We’re not too far from the 2.1 release.  It’s going to be pretty sweet!  Stay tuned!

Update (2011-04-08): Up to -alpha7b2, trying to help the Linux kernel developers with some serious framebuffer issues.  It seems that Intel i915 framebuffers (inteldrmfb) like to cause a completely black screen at boot time on plenty of computers (and that’s even using today’s most recent git pull of the kernel code).  Radeon and nVidia once in a while have the same issue.  In the meantime I’ve built a second kernel that has all the graphics stuff stripped out so that we can use console-only mode.

“socat” as a UDP beacon (a Tritech Service System technology preview)

One of the most novel ideas has been that of having a TSS CD which doesn’t require upgrading, because every upgrade cycle I’m forced to distribute new burned CDs to all of the technicians and rewrite all of our bootable USB flash drives for the new system. TSS 3.0 will have the ability to upgrade over a network automatically during early startup.

Tritech Service System Community Release Edition 1.3 is OUT!

Once upon a time, I mentioned my custom Linux distribution that I built almost entirely from scratch for use at Tritech Computer Solutions.  I still remember a time when what we’ve come to just call the “TSS” was spouting “Version 0.1 ALPHA” and was horribly rough around the edges.  Since then, the changes to the whole mess have been absolutely amazing, and yet despite running bleeding-edge versions of practically everything, the ISO for the TSS has remained well under 50MB.  I’d tell the Damn Small Linux developers to eat their hearts out if it wasn’t for the fact that their distro has more programs and has a vastly different set of goals: where DSL carries the challenge to “fit as much as possible into a 50MB business card CD” and uses particularly spartan and/or aging applications to pull it off (and wow, they really have done an amazing job meeting that goal!), the Tritech Service System was built for a very different and conflicting reason: we needed a Linux distro that ran bleeding-edge stuff (especially the latest Linux kernel), fit in a tiny space, didn’t depend on boot media being present, and most importantly, we needed to run software that most live distros don’t tend to come with…and because of shortcomings in existing systems, I certainly didn’t want to remaster or add to them as the solution.

You can find out all about the distro and the details on the Tritech Service System distribution page, but some unanswered questions remain, such as “what prompted you to release your super-special super-secret Linux secret weapon to the world en masse?  What about your competitors?  Aren’t you afraid that they’ll take your hard work and use it to put you out of business?”

I’d like to tackle the competitor question first.  As I’ve stated before, Tritech Computer Solutions doesn’t have any competitors that are capable enough to be considered competitors in the first place.  Those that might be tend to be very Windows-oriented, with limited Linux skills (or none at all; look at Geek Squad’s oft-pirated MRI CD, which is an ugly Windows PE abomination that takes forever to start and makes me retch at the mere sight of a screenshot…)  Even if a skilled competitor came along that had some Linux background, they’d still have to be willing to invest the time and energy into figuring out how to USE the TSS like we do.  The Tritech Service System replaces numerous software products that other shops have to purchase, such as Symantec Ghost Solution Suite or Acronis Backup & Recovery 10 for system imaging, or Passware Kit Windows Key for resetting lost Windows account passwords.  It even makes it possible to do things that you can’t buy software (that I am aware of) to perform:  replacing a corrupt Windows XP registry hive with a copy from a System Restore point (without using the hackish Recovery Console method on Microsoft’s Knowledge Base), checking key system file hashes against a known-good hash list to find infected or damaged system files, priming a SYSTEM registry hive with the required disk controller driver service and critical device database entry to enable booting from that controller (e.g. switching a controller to RAID mode, which sometimes requires a different driver you can’t forcibly install), and much more.  The problem is that, much like a welder, you have to know how to use the tool to accomplish the goal.  Experience is why my “competition” can’t use the Tritech Service System to beat me at my own game: they don’t know how and honestly, to get where I am now requires more work than any typical computer hobbyist would ever want to deal with.
I wonder how I got this far without giving up, because it’s HARD to keep your drive when things get exceedingly frustrating.  For someone who fixes computers “good enough” to get by comfortably, the need to learn how to fully exploit such an esoteric tool doesn’t exist.  They’d prefer to be out boating.

Sometimes I’d prefer to be out boating.  Or fishing.  Or anything else.  If you think computers make YOU stress, be a computer technician and you’ll never complain about being a normal user again.

Now that the unpleasant self-promotional filth and obligatory dihydrogen monoxide humor are out of the way, let’s talk about why I’m releasing the Tritech Service System to the public, and what the difference is between what we use in the shop and what I released, dubbed the “Community Release Edition.”  The reason is simple: I worked very hard and ended up making an extremely useful tool that filled a void in the live Linux distro world, and I wanted to contribute to the Open Source community for making it possible in the first place.  The whole idea behind the Free Software Movement is that we help each other out and contribute our innovations to the rest of the community.  If I’ve made something awesome from things other people shared with me, why not share it with them in return?

Another side reason is that I’d like to be recognized for my work, and I’d like to show my fellow man that I have something positive to contribute to society.  I can’t fix a lot of the problems in the world, but if my Linux distro makes just one person’s life easier and they thank me for it, that’s all I want.  If it helps a hundred people, that’s awesome too.  If I open up donations to continue working on it and can spend a chunk of time each day devoted exclusively to improving the system because of that, I’d be absolutely thrilled and more than happy to do it.  Software work is my true hidden passion, and if I can use that passion to help others, I absolutely will.

That’s why you can download the Tritech Service System Community Release Edition.

As for the rest of the questions…the difference between TSS and TSS CRE is the exclusion of “internal use only” custom software and scripts and custom graphics that we have full rights to redistribute; also, why is the first release numbered “1.3 CRE” anyway?  Internally, we started TSS with 0.1 alpha, which progressed to 0.1, 0.1B, 0.2, 0.2A, 0.2B, 0.2C, 0.2D, 0.2E, then when the GUI and extended system was added we had 1.0, 1.1, and our latest internal release version so far is 1.2.  The next minor version in the sequence is 1.3, and because some minor updates exist between TSS 1.2 and the TSS CRE, I figured it would make sense to just go up a number.  Internally, there is no TSS 1.3 at all.  I literally had to rewrite 40% of the build system and shuffle tons of files around to make the build system much more customizable and robust.  TSS CRE is what motivated me to add automatic inclusion of some customizations and create a “cleanup” script.  In fact, most of the 1.2 -> 1.3 differences lie in the build system being totally different, not so much in the software that makes up the final product.  I plan to further enhance the “gentss” script to include customizations in a more flexible manner in the future.

I’m already knee-deep in the 1.4 development process too.  Tritech Service System Community Release Edition 1.4 will feature the latest versions of many libraries and packages, better boot scripts, and possibly a few pieces of software that aren’t on 1.3.  Stay tuned and see what develops.

Time ensures that things rarely remain the same.

At Tritech, many things have changed since even just one month ago. Here’s a spiffy list of such things. By the way, my new favorite word is “terse.” The magic of the word “terse” is that practically all of its synonyms are not as terse as “terse.” It’s a self-fulfilling definition! ^_^ So, what’s been going on during my silence, you ask? Read on!