Tag: TSS

The key to faster shell scripts: know your shell’s features and use them!

I have a cleanup program that I’ve written as a Bash shell script. Over the years, it has morphed from a thing that just deleted a few fixed directories if they existed at all (mostly temporary file directories found on Windows) to a very flexible cleanup tool that can take a set of rules and rewrite and modify them to apply to multiple versions of Windows, along with safeguards that check the rules and auto-rewritten rules to prevent the equivalent of an “rm -rf /*” from happening. It’s incredibly useful for me; when I back up a customer’s PC data, I run the cleaner script first to delete many gigabytes of unnecessary junk and speed up the backup and restore process significantly.

Unfortunately, having the internal rewrite and safety check rules has the side effect of massively slowing the process. I’ve been tolerating the slowness for a long time, but as the rule set increased in size over the past few years the script has taken longer and longer to complete, so I finally decided to find out what was really going on and fix this speed problem.

Profiling shell scripts isn’t quite as easy as profiling C programs; with C, you can just use a tool like Valgrind to find out where all the effort is going, but shell scripts depend on the speed of the shell, the kernel, and the plethora of programs executed by the script, so it’s harder to follow what goes on and find the time sinks. However, I observed that a lot of time was spent in the steps between deleting items; since each rewrite and safety check is done on-the-fly as deletion rules are presented for processing, those were likely candidates. The first thing I wanted to know was how many times the script called an external program to do work; you can easily kill a shell script’s performance with unnecessary external program executions. To gather this info, I used the strace tool:

strace -f -o strace.txt tt_cleaner

This produced a file called “strace.txt” which contains every single system call issued by both the cleaner script and any forked programs. I then looked for the execve() system call and gathered the counts of the programs executed, excluding “execve resumed” events which aren’t actual execve() calls:

grep execve strace.txt | sed 's/.*execve/execve/' | cut -d\" -f2 | grep -v resumed | sort | uniq -c | sort -g

The resulting output consisted of numbers below 100 until the last two lines, and that’s when I realized where the bottleneck might be:

4157 /bin/sed
11227 /usr/bin/grep

That’s a LOT of calls to sed, but the number of calls to grep was almost three times bigger, so that’s where I started to search for ways to improve. As I’ve said, the rewrite code takes each rule for deletion and rewrites it for other possible interpretations; “Username\Application Data” on Windows XP was moved to “Username\AppData\Roaming” on Vista and up, while “All Users\Application Data” was moved to “C:\ProgramData” in the same, plus there is a potential mirror of every single rule in “Username\AppData\Local\VirtualStore”. The rewrite code handles the expansion of the deletion rules to cover every single one of these possible cases. The outer loop of the rewrite engine grabs each rewrite rule in order while the inner loop does the actual rewriting to the current rule AND all prior rewrites to ensure no possibilities are missed (VirtualStore is largely to blame for this double-loop architecture). This means that anything done within the inner loop is executed a huge number of times, and the very first command in the inner loop looked like this:

if echo "${RWNAMES[$RWNCNT]}" | grep -qi "${REWRITE0[$RWCNT]}"

This checks to see if the rewrite rule applies to the cleaner rule before doing the rewriting work. It calls grep once for every single iteration of the inner loop. I replaced this line with the following:

if [[ "${RWNAMES[$RWNCNT]}" =~ .*${REWRITE0[$RWCNT]}.* ]]

I had to also tack a “shopt -s nocasematch” to the top of the shell script to make the comparison case-insensitive. The result was a 6x speed increase. Testing on an existing data backup which had already been cleaned (no “work” to do) showed a consistent time reduction from 131 seconds to 22 seconds! The grep count dropped massively, too:

97 /usr/bin/grep

Bash can do wildcard and regular expression matching of strings (the =~ comparison operator is a regex match), so anywhere your shell script uses the “echo-grep” combination in a loop stands to benefit greatly by exploiting these Bash features. Unfortunately, these are not POSIX shell features and using them will lead to non-portable scripts, but if you will never use the script on other shells and the performance boost is significant, why not use them?
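
Here is a minimal sketch of that swap, side by side. The function names and sample strings are made up for illustration and are not taken from the cleaner script; the point is only that the second and third versions never fork a process.

```shell
#!/bin/bash
# Hypothetical helpers; names and sample data are illustrative only.
shopt -s nocasematch   # make [[ ]] comparisons case-insensitive

# Slow: forks a grep process on every call
matches_grep() {
    echo "$1" | grep -qi "$2"
}

# Fast: regex match done inside Bash, no fork
matches_regex() {
    [[ "$1" =~ $2 ]]
}

# Fast: plain substring test with a glob; quoting "$2" keeps it literal
matches_glob() {
    [[ "$1" == *"$2"* ]]
}

RULE='Username/Application Data/Temp'
if matches_regex "$RULE" 'application data'; then
    echo "rule applies"
fi
```

One thing to watch: grep treats its pattern as a basic regular expression by default, while =~ uses extended regular expressions, so patterns with characters like + or ( may need a second look after the swap. If the rule is a fixed string rather than a regex, the glob form is the safer choice.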

The bigger lesson here is that you should take some time to learn about the features offered by your shell if you’re writing advanced shell scripts.

Update: After writing this article, I set forth to eliminate the thousands of calls to sed. I was able to change an “echo-sed” combination to a couple of Bash substring substitutions. Try it out:

FOO=${VARIABLE/string_to_replace/replacement}

It accepts $VARIABLES where the strings go, so it’s quite powerful. Best of all, the total runtime dropped to 10.8 seconds for a total speed boost of over 11x!
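
A quick sketch of what that looks like next to the echo-sed version; the variable names and sample path are invented for the example. A single slash replaces only the first match, and a double slash replaces every match.

```shell
#!/bin/bash
# Sample values are made up for illustration.
RULE='Documents and Settings/All Users/Application Data/Temp'
OLD='All Users/Application Data'
NEW='ProgramData'

# Slow: forks sed for every rewrite
VIA_SED=$(echo "$RULE" | sed "s#$OLD#$NEW#")

# Fast: ${VAR/pattern/replacement} replaces the first match only.
# Quoting "$OLD" makes Bash treat it as a literal string, not a glob.
FIRST=${RULE/"$OLD"/$NEW}

# ${VAR//pattern/replacement} replaces every match (here, every space)
ALL=${RULE// /_}

echo "$FIRST"
echo "$ALL"
```

The pattern in these expansions is a glob, not a regex, so anything that relied on sed regex features still needs sed (or a different approach).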

MiniTSS 2.9.0 released, plus c02ware site redesign

If you take a look at the page for the Tritech Service System distribution of Linux, you’ll notice a few new things. The most obvious is that I’m redoing the c02ware site design; there’s now a basic logo, proper site navigation, a mobile-friendly layout, and a cleaner-looking color scheme. Consistency across pages has been greatly improved, and lots of unnecessary old junk and confusing content has been completely tossed out.

This change is being driven by my push to release the Tritech Service System with all of our proprietary bits included as a commercial product, with regular updates, bug fixes, and support. I will continue to release TSS without any proprietary bits as a public and completely free system, but for anyone in the PC repair business, the paid-for stuff can easily pay for itself in workflow acceleration and productivity boosts within a month, and we want to be able to bring that advantage to other PC service shops and I.T. departments. If you are interested in being notified when the Tritech Service System becomes available for purchase, send me an email and I will keep you in the loop.

A major goal in TSS is keeping the system as small as possible without cutting out basic features. In the effort to move towards this goal, I have released MiniTSS 2.9.0! The download is a paltry eight megabytes in size, and includes the following software packages:

  • busybox 1.21.1
  • chntpw 110511
  • cifsmount (mount.cifs helper)
  • dd-rescue 1.28
  • dropbear 0.52
  • fuse 2.8.3
  • glibc 2.10.1
  • libblkid 1.1.0
  • libuuid 1.3.0
  • ncurses 5.6
  • ntfs-3g-ntfsprogs 2011.4.12
  • pv 1.2.0
  • rsync 3.0.7
  • socat 1.7.2.1
  • sysfsutils 2.1.0
  • tar 1.22
  • tss-base-fs-mini 101
  • tss-bootstrap
  • udev 163
  • xz-utils 5.0.4
  • zlib 1.2.3

MiniTSS is not just a live CD/USB system. You can download the “source” archive, unpack it on your Linux system, add or remove packages to initramfs as you see fit, and rebuild your own custom version with whatever software you actually need. The system only provides basic tools and a command line interface, and therefore is aimed at intermediate-level Linux users.

Manually copying a RAID-0 striped array to a single drive for data recovery

This question was posed on a forum:

I have a customer who has a computer, 2 SATA disk (striped in RAID config. Windows won’t load. Diag reports bad hard drive. When I disconnect one, it kills the stripe and the computer appears to not have a hard drive at all. Seems kind of silly to have it set this way as it increases the risk of failure. Other than putting each hard drive in another computer, I’d like to determine which of the disk are bad.

Also, not quite sure how to attack data recovery as they are a stripe set and plugging in to a SATA to USB does not appear to be a valid method. If I put a third hard drive in as a boot drive, do i have to reconfig the stripe set and if i do, will it kill the data.

I have reassembled two RAID-0 “striped” drives to a single larger drive by hand before. It’s actually a programmatically simple operation, but you require a lot of low-level knowledge and some guesswork to do it. The specific pair I had appeared to store the metadata somewhere other than the start of the disk, and I was able to discover through a hex editor that the drive was on a 64KB stripe size. I also spotted which drive had a partition table and which didn’t, because that’s only on the first drive which contains the first stripe.

At a Linux command prompt, with the two RAID-0 disks (that were NOT being detected properly by the Linux LVM2 “FakeRAID” algorithms, by the way) and a disk of twice their size connected, I wrote a very simple script that looked something like this (sda/sdb as RAID-0, sdc as the destination disk, and this might work under ash or similar as well).

---- cut here ----

#!/bin/bash

# X=sda position, Y=sdb position, Z=sdc position, STRIPE=stripe size in bytes
X=0; Y=0; Z=0; STRIPE=65536

# Retrieve the size of a RAID-0 disk so we can terminate at the end
# (/proc/partitions reports sizes in 1 KiB blocks, so convert to bytes)
SIZE=$(awk '$4 == "sda" {print $3}' /proc/partitions)
SIZE=$(( SIZE * 1024 ))
# Divide size by stripe, including any tail blocks (integer division truncates)
SIZE=$(( (SIZE + STRIPE - 1) / STRIPE ))
# X counts stripes read from each source disk; Z advances twice per pass
while [ "$X" -lt "$SIZE" ]
do
dd if=/dev/sda of=/dev/sdc seek=$Z skip=$X bs=$STRIPE count=1
Z=$(( Z + 1 ))
dd if=/dev/sdb of=/dev/sdc seek=$Z skip=$Y bs=$STRIPE count=1
Z=$(( Z + 1 ))
X=$(( X + 1 ))
Y=$(( Y + 1 ))
done

---- cut here ----

Note that all it does is load 64K at a time from each disk and save it to the third disk in sequential order. This is untested, and requires modification to suit your scenario, and is only being written here as an example. It does not fail if a ‘dd’ command fails, so it will work okay for data recovery; you will lose any stripe that contains a bad block, though, and the algorithm could be improved to use dd_rescue (if you have it) or to copy smaller units in a stripe so that only a partial stripe loss occurs on bad data.
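
Before pointing anything like this at real disks, it may help to dry-run the interleaving on small ordinary files. The sketch below is an assumption-laden test harness, not part of the original recovery: the destripe function, the file names, and the 4-byte stripe are all made up so the result can be checked by eye.

```shell
#!/bin/bash
# Hypothetical helper: interleave fixed-size stripes from two sources into one target.
# Usage: destripe FIRST SECOND TARGET STRIPE_BYTES STRIPES_PER_SOURCE
destripe() {
    local a=$1 b=$2 out=$3 stripe=$4 count=$5 n=0
    while [ "$n" -lt "$count" ]
    do
        # Stripe n of each source lands at positions 2n and 2n+1 of the target.
        # conv=notrunc keeps dd from truncating a regular-file target on each write.
        dd if="$a" of="$out" bs="$stripe" count=1 skip="$n" seek=$(( 2 * n )) conv=notrunc 2>/dev/null
        dd if="$b" of="$out" bs="$stripe" count=1 skip="$n" seek=$(( 2 * n + 1 )) conv=notrunc 2>/dev/null
        n=$(( n + 1 ))
    done
}

# Dry run on small files with a 4-byte stripe instead of real disks
TMP=$(mktemp -d)
printf 'AAAACCCC' > "$TMP/disk0"
printf 'BBBBDDDD' > "$TMP/disk1"
destripe "$TMP/disk0" "$TMP/disk1" "$TMP/joined" 4 2
RESULT=$(cat "$TMP/joined")
rm -rf "$TMP"
echo "$RESULT"
```

If the joined output comes back in the order first-disk stripe, second-disk stripe, and so on, the skip/seek arithmetic is right, and the same loop can be pointed at image files of the real disks with the real stripe size.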

Linux PowerPC yaboot + initramfs/initrd woes solved; no more “unable to mount root fs” problem!

This one had me ripping my hair out for two days straight. Anyone who has tried to create a Linux bootable CD for a PowerPC system has either run into this problem, followed some kind of magic set of directions that don’t explain the details that could cause this problem, or done something crummy like using the CD as the root filesystem.

PowerPC systems are very different from i386/i686/x86_64 systems in how they boot, and because they are much less common, they garner less interest and also have less available documentation and Internet forum assistance. The specific problem that I ran into is this: using the Tritech Service System’s construction for x86 as a template, and gleaning information from other PPC bootable CD images, I was able to create a CD that would properly boot the iBook G3 I used for testing into yaboot, the PowerPC Linux loader. The process of figuring out how to pull this off took many hours of reading and dissection, and I could easily chalk a full day’s work up as wasted on this process due to the fact that it’s not well-documented. From there, yaboot was configured to load my kernel and initrd (in this case, an initramfs, not an initrd, but the loading process is the same.) However, I was greeted every single time with kernel output that showed no indication of any initrd/initramfs being loaded and handed off to the kernel. I was stumped. It seemed as if I had done everything that the others do, yet it didn’t work. I tried these things to resolve the problem, to no avail:

  • Copying the map.hfs file from another Linux distribution that seemed more complete
  • Editing yaboot.conf to add and remove things like ramdisk_size=16384 or device=cd: to the options
  • Recompiling the PPC32 kernel with initrd turned on (shouldn’t be needed for initramfs, but I was quite annoyed and desperate)
  • Playing with the ofboot.b text file to see if anything inside could make a difference (CHRP is becoming a dirty word in my book)
  • Booting the G3 to Open Firmware and typing excessively cryptic and obnoxious commands that make learning “sed” look like a cakewalk
  • Pondering the consumption of potent alcoholic beverages while at work to defer blame for not figuring this nonsense out

So, after two days of trying to go from a collection of packages and a kernel to a real-world bootable Tritech Service System 2.7.6 ISO for PowerPC Macs, and nearly losing my sanity in the process, I finally hit upon an obscure, nasty, rarely discussed, extremely STUPID, yet horribly important fact:

yaboot doesn’t load initrd or initramfs if the kernel image is compressed.

Yes, that’s it. That’s the source of my ills. The godforsaken bootloader will detect a compressed kernel and simply and quietly ignore the “initrd=/boot/initrd1.gz” parameter. The even simpler solution? Instead of using the compiled kernel at arch/powerpc/boot/zImage.pmac, one must use the compiled kernel at…well, you might not believe this…just plain vmlinux. The uncompressed raw kernel image produced immediately under the Linux kernel folder you build in. That’s all that I had to do, and I have never been so pissed off over such a small detail in my life.

All too often in the computer world, I see the “user” aspects of things documented repeatedly and done to death; entire volumes have been written just to explain how to perform basic functions or configure a program to the liking of the user. Even the process of compiling a Linux kernel is so thoroughly documented and explained that it’s fairly hard to fail to do it if you use a decent guide. Why is it, though, that these crucial points involving low-level details and bootloader quirks are overlooked and go largely undocumented? If I type “yaboot initramfs” into Google or Yahoo or Bing, why doesn’t the very first page that appears scream at me in bold text “YABOOT WILL NOT LOAD INITRD IMAGES IF THE KERNEL IMAGE IS COMPRESSED!” I know of approximately ZERO bootloaders that have this obnoxiously non-standard behavior. I’ve messed with LILO, SILO, GRUB, SYSLINUX, ISOLINUX, PXELINUX, BootX, and U-Boot on a $99 WM8650 ARM-based netbook, and not once have I run into this problem with ANY of those bootloaders AT ALL.

I hope that this information helps anyone trying to master a Linux on PowerPC bootable ISO to not waste two days and use their CD burner to create ten useless shiny silver coasters in the process. Also, could someone explain to me WHY the yaboot bootloader can’t load both images as compressed images?

Kernel panic – not syncing: Attempted to kill init! (glibc problem)

While working on the Tritech Service System, I made the mistake of using a glibc package compiled for i686 in the initramfs for an i586 kernel. I happened to do some searching to figure out the source of the problem, since all of my kernels would crash with this message in the exact same place, and thought I’d share it with everyone. This could frustrate custom Linux distro attempts easily.

In short: if the kernel panics because something “attempted to kill init!” then make sure your C library (glibc, eglibc, uclibc, dietlibc, whatever) is not compiled for a CPU higher than the CPU you’re trying to run on.

What’s happening is the system is attempting to execute “init” which immediately terminates due to the fact that the library uses invalid CPU instructions (the older processor doesn’t know about the newer instructions compiled into the library). The message “attempted to kill init!” is technically correct: init was killed because it tried to do something bad, but init is required to run anything else, so once init immediately crashes out, there’s nothing left for the system to do, and the kernel hangs itself up.

Tritech Service System 2.1 progress

So far, only a few bugs remain in TSS 2.1, which is currently at version 2.1-alpha5.  There are some problems currently being worked out with KMS support, which is the biggest issue so far.  The entire “init” system has been rewritten to replace traditional “init” with the runit-init tools provided by BusyBox.  The move to a partially modular kernel has been done, and we’re testing that out on machines to make sure it behaves as expected.  (Modular support is necessary for drivers that must load firmware, like most wireless network adapters and some of the KMS video drivers in the kernel).

We haven’t set up the “beacon server” features yet, mainly because some security issues need to be addressed.  Persistent home support has been thrown out for now, since we will be replacing it with something more robust in the future.  We’re not too far from the 2.1 release.  It’s going to be pretty sweet!  Stay tuned!

Update (2011-04-08): Up to -alpha7b2, trying to help the Linux kernel developers with some serious framebuffer issues.  It seems that Intel i915 framebuffers (inteldrmfb) like to cause a completely black screen at boot time on plenty of computers (and that’s even using today’s most recent git pull of the kernel code).  Radeon and nVidia once in a while have the same issue.  In the meantime I’ve built a second kernel that has all the graphics stuff stripped out so that we can use console-only mode.

Work toward Tritech Service System 2.1

I’ve officially started the ground work on version 2.1 of the Tritech Service System.  Major changes and bug fixes that are already in the works:

  • Changing startup scripts and Busybox “init” to Busybox “runit” style: The biggest benefit of this change is that system services will be able to start in parallel, which will lower startup time drastically.
  • Partially modular kernel: Some drivers in the kernel work better as modules, and some don’t work at all unless they’re modules because of missing firmware files (specifically many wireless cards). While most of the kernel will remain monolithic, select drivers will be modularized to increase usability of the system.
  • Wireless support that actually functions: Previous versions didn’t support wireless adapters in any way that could be considered usable for most configurations. TSS 2.1 is going to include working wireless support in the kernel and supporting software.  Actual “easy” wireless configuration tools are not planned until a 3.0 release.
  • Basic support for packages: In our private TSS 2.0.7, we’ve included early support for loading packages from local media and our internal network server. What we use currently is not good enough for public release, but it does lay the foundation for that type of support.  TSS 2.1 will have a working implementation of boot-time package loading support, which allows extending the system without completely rebuilding initrd packages.
  • Native KMS driver/X.org support: Fixes the long-standing issues with most nVidia and some Intel graphics controllers failing to work with TSS out-of-the-box. X.org auto-configuration is also being implemented.  At least one test system with two video cards automatically set itself up with dual displays after adding Intel KMS and Nouveau KMS to the kernel!
  • udev automatic mount point handling: No more /mnt folder with tons of mount points for nonexistent devices! Mount points will be created more like mainstream Linux distributions, with volume label mount points in /media and device name mount points in /mnt, all automatically handled by udev.
  • Software updates across the board: New versions of important tools such as the NTFS-3G driver from Tuxera, the latest Xine media player, and more are included with Tritech Service System 2.1. Additionally, old unnecessary packages and general junk files have been removed.
  • Improved boot times: Parallel service startup, a smaller initrd file to boot, and use of faster compression technologies all contribute to a much quicker startup time than previous versions.

Visit http://c02ware.com/tss.php periodically so you don’t miss the release!

“socat” as a UDP beacon (a Tritech Service System technology preview)

One of the most novel ideas has been that of having a TSS CD which doesn’t require upgrading, because every upgrade cycle I’m forced to distribute new burned CDs to all of the technicians and rewrite all of our bootable USB flash drives for the new system. TSS 3.0 will have the ability to upgrade over a network automatically during early startup.

Still alive. Maybe.

The Tritech Service System (TSS) is coming along very slowly but very surely.  No worries.

I’m running into little things that I could post here, but haven’t had time for all that.  Apologies.

Looks like c02ware might end up getting off the ground for more than just TSS, too.  I’m already past the vaporware stage on two software projects that aren’t TSS.

Oh yeah, and Tritech Computer Solutions ended up not expanding to multiple stores after everyone started coming to us.  We’ve got people from Charlotte, Wake Forest, and even Virginia coming here to get computers worked on.  It’s crazy, man, absolutely crazy.

Now, back to vim and PuTTY with me!  Good night!

Time ensures that things rarely remain the same.

At Tritech, many things have changed since even just one month ago. Here’s a spiffy list of such things. By the way, my new favorite word is “terse.” The magic of the word “terse” is that practically all of its synonyms are not as terse as “terse.” It’s a self-fulfilling definition! ^_^ So, what’s been going on during my silence, you ask? Read on!