Tag: windows

Windows batch file that converts all files in a directory to Apple ProRes

Windows batch file programming is terrible because the commands have poorly thought out syntax and peculiar requirements, such as double percent signs in batch files but not on the command line. I often make batch files that perform conversions with ffmpeg, and when I decided to start trying out Final Cut Pro 7, I realized that I needed to mass convert MP4 video files to ProRes. My Windows machines are much more powerful than my Macs, so I wanted to be able to convert a whole directory to ProRes on Windows by dropping the directory onto a batch file. It took forever to work around cmd.exe’s stupid peculiarities, but I finally put together something that works (note that you’ll need to install ffmpeg somewhere in your PATH or specify an absolute path to it, and you must replace “C:\processed” with your desired output path):

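rem %~1 is the dropped directory with any surrounding quotes stripped, so it can be re-quoted safely
for /F "tokens=*" %%D in ('dir /b /a:-d "%~1"') do ffmpeg -y -i "%~1\%%D" -c:v prores -profile:v 2 -vendor ap10 -pix_fmt yuv422p10le -c:a pcm_s16le -ar 48000 "C:\processed\%%~nD.mov"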
pause

If you want a different ProRes profile, change -profile:v to 0 (Proxy), 1 (LT), or 3 (HQ); if you want ProRes 4444 you’ll have to change the encoder to prores_ks, set -profile:v to 4 (4444), and set -pix_fmt to yuva444p10le.

The “pause” at the end is just in case you have problems and need to scroll up.
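
If you would rather skip cmd.exe's quoting rules entirely, the same loop can be sketched in Python. This is only a rough equivalent of the batch file above, not a tested replacement: the function names are made up for this example, and the ffmpeg options are copied straight from the batch file.

```python
import os
import subprocess


def build_prores_commands(input_dir, output_dir, profile=2):
    """Build one ffmpeg argument list per file in input_dir (no recursion)."""
    commands = []
    for name in sorted(os.listdir(input_dir)):
        source = os.path.join(input_dir, name)
        if not os.path.isfile(source):
            continue  # like "dir /a:-d": skip subdirectories
        stem = os.path.splitext(name)[0]  # like %%~nD: name without extension
        target = os.path.join(output_dir, stem + ".mov")
        commands.append([
            "ffmpeg", "-y", "-i", source,
            "-c:v", "prores", "-profile:v", str(profile),
            "-vendor", "ap10", "-pix_fmt", "yuv422p10le",
            "-c:a", "pcm_s16le", "-ar", "48000",
            target,
        ])
    return commands


def convert_directory(input_dir, output_dir, profile=2):
    """Run every command in turn; ffmpeg must be on the PATH."""
    for command in build_prores_commands(input_dir, output_dir, profile):
        # Passing a list avoids shell quoting problems with spaces in names.
        subprocess.run(command, check=False)
```

Calling convert_directory(r"D:\footage", r"C:\processed") would then do roughly what dropping D:\footage onto the batch file does (both paths are placeholders).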

Finding Duplicates Faster: The story of ‘jdupes’, or how I unexpectedly became a better programmer

The problem of finding and handling duplicate files has been with us for a long time. Since the end of the year 1999, the de facto answer to “how can I find and delete duplicate files?” for Linux and BSD users has been a program called ‘fdupes’ by Adrian Lopez. This venerable staple of system administrators is extremely handy when you’re trying to eliminate redundant data to reclaim some disk space, clean up a code base full of copy-pasted files, or delete photos you’ve accidentally copied from your digital camera to your computer more than once. I’ve been quite grateful to have it around, particularly when dealing with customer data recovery scenarios where every possible copy of a file is recovered and the final set ultimately contains thousands of unnecessary duplicates.

Unfortunately, development on Adrian’s fdupes had, for all practical purposes, ground to a halt. From June 2014 to July 2015, the only significant functional changes to the code were modifications to make it compile on Mac OS X. The code’s stagnant nature has definitely shown itself in real-world tests; in February 2015, Eliseo Papa published “What is the fastest way to find duplicate pictures?”, which contains benchmarks of 15 duplicate file finders (including an early version of my fork which we’ll ignore for the moment) that place the original fdupes dead last in operational speed and show it to be heavily CPU-bound rather than I/O-bound. In fact, Eliseo’s tests say that fdupes takes a minimum of 11 times longer to run than 13 of the other duplicate file finders in the benchmark!

As a heavy user of the program on fairly large data sets, I had noticed the poor performance of the software and became curious as to why it was so slow for a tool that should simply be comparing pairs of files. After inspecting the code base, I found a number of huge performance killers:

  1. Tons of time was wasted waiting on progress to print to the terminal
  2. Many performance-boosting C features weren’t used (static, inline, etc)
  3. A couple of one-line functions were very “hot,” adding heavy call overhead
  4. Using MD5 for file hashes was slower than other hash functions
  5. Storing MD5 hashes as strings instead of binary data was inefficient
  6. A “secure” hash like MD5 isn’t needed; matches get checked byte-for-byte

I submitted a pull request to the fdupes repository which solved these problems in December 2014. Nothing from the pull request was discussed on GitHub and none of the fixes were incorporated into fdupes. I emailed Adrian to discuss my changes with him directly and there was some interest in certain changes, but in the end nothing was changed and my emails became one-way.

It seemed that fdupes development was doomed to stagnation.

In the venerable tradition of open source software, I forked it and gave my new development tree a new name to differentiate it from Adrian’s code: jdupes. I solved the six big problems outlined above with these changes:

  1. Rather than printing progress indication for every file examined, I added a delay counter to drastically reduce terminal printing. This was a much bigger deal when using SSH.
  2. I switched the code and compilation process to use C99 and added relevant keywords to improve overall performance.
  3. The “hot” one-line functions were changed to #define macros to chop function call overhead for them in half.
  4. (Also covers 5 and 6) I wrote my own hash function and replaced all of the MD5 code with it, resulting in a benchmarked speed boost of approximately 17%. The resulting hashes are passed around as a 64-bit unsigned integer, not an ASCII string, which (on 64-bit machines) reduces hash comparisons to a single compare instruction.
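
To make changes 1 and 4 more concrete, here is a rough Python sketch of the general idea. It is not jdupes code (jdupes is written in C and uses its own hash function): zlib.crc32 is only a stand-in for "a cheap, non-secure hash kept as an integer", and PROGRESS_EVERY and the function names are invented for this example.

```python
import filecmp
import os
import sys
import zlib
from collections import defaultdict

PROGRESS_EVERY = 256  # hypothetical delay counter: report once per 256 files


def quick_hash(path, chunk_size=65536):
    """Return a cheap non-cryptographic hash of a file as a plain integer."""
    value = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            value = zlib.crc32(chunk, value)  # stand-in, not jdupes's hash
    return value


def find_duplicates(root, progress=sys.stderr):
    """Return a list of duplicate sets (sorted lists of paths) under root."""
    by_size = defaultdict(list)
    scanned = 0
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)
            scanned += 1
            # Delay counter: only touch the terminal occasionally.
            if scanned % PROGRESS_EVERY == 0:
                print(f"\rscanned {scanned} files", end="", file=progress)

    duplicate_sets = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[quick_hash(path)].append(path)  # integer keys, no strings
        for candidates in by_hash.values():
            # A hash match is only a hint, so confirm byte-for-byte.
            while len(candidates) > 1:
                first, rest = candidates[0], candidates[1:]
                same = [first] + [p for p in rest
                                  if filecmp.cmp(first, p, shallow=False)]
                if len(same) > 1:
                    duplicate_sets.append(sorted(same))
                candidates = [p for p in rest if p not in same]
    return duplicate_sets
```

Because every hash match is confirmed byte-for-byte anyway, a collision in the cheap hash costs a little time but never produces a false duplicate, which is why a "secure" hash buys nothing here.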

 

After forking the code, making all of these changes, and enjoying the massive performance boost they brought about, I felt motivated to continue looking for potential improvements. I didn’t realize at the time that a simple need to eliminate duplicate files more quickly would morph into spending the next half-year ruthlessly digging through the code for ways to make things better. Between the initial pull request that led to the fork and Eliseo Papa’s article, I managed to get a lot done.

At this point, Eliseo published his February 19 article on the fastest way to find duplicates. I did not discover the article until July 8 of the same year (at which time jdupes was at least three versions higher than the one being tested), so I was initially disappointed with where jdupes stood in the benchmarks relative to some of the other tested programs, but even the early jdupes (version 1.51-jody2) code was much faster than the original fdupes code for the same job.

1.5 months into development, jdupes was 19 times faster in a third-party test than the code it was forked from.

Nothing will make your programming efforts feel more validated than seeing something like that from a total stranger.

Between the article’s publication and my discovery of it, I had continued to make heavy improvements.

When I found Eliseo’s article from February, I sent him an email inviting him to try out jdupes again:

I have benchmarked jdupes 1.51-jody4 from March 27 against jdupes 1.51-jody6, the current code in the Git repo. The target is a post-compilation directory for linux-3.19.5 with 63,490 files and 664 duplicates in 152 sets. A “dry run” was performed first to ensure all files were cached in memory and to remove variance due to disk I/O. The benchmarking was as follows:

$ ./compare_fdupes.sh -nrq /usr/src/linux-3.19.5/
Installed fdupes:
real 0m1.532s
user 0m0.257s
sys 0m1.273s

Built fdupes:
real 0m0.581s
user 0m0.247s
sys 0m0.327s

Five sequential runs were consistently close (about ± 0.020s) to these times.

In half a year of casual spare-time coding, I had made fdupes 32 times faster.

There’s probably not a lot more performance to be squeezed out of jdupes today. Most of my work on the code has settled down into working on new features and improving Windows support. In particular, Windows has supported hard linked files for a long time, and I’ve taken full advantage of Windows hard link support. I’ve also made the progress indicator much more informative to the user. At this point in time, I consider the majority of my efforts complete. jdupes has even gained inclusion as an available program in Arch Linux.

Out of the efforts undertaken in jdupes, I have gained benefits for other projects as well. For example, I can see the potential for using the string_table allocator in other projects that don’t need to free() string memory until the program exits. Most importantly, working on jdupes has improved my overall programming skills tremendously, and I have learned a lot more than I could have imagined would come from improving such a seemingly simple file management tool.

If you’d like to use jdupes, feel free to download one of my binary releases for Linux, Windows, and Mac OS X. You can find them here.

Disable Windows Vista/7/8/8.1 Thumbnail Caches (Privacy, Performance, Paranoia, and Anti-Forensics)

By default, every version of Windows since XP creates thumbnail database files that store small versions of every picture in every folder you browse into with Windows Explorer. These files are used to speed up thumbnail views in folders, but they have some serious disadvantages:

  1. They are created automatically without ever asking you if you want to use them.
  2. Deleting an image file doesn’t necessarily delete it from the thumbnail database. The only way to delete the thumbnail is to delete the database (and hope you deleted the correct one…and that it’s not stored in more than one database!)
  3. These files consume disk space, albeit a relatively small amount.
  4. The XP-style (which is also Vista/7/8 style when browsing network shares) “Thumbs.db” and the Windows Media Center “ehthumbs_vista.db” files are marked as hidden, but if you make an archive (such as a ZIP file) or otherwise copy the folder into a container that doesn’t support hidden attributes, not only does the database increase the size of the container required, it also gets un-hidden!
  5. If you write software, they can interfere with software version control systems. They may also update the timestamp on the folder they’re in, causing some programs to think your data in the folder has changed when it really hasn’t.
  6. If you value your privacy (particularly if you handle any sort of sensitive information) these files leave information behind that can be used to compromise that privacy, especially when in the hands of anyone with even just a casual understanding of forensic analysis, be it the private investigator hired by your spouse or the authorities (police, FBI, NSA, CIA, take your pick).

To shut them off completely, you’ll need to change a few registry values that aren’t available through normal control panels (and unavailable in ANY control panels on any Windows version below a Pro, Enterprise, or Ultimate version). Fortunately, someone has already created the necessary .reg files to turn the local thumbnail caches on or off in one shot. The registry file data was posted by Brink to SevenForums. The files at that page will disable or enable this feature locally. They will also stop Windows Vista and higher from creating “Thumbs.db” files on all of your network drives and shares (or allow it again).

If you want to delete all of the “Thumbs.db” style files on a machine that has more than a couple of them, open a command prompt (Windows key + R, then type “cmd” and hit enter) and type the following commands (yes, the colon after the “a” is supposed to be followed by an empty space):

cd \

del /s /a: Thumbs.db

del /s /a: ehthumbs_vista.db

This will enter every directory on the system hard drive and delete all of the Thumbs.db files. You may see some errors while this runs, but such behavior is normal. If you have more drives that need to be cleaned, you can type the drive letter followed by a colon (such as “E:” if you have a drive with that letter assigned to it, for example) and hit enter, then repeat the above two commands to clean them.
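
If you would like to see what is going to be deleted before anything is touched, the same sweep can be sketched in Python. This is just an illustration of the two del commands above, not something from the SevenForums post; the function name and the dry_run switch are made up for this example.

```python
import os

# The two file names targeted by the del commands above, lowercased.
THUMBNAIL_DB_NAMES = {"thumbs.db", "ehthumbs_vista.db"}


def delete_thumbnail_databases(root, dry_run=True):
    """Return every matching path under root; delete them unless dry_run."""
    matches = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if name.lower() not in THUMBNAIL_DB_NAMES:
                continue
            path = os.path.join(folder, name)
            matches.append(path)
            if dry_run:
                continue
            try:
                os.remove(path)
            except OSError as error:
                # Locked or protected files are reported and skipped,
                # much like the errors del prints while it runs.
                print(f"could not delete {path}: {error}")
    return matches
```

Run it once with the default dry_run=True to print or inspect the list, for example delete_thumbnail_databases("E:\\"), then again with dry_run=False once the list looks right.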

The centralized thumbnail databases for Vista and up are harder to find. You can open the folder quickly by going to Start, copy-pasting this into the search box with CTRL+V, and hitting enter:

%LOCALAPPDATA%\Microsoft\Windows\Explorer

Close all other Explorer windows that you have open to unlock as many of the files as possible. Delete everything that you see with the word “thumb” at the beginning. Some files may not be deletable; if you really want to get rid of them, you can start a command prompt, start Task Manager, use it to kill all “explorer.exe” processes, then delete the files manually using the command prompt:

cd %LOCALAPPDATA%\Microsoft\Windows\Explorer

del thumb*

rd /s thumbcachetodelete

When you’re done, either type “explorer” in the command prompt, or in Task Manager go to File > New Task (Run)… and type “explorer”. This will restart your Explorer shell so you can continue using Windows normally.

Windows Registry FUSE Filesystem

Here’s some code which will allow you to mount Windows registry hive files as filesystems: https://github.com/jbruchon/winregfs

The README file says:

                       THE WINDOWS REGISTRY FUSE FILESYSTEM
                       ====================================

     If you have any questions, comments, or patches, send me an email:
                               jody@jodybruchon.com

One of the most difficult things to deal with in years of writing Linux
utilities to work with and repair Windows PCs is the Windows registry.
While many excellent tools exist to work with NTFS filesystems and to change
and remove passwords from user accounts, the ability to work with the
registry has always been severely lacking. Included in the excellent chntpw
package is a primitive registry editor "reged" which has largely been quite
helpful and I have been grateful for its existence, but it suffers from a
very limited interface and a complete lack of scriptability that presents a
major hurdle for anyone wanting to do more with the registry than wipe out a
password or change the "Start" flag of a system service.

Because of the serious limitations of "reged," the only practical way to do
anything registry-oriented with a shell script was to export an ENTIRE HIVE
to a .reg file, crudely parse the file for what you want, create a .reg file
from the script to import the changes, and import them. Needless to say, the
process is slow, complicated, and frustrating. I even wrote a tool called
"read_inf_section" to help my scripts parse INF/INI/REG files faster because
of this need (but also for an unrelated need to read .inf files from driver
packages.) This complexity became too excessive, so I came up with a much
better way to tweak the registry from shell scripts and programs.

Thus, the Windows Registry FUSE Filesystem "winregfs" was born. chntpw
( http://pogostick.net/~pnh/ntpasswd/ ) has an excellent library for
working with Windows NT registry hive files, distributed under the LGPL.
winregfs is essentially a glue layer between ntreg.c and FUSE, translating
Windows registry keys and values into ordinary directories and files.

winregfs features case-insensitivity and forward-slash escaping. A few keys
and value names in the Windows registry such as MIME types contain forward
slash characters; winregfs substitutes "_SLASH_" where a forward slash appears
in names.

To use winregfs, make a directory to mount on and point it to the registry
hive of interest:

---
$ mkdir reg
$ mount.winregfs /mnt/sdc2/Windows/System32/config/software reg
---

Now, you can see everything in that hive under "reg":

---
$ ls reg
7-Zip/                  Google/              Policies/
AVAST Software/         InstalledOptions/    Program Groups/
Adobe/                  Intel/               RegisteredApplications/
Analog Devices/         LibreOffice/         S3/
C07ft5Y/                Macromedia/          Schlumberger/
Classes/                Microsoft/           Secure/
Clients/                Mozilla/             Sigmatel/
Diskeeper Corporation/  MozillaPlugins/      The Document Foundation/
GNU/                    NVIDIA Corporation/  Windows 3.1 Migration Status/
Gabest/                 ODBC/                mozilla.org/
Gemplus/                Piriform/
---

Let's say you want to see some things that automatically run during startup.

---
$ ls -l reg/Microsoft/Windows/CurrentVersion/Run
total 0
-r--r--r-- 1 root root 118 Dec 31  1969 Adobe ARM.sz
-r--r--r-- 1 root root 124 Dec 31  1969 DiskeeperSystray.sz
-r--r--r-- 1 root root  60 Dec 31  1969 HotKeysCmds.sz
-r--r--r-- 1 root root  66 Dec 31  1969 IgfxTray.sz
-r--r--r-- 1 root root  70 Dec 31  1969 KernelFaultCheck.esz
-r--r--r-- 1 root root  66 Dec 31  1969 Persistence.sz
-r--r--r-- 1 root root 100 Dec 31  1969 SoundMAXPnP.sz
-r--r--r-- 1 root root 118 Dec 31  1969 avast.sz
---

You want to see what these values contain.

---
$ for X in reg/Microsoft/Windows/CurrentVersion/Run/*
> do echo -en "$X\n   "; cat "$X"; echo; done
reg/Microsoft/Windows/CurrentVersion/Run/Adobe ARM.sz
   "C:\Program Files\Common Files\Adobe\ARM\1.0\AdobeARM.exe"

reg/Microsoft/Windows/CurrentVersion/Run/DiskeeperSystray.sz
   "C:\Program Files\Diskeeper Corporation\Diskeeper\DkIcon.exe"

reg/Microsoft/Windows/CurrentVersion/Run/HotKeysCmds.sz
   C:\WINDOWS\system32\hkcmd.exe

reg/Microsoft/Windows/CurrentVersion/Run/IgfxTray.sz
   C:\WINDOWS\system32\igfxtray.exe

reg/Microsoft/Windows/CurrentVersion/Run/KernelFaultCheck.esz
   %systemroot%\system32\dumprep 0 -k

reg/Microsoft/Windows/CurrentVersion/Run/Persistence.sz
   C:\WINDOWS\system32\igfxpers.exe

reg/Microsoft/Windows/CurrentVersion/Run/SoundMAXPnP.sz
   C:\Program Files\Analog Devices\Core\smax4pnp.exe

reg/Microsoft/Windows/CurrentVersion/Run/avast.sz
   "C:\Program Files\AVAST Software\Avast\avastUI.exe" /nogui
---

Has anything hijacked the Windows "shell" value that runs explorer.exe?

---
$ cat reg/Microsoft/Windows\ NT/CurrentVersion/Winlogon/Shell.sz
Explorer.exe
---

How about the userinit.exe value?

---
$ cat reg/Microsoft/Windows\ NT/CurrentVersion/Winlogon/Userinit.sz
C:\WINDOWS\system32\userinit.exe,
---

Perhaps check if some system policies are set (note that REG_DWORD will
probably change in a future release to text files instead of raw data):

---
$ hexdump -C \
> reg/Policies/Microsoft/Windows/System/Allow-LogonScript-NetbiosDisabled.dw
00000000  01 00 00 00                                       |....|
00000004
---

You can probably figure out what to do with it from here. ;-)
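
As one example of where to go from here: the REG_DWORD files shown in the hexdump above are four raw bytes in little-endian order, so a script can turn one into a number with a few lines of Python. This sketch is not part of winregfs, the function name is made up, and it assumes the raw-data format described in the README (which the README says may change to text in a future release).

```python
import struct


def read_dword(path):
    """Decode a 4-byte little-endian REG_DWORD value file into an int."""
    with open(path, "rb") as handle:
        raw = handle.read()
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    # "<I" means little-endian unsigned 32-bit integer.
    return struct.unpack("<I", raw)[0]
```

Pointed at the .dw file from the hexdump example, this would return 1.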

My take on Windows 8, Metro, touchscreens, and other desktop disasters

Windows 8 has this shiny new user interface that’s known as “Metro.” I hate Metro. LOTS of people hate Metro. Metro is supposed to be easier for touchscreen usage, but Windows is a desktop operating system. I don’t want to re-hash everything that other people have written about why Metro is garbage, so I’ll just drop a few points to get my ideas across.

  • Metro is designed specifically with touchscreens in mind. Some all-in-one desktop computers are now touch-capable, and Windows 8 is supposed to become available for ARM architectures so that Windows 8 can be used on new tablets. However, there are two major problems: MOST desktop and laptop computers DO NOT HAVE TOUCH CAPABILITY AT ALL (that’s the vast majority of what it runs on) and TOUCH IS NOT PRACTICAL FOR DESKTOP USE.
  • Touchscreens require holding your arm up to manipulate what we traditionally would use a mouse and pointer to work with. That’s fine for a minute, but if you think your arm is NOT going to get tired ten minutes into touchscreen-centric hell, you’re fooling yourself.
  • Have you seen the Explorer windows? They brought that awful, terrible “ribbon UI” from Office 2010 into Windows 8. Not only is it annoying as hell to use, it’s counterintuitive: with monitors trending towards widescreen displays, vertical screen space is in much shorter supply, while ironically still being the most needed type of screen space for office applications and for seeing more files at once in Explorer’s “details” file view. Yet somehow, Microsoft’s logic is to replace one toolbar with something that’s three toolbars in height. Way to go, you idiots. (If a ribbon popped out of the left or right, it’d make more sense, but ribbon organization is actually less efficient than toolbars, in my opinion.)
  • Start button in desktop mode: GONE. WHY?! The Start button paradigm was revolutionary. There’s a reason that it’s persisted since the introduction of Windows 95, and is often imitated in many Linux desktops: it gets the job done, and does so pretty well, as long as you didn’t have 100 folders inside it (and Vista fixed that with the introduction of a scrolling Start menu programs list that ACTUALLY HAS A FREAKING SCROLL BAR…what took so long to come up with that?!)

If I were to advocate for a radical UI change, I’d want to see something more like Fluxbox on Linux systems. I can right-click anywhere on the “desktop” to get a program menu, with no Start button required. If I use a Fluxbox theme with rounded top corners on the windows, I can launch my mouse to the upper-left or upper-right corners of the screen (two of the most prominent “hotspots” as any skilled UI designer will tell you) and right-click to get said menu as well. Right-clicking on the title bar brings up all of the window management functions I could ever need. Fluxbox isn’t the prettiest thing in the world, and it’s a little weird to someone who is used to choosing between “Start menu” and “Mac dock” ways of working with programs, but being able to call up a Start menu of sorts without even needing the button in the first place isn’t hard to get used to, and is much faster than having to aim for a button.

Honestly, I’ve gotten spoiled by Fluxbox and Linux. I can’t believe how fast a huge application like Firefox starts up under Fluxbox. Ubuntu and other distributions with heavy full-blown desktop environments are on par with Windows, but with a minimalist one like Fluxbox, the world just seems so much faster, even with an unaccelerated VESA video driver.

I digressed a bit, but the moral of the story is this: simple is beautiful, fast, and functional. All this metro/ribbon/touchy crap wastes screen space, slows things down, and frustrates users. I knew things were going sour when Windows had keyboard shortcut accelerator underlining disabled by default, but I didn’t know we would end up with this Metro disaster. I’m making a call out to everyone to advocate for a simpler desktop that doesn’t need to change for the sake of change because it’s functionally sound and easy to work with, without the eye candy and bells and whistles and massive tool ribbons.

Time ensures that things rarely remain the same.

At Tritech, many things have changed since even just one month ago. Here’s a spiffy list of such things. By the way, my new favorite word is “terse.” The magic of the word “terse” is that practically all of its synonyms are not as terse as “terse.” It’s a self-fulfilling definition! ^_^ So, what’s been going on during my silence, you ask? Read on!

Sylvania G’s VIA C7-M versus Windows XP

I changed my Sylvania G (original, non-Meso) netbook to Windows XP/Linux dual-boot to test some software I’m working on, and discovered that while Windows XP certainly does boot and run in general on the G, some kind of system timer or timing loop is severely out of whack! I wanted to use my little G as a portable gaming machine from the Windows XP install, and to my horror, ZSNES couldn’t decide what speed it wanted to run! Now, I’ve never had a single issue with ZSNES on any computer I’ve ever tried it on, even preferring the Windows port of it over the Linux native one, and not once has a problem existed with ZSNES that I couldn’t find an easy fix for, until now. I’ve been researching the matter and gathering evidence, and I may have a potential answer to the problem.