
Wednesday·21·November·2012

zutils: zcat and friends on Steroids //at 01:18 //by abe

from the DWIM-again dept.

I recently wrote about tools to handle archives conveniently. If you just have to handle compressed text files, there are some widely known shortcut commands to mimic common commands on files compressed with a specific compression format.

        gzip    bzip2    lzma     xz
cat     zcat    bzcat    lzcat    xzcat
cmp     zcmp    bzcmp    lzcmp    xzcmp
diff    zdiff   bzdiff   lzdiff   xzdiff
grep    zgrep   bzgrep   lzgrep   xzgrep
egrep   zegrep  bzegrep  lzegrep  xzegrep
fgrep   zfgrep  bzfgrep  lzfgrep  xzfgrep
more    zmore   bzmore   lzmore   xzmore
less    zless   bzless   lzless   xzless

In Debian and derivatives, those tools are part of the respective package for that compression utility, i.e. the zcat command is part of the gzip package and the xzfgrep command is part of the xz-utils package.

But although this matrix is quite easy to remember, the situation has a few drawbacks:

  • Those tools can only handle the format they’re written for (which btw. means that all xz-tools can also handle lzma-compressed files as lzma is xz’s predecessor)
  • zcat and the other cat variants can’t even recognize non-compressed files and throw an error instead of just showing their contents.
  • I always tend to think that lzcat and friends are for lzip-based compression as xzcat can handle lzma-compressed files anyway.

This is where the zutils project comes in: zutils provides the functionality of most of these utilities, too, but with one big difference: you don’t have to remember, think about or type which compression method has been used for your data. Just use zcat, zcmp, zdiff, zgrep, zegrep, or zfgrep and it works, regardless of which compression method has been used (if any), even if different compression types are mixed in the arguments to the same command:

$ zfgrep foobar bla.txt fnord.gz hurz.xz quux.lz bar.lzma

Especially if you use logrotate and let logrotate compress old logs, it’s very convenient that one command suffices to concatenate all the available logfiles, including the current uncompressed one:

$ zcat /var/log/syslog* | …

Additionally, zutils’ versions of these tools also support lzip-compressed files.
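
To illustrate the idea of format-independent reading, here is a minimal sketch of a cat that picks the decompressor by looking at a file’s first bytes. This is not how zutils is implemented; anycat is a hypothetical name, and the sketch assumes that gzip, bzip2, xz and lzip are installed for the formats you actually feed it. Legacy .lzma files have no reliable magic bytes and are therefore not detected here.

```shell
# anycat: hypothetical helper, not part of zutils.
# Prints each file, decompressing it first if its leading bytes
# identify a known compression format.
anycat() {
    for f in "$@"; do
        # First six bytes as one hex string, e.g. "1f8b08000000"
        magic=$(head -c 6 "$f" | od -An -tx1 | tr -d ' \n')
        case "$magic" in
            1f8b*)         gzip  -dc "$f" ;;   # gzip
            425a68*)       bzip2 -dc "$f" ;;   # bzip2 ("BZh")
            fd377a585a00)  xz    -dc "$f" ;;   # xz
            4c5a4950*)     lzip  -dc "$f" ;;   # lzip ("LZIP")
            *)             cat "$f" ;;         # assume uncompressed
        esac
    done
}
```

Called as anycat bla.txt fnord.gz, it prints both files as plain text, one after the other.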

The zutils package is available in Debian starting with Wheezy and in Ubuntu since Oneiric. When installed, it replaces the original z* utilities from the gzip package by diverting them away.

The only drawback so far is that there is neither a zless nor a zmore utility in the zutils project. So zless bla.txt fnord.gz hurz.xz quux.lz bar.lzma will not work as expected even after installing zutils: it is still the one from the gzip package and hence will show you just the first two files in plain text, but not the remaining ones.

Saturday·17·November·2012

deepgrep: grep nested archives with one command //at 02:00 //by abe

from the grep-revisited dept.

Several months ago, I wrote about grep everything and listed grep-like tools which can grep through compressed files or specific data formats. The blog posting sparked several magazine articles and talks by Frank Hofmann and me.

Frank recently noticed, though, that we had so far missed one more or less mighty tool. We missed it because it’s mostly unknown, undocumented and hidden behind a package name which doesn’t suggest a real recursive “grep everything”:

deepgrep

deepgrep is part of the Debian package strigi-utils, a package which contains utilities related to the KDE desktop search Strigi.

deepgrep especially eases the searching through tar balls, even nested ones, but can also search through zip files and OpenOffice.org/LibreOffice documents (which are actually zip files).

deepgrep seems to support at least the following archive and compression formats:

  • tar
  • ar, and hence deb
  • rpm (but not cpio)
  • gzip/gz
  • bzip2/bz2
  • zip, and hence jar/war and OpenOffice.org/LibreOffice documents
  • MIME messages (i.e. files attached to e-mails)

A search in an archive which is deeply nested looks like this:

$ deepgrep bar foo.ar
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt:foobar
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt:bar

deepgrep, however, seems to support neither any LZMA-based compression (lzma, xz, lzip, 7z) nor lzop, rzip, compress (.Z suffix), cab, cpio, xar, or rar.

Further current drawbacks of deepgrep:

  • Nearly no commandline options, especially none of the common grep options
  • No man-page or other documentation
  • Exit code not related to search results, you have to check the output to see if something has been found
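
The missing exit code can be worked around by deriving a status from whether there was any output. The following is only a sketch: grep_status is a hypothetical wrapper, not something shipped with strigi-utils, and it buffers the whole output before printing it.

```shell
# grep_status: hypothetical helper, not part of strigi-utils.
# Runs the given command, passes its output through, and returns
# 0 if there was any output and 1 if there was none (as grep would).
grep_status() {
    out=$("$@")
    [ -n "$out" ] || return 1
    printf '%s\n' "$out"
}

# Intended use (requires deepgrep to be installed):
#   if grep_status deepgrep bar foo.ar; then echo "found"; fi
```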

deepfind

If you just need the file names of the files in nested archives, the package also contains the tool deepfind which does nothing but list all files and directories in a given set of archives or directories:

$ deepfind foo.ar
foo.ar
foo.ar/foo.tar
foo.ar/foo.tar/foo.tar.gz
foo.ar/foo.tar/foo.tar.gz/foo.zip
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt

As with deepgrep, deepfind does not implement any common options of its normal sister tool find.

[The following part has been added on 17-Nov-2012]

As with deepgrep, it also doesn’t seem to support any of the more modern or more exotic compression formats, e.g. it fails on modern Debian binary packages which use xz compression on the data part:

$ deepfind xulrunner-18.0_18.0\~a2+20121109042012-1_amd64.deb
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/debian-binary
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/triggers
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/preinst
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/md5sums
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/postinst
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/control
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/data.tar.xz

[End of part added at 17-Nov-2012]

Dependencies

The package strigi-utils doesn’t pull in the complete Strigi framework (i.e. no daemon), just a few libraries (libstreams, libstreamanalyzer, and libclucene). On Wheezy it also pulls in some audio/video decoding libraries which may make some server administrators less happy.

Conclusion

Both tools are quite limited to some basic use cases, but can be worth a fortune if you have to work with nested archives. Nevertheless the claim in the Debian package description of strigi-utils that they’re “enhanced” versions of their well known counterparts is IMHO disproportionate.

Most of the missing features and documentation can be explained by the primary purpose of these tools: being a backend for desktop searches. I guess there wasn’t much need for proper commandline usage yet. Until now. ;-)

42.zip

And yes, I was curious enough to let deepfind have a look at 42.zip (the one from SecurityFocus; unzip seems unable to unpack 42.zip from unforgettable.dk due to a missing version compatibility) and since it just traverses the archive sequentially, it has no problem with that, needing just about 5 MB of RAM and a lot of time:

[…]
42.zip/lib f.zip/book f.zip/chapter f.zip/doc f.zip/page e.zip
42.zip/lib f.zip/book f.zip/chapter f.zip/doc f.zip/page e.zip/0.dll
42.zip/lib f.zip/book f.zip/chapter f.zip/doc f.zip/page f.zip
42.zip/lib f.zip/book f.zip/chapter f.zip/doc f.zip/page f.zip/0.dll
deepfind 42.zip  11644.12s user 303.89s system 97% cpu 3:24:02.46 total

I won’t try deepgrep on 42.zip, though. ;-)

Friday·16·November·2012

Useful but Unknown Unix Tools: dwdiff better than wdiff + colordiff //at 01:18 //by abe

from the colordiff-revisited dept.

A year ago I wrote in Useful but Unknown Unix Tools: How wdiff and colordiff help to choose the right Swiss Army Knife about using wdiff and colordiff together. Colordiff’ed wdiff output looks like this:

$ wdiff foobar.txt barfoo.txt | colordiff
[-foo-]bar fnord
gnarz hurz quux
bla {+foo+} fasel

But if you have colour, why still have these hard-to-read wdiff markers in the text?

There exists a tool named dwdiff which can do word diffs in colour without textual markers and with even less to type (and without being git diff --color-words ;-). Actually it looks like git diff --color-words, just without the git:

$ dwdiff -c foobar.txt barfoo.txt
foo bar fnord
gnarz hurz quux
bla foo fasel

Another cool thing about dwdiff (and the feature it is named after) is that you can define what you consider whitespace, i.e. which character(s) delimit the words. So let’s do the example above again, but this time declare that “f” is considered the only whitespace character:

$ dwdiff -W f -c foobar.txt barfoo.txt
foo bar bar fnord
gnarz hurz quux
bla foo fasel

dwdiff can also show line numbers:

$ dwdiff -c -L foobar.txt barfoo.txt
   1:1    foo bar fnord
   2:2    gnarz hurz quux
   3:3    bla foo fasel
$ dwdiff -c -L foobar.txt quux.txt
   1:1    foo bar fnord
   1:2    foobar floedeldoe
   2:3    gnarz hurz quux
   3:4    bla foo fasel

(coloured shell screenshots by aha)

Thursday·15·November·2012

Tools to handle archives conveniently //at 01:42 //by abe

from the DWIM dept.

TL;DR: There’s a summary at the end of the article.

Today I wanted to see why a dependency in a .deb-package from an external APT repository changed so that it became uninstallable. While dpkg-deb --info foobar.deb easily shows the control information, the changelog is in the filesystem part of the package.

I could extract that one with dpkg-deb, too, but I’d have to extract either to some temporary directory or pipe it into tar which then can extract a single file from the archive and send it to STDOUT:

dpkg-deb --fsys-tarfile foobar.deb | tar xOf - ./usr/share/doc/foobar/changelog.Debian.gz | zless

But that’s tedious to type. The following command is clearly less to type and way easier to remember:

acat foobar.deb ./usr/share/doc/foobar/changelog.Debian.gz | zless

acat stands for “archive cat” and is part of the atool suite of commands:

als
lists files in an archive.
$ als foobar.tgz
drwxr-xr-x abe/abe           0 2012-11-15 00:19 foobar/
-rw-r--r-- abe/abe          13 2012-11-15 00:20 foobar/bar
-rw-r--r-- abe/abe          13 2012-11-15 00:20 foobar/foo
acat
extracts files in an archive to standard out.
$ acat foobar.tgz foobar/foo foobar/bar
foobar/bar
bar contents
foobar/foo
foo contents
adiff
generates a diff between two archives using diff(1).
$ als quux.zip
Archive:  quux.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2012-11-15 00:23   quux/
       16  2012-11-15 00:22   quux/foo
       13  2012-11-15 00:20   quux/bar
---------                     -------
       29                     3 files
$ adiff foobar.tgz quux.zip
diff -ru Unpack-3594/foobar/foo Unpack-7862/quux/foo
--- Unpack-3594/foobar/foo      2012-11-15 00:20:46.000000000 +0100
+++ Unpack-7862/quux/foo        2012-11-15 00:22:56.000000000 +0100
@@ -1 +1 @@
-foo contents
+foobar contents
arepack
repacks archives to a different format. It does this by first extracting all files of the old archive into a temporary directory, then packing all files extracted to that directory to the new archive. Use the --each (-e) option in combination with --format (-F) to repack multiple archives using a single invocation of atool. Note that arepack will not remove the old archive.
$ arepack foobar.tgz foobar.txz
foobar.tgz: extracted to `Unpack-7121/foobar'
foobar.txz: grew 36 bytes
apack
creates archives (or compresses files). If no file arguments are specified, filenames to add are read from standard in.
aunpack
extracts files from an archive. Often one wants to extract all files in an archive to a single subdirectory. However, some archives contain multiple files in their root directories. The aunpack program overcomes this problem by first extracting files to a unique (temporary) directory, and then moving its contents back if possible. This also prevents local files from being overwritten by mistake.

(atool subcommand descriptions from the atool man page which is licensed under GPLv3+. Examples by me.)
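
For plain tarballs, the core of what acat does can be approximated with tar alone. The following is a minimal sketch, not atool’s implementation: tcat is a hypothetical name, it handles only tar archives, and unlike acat it does not print the member names. It relies on GNU tar or bsdtar detecting the compression on their own when reading.

```shell
# tcat: hypothetical helper, not part of atool.
# Usage: tcat ARCHIVE MEMBER...
# Writes the named members of a tar archive to standard output.
tcat() {
    archive=$1
    shift
    # -x extract, -O write to stdout instead of to disk, -f archive file
    tar -xOf "$archive" "$@"
}
```

With the foobar.tgz from the examples above, tcat foobar.tgz foobar/foo would be the equivalent call.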

I do miss an agrep subcommand, though. Guess why?

atool supports a wealth of archive types: tar (gzip-, bzip-, bzip2-, compress-/Z-, lzip-, lzop-, xz-, and 7zip-compressed), zip, jar/war, rar, lha/lzh, 7zip, alzip/alz, ace, ar, arj, arc, rpm, deb, cab, gzip, bzip, bzip2, compress/Z, lzip, lzop, xz, rzip, lrzip and cpio. (Not all subcommands support all archive types.)

Similar Utilities

There are some utilities which cover parts of what atool does, too:

Tools from the mtools package

Yes, they come from the “handle MS-DOS floppy disks tool” package, don’t ask me why. :-)

uz
gunzips and extracts a gzip‘d tar‘d archive
Advantage over aunpack: Less to type. :-)
Disadvantage compared to aunpack: Supports only one archive format.
lz
gunzips and shows a listing of a gzip‘d tar‘d archive
Advantage over als: One character less to type. :-)
Disadvantage compared to als: Supports only one archive format.

unp

unp extracts one or more files given as arguments on the command line.

$ unp -s
Known archive formats and tools:
7z:           p7zip or p7zip-full
ace:          unace
ar,deb:       binutils
arj:          arj
bz2:          bzip2
cab:          cabextract
chm:          libchm-bin or archmage
cpio,afio:    cpio or afio
dat:          tnef
dms:          xdms
exe:          maybe orange or unzip or unrar or unarj or lha 
gz:           gzip
hqx:          macutils
lha,lzh:      lha
lz:           lzip
lzma:         xz-utils or lzma
lzo:          lzop
lzx:          unlzx
mbox:         formail and mpack
pmd:          ppmd
rar:          rar or unrar or unrar-free
rpm:          rpm2cpio and cpio
sea,sea.bin:  macutils
shar:         sharutils
tar:          tar
tar.bz2,tbz2: tar with bzip2
tar.lzip:     tar with lzip
tar.lzop,tzo: tar with lzop
tar.xz,txz:   tar with xz-utils
tar.z:        tar with compress
tgz,tar.gz:   tar with gzip
uu:           sharutils
xz:           xz-utils
zip,cbz,cbr,jar,war,ear,xpi,adf: unzip
zoo:          zoo

So it’s very similar to aunpack, just a shorter command and it supports some more exotic archive formats which atool doesn’t support.

Also part of the unp package is ucat which does more or less the same as acat, just with unp as backend.

dtrx

From the man page of dtrx:

In addition to providing one command to extract many different archive types, dtrx also aids the user by extracting contents consistently. By default, everything will be written to a dedicated directory that’s named after the archive. dtrx will also change the permissions to ensure that the owner can read and write all those files.

Supported archive formats: tar, zip (including self-extracting .exe files), cpio, rpm, deb, gem, 7z, cab, rar, and InstallShield. It can also decompress files compressed with gzip, bzip2, lzma, or compress.

dtrx -l lists the contents of an archive, i.e. works like als or lz.

dtrx has two features not present in the other tools mentioned so far:

  • It can extract metadata instead of the normal contents from .deb and .gem files.
  • It can extract archives recursively, i.e. can extract archives inside of archives.

Unfortunately you can’t mix those two features. But you can use the following tool for that purpose:

deepfind

deepfind is a command from the package strigi-utils and recursively lists files in archives, including archives in archives. I’ve already written a detailed blog-posting about deepfind and its friend deepgrep.

tardiff

tardiff was written to check what changed in source code tarballs from one release to another. By default it just lists the differences in the file lists, not in the files’ contents, and hence works differently from adiff.
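
The idea of comparing only the file lists can be sketched with tar and diff. This is not the real tardiff and does not try to reproduce its output or options; tarlist_diff is a hypothetical name.

```shell
# tarlist_diff: hypothetical helper, not the real tardiff.
# Usage: tarlist_diff OLD_TARBALL NEW_TARBALL
# Prints a unified diff of the sorted member lists of two tarballs
# and returns diff's exit status (0 = same lists, 1 = lists differ).
tarlist_diff() {
    old_list=$(mktemp) || return 2
    new_list=$(mktemp) || return 2
    tar -tf "$1" | sort > "$old_list"
    tar -tf "$2" | sort > "$new_list"
    diff -u "$old_list" "$new_list" && status=0 || status=$?
    rm -f "$old_list" "$new_list"
    return "$status"
}
```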

Summary

atool and friends are probably the first choice when it comes to DWIM archive handling, also because they have an easy to remember subcommand scheme.

uz and lz are the shortest way to extract or list the contents of a .tar.gz file. But nothing more. And you have to install mtools even if you don’t have a floppy drive.

unp comes in handy for exotic archive formats atool doesn’t support. And it’s way easier to remember and type than aunpack.

dtrx is neat if you want to extract archives in archives or if you want to extract metadata from some package files with just a few keystrokes.

For listing all files in recursive archives, use deepfind.

Thursday·30·August·2012

Finding similar but not identical files //at 17:10 //by abe

from the whitespace-change dept.

There are quite a few tools to find duplicate files in Debian (Ua is not even packaged for Debian!!!1!eleven! SCNR; via Chrütertee) and depending on the task I use either hardlink (see this blog posting), fdupes (if I need output with all identical files on one line; see example below), or duff (if it has to be performant).

But for code deduplication in historically grown code you sometimes need a tool which does not only find identical files, but also those which just differ in a few blanks or blank lines.
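
For the narrow case of files which differ only in blanks or blank lines, a whitespace-insensitive checksum is a cheap first pass. The following is just a sketch under that assumption: ws_sum is a hypothetical helper, it removes all blanks and tabs (so “a b” and “ab” count as equal), and it gives no percentage of similarity.

```shell
# ws_sum: hypothetical helper.
# Prints a checksum of a file's contents with all blanks, tabs and
# empty lines removed, so files differing only in those get the same sum.
ws_sum() {
    tr -d ' \t' < "$1" | sed '/^$/d' | cksum | cut -d ' ' -f 1
}
```

Printing the sum next to each file name and sorting the result puts files with the same sum next to each other.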

I found two tools in Debian which can give you some kind of percentage of similarity: simhash (which is btw. orphaned; upstream homepage) and similarity-tester (upstream homepage).

simhash has the shorter name and hence sounds more usable on the command-line. But it seems only to be able to compare two files at once and also only after first computing and writing down its similarity hash to a file. Not really usable for those one-liner cases on the command-line.

similarity-tester has the longer name (and one which made me suspect that it may be a GUI tool), but provides what I was looking for:

$ find . -type f | sim_text -ipTt 75

This lists all files in the current directory which have at least 75% (“-t 75”) in common with another file in the list of files. The option “-i” causes sim_text to read the files to compare from standard input; “-p” causes sim_text to just output the similarity percentage; and “-T” suppresses the per-file list of found tokens.

I used similarity-tester’s “sim_text” tool, which compares natural language, as most of the files I had to test are shell scripts. But similarity-tester also provides tools to test the similarity of code in specific programming languages, namely C, Java, Pascal, Modula-2, Lisp and Miranda.

Example output from the xen-tools project (after I already did a lot of code deduplication):

./intrepid/30-disable-gettys consists for 100 % of ./edgy/30-disable-gettys material
./edgy/30-disable-gettys consists for 100 % of ./intrepid/30-disable-gettys material
./common/90-make-fstab-rpm consists for 98 % of ./centos-5/90-make-fstab material
./centos-5/90-make-fstab consists for 98 % of ./common/90-make-fstab-rpm material
./gentoo/55-create-dev consists for 91 % of ./dapper/55-create-dev material
./dapper/55-create-dev consists for 90 % of ./gentoo/55-create-dev material
./gentoo/55-create-dev consists for 88 % of ./common/55-create-dev material
./common/90-make-fstab-deb consists for 87 % of ./common/90-make-fstab-rpm material
./common/90-make-fstab-rpm consists for 85 % of ./common/90-make-fstab-deb material
./common/30-disable-gettys consists for 81 % of ./karmic/30-disable-gettys material
./intrepid/80-install-kernel consists for 78 % of ./edgy/80-install-kernel material
./edgy/30-disable-gettys consists for 76 % of ./karmic/30-disable-gettys material
./karmic/30-disable-gettys consists for 76 % of ./edgy/30-disable-gettys material
./common/50-setup-hostname-rpm consists for 76 % of ./gentoo/50-setup-hostname material

Depending on the length of possible filenames and amount of files this can be made more readable using the column utility from the bsdmainutils package and reversed by using tac from the coreutils package:

$ find . -type f | sim_text -ipTt 75 | tac | column -t
./common/50-setup-hostname-rpm  consists  for  76   %  of  ./gentoo/50-setup-hostname    material
./karmic/30-disable-gettys      consists  for  76   %  of  ./edgy/30-disable-gettys      material
./edgy/30-disable-gettys        consists  for  76   %  of  ./karmic/30-disable-gettys    material
./intrepid/80-install-kernel    consists  for  78   %  of  ./edgy/80-install-kernel      material
./common/30-disable-gettys      consists  for  81   %  of  ./karmic/30-disable-gettys    material
./common/90-make-fstab-rpm      consists  for  85   %  of  ./common/90-make-fstab-deb    material
./common/90-make-fstab-deb      consists  for  87   %  of  ./common/90-make-fstab-rpm    material
./gentoo/55-create-dev          consists  for  88   %  of  ./common/55-create-dev        material
./dapper/55-create-dev          consists  for  90   %  of  ./gentoo/55-create-dev        material
./gentoo/55-create-dev          consists  for  91   %  of  ./dapper/55-create-dev        material
./centos-5/90-make-fstab        consists  for  98   %  of  ./common/90-make-fstab-rpm    material
./common/90-make-fstab-rpm      consists  for  98   %  of  ./centos-5/90-make-fstab      material
./edgy/30-disable-gettys        consists  for  100  %  of  ./intrepid/30-disable-gettys  material
./intrepid/30-disable-gettys    consists  for  100  %  of  ./edgy/30-disable-gettys      material
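
Instead of relying on the order in which the lines arrive, the output can also be sorted explicitly by percentage. This is a sketch assuming the line format shown above, where the percentage is the fourth whitespace-separated field and file names contain no blanks; by_similarity is a hypothetical name.

```shell
# by_similarity: hypothetical helper.
# Reads sim_text-style lines ("A consists for N % of B material") on
# standard input and sorts them by the percentage N, highest first.
by_similarity() {
    sort -k4,4nr
}
```

It can be put in the pipeline in place of tac, either before or after column -t.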

Compared to that, fdupes only finds the two 100% identical files:

$ fdupes -r1 . 
./intrepid/30-disable-gettys ./edgy/30-disable-gettys 

But fdupes had already helped me a lot to find the first bunch of identical files in xen-tools. :-)

Wednesday·14·March·2012

SSH Multiplexer: parallel-ssh //at 03:10 //by abe

from the one-long-line-but-one-line dept.

There are many SSH multiplexers in Debian and most of them have one or two features which make them unique and especially useful for that one use case. I use some of them regularly (I even maintain the Debian package of one of them, namely pconsole :-) and I’ll present one of them here every now and then.

For non-interactive purposes I really like parallel-ssh aka pssh. It takes a file of hostnames and a bunch of common ssh parameters as parameters, executes the given command in parallel in up to 32 threads (by default, adjustable with -p) and waits by default for 60 seconds (adjustable with -t). For example to restart hobbit-client on all hosts in kiva.txt, the following command is suitable:

$ parallel-ssh -h kiva.txt -l root /etc/init.d/hobbit-client restart
[1] 19:56:03 [FAILURE] kiva6 Exited with error code 127
[2] 19:56:04 [SUCCESS] kiva
[3] 19:56:04 [SUCCESS] kiva4
[4] 19:56:04 [SUCCESS] kiva2
[5] 19:56:04 [SUCCESS] kiva5
[6] 19:56:04 [SUCCESS] kiva3
[7] 19:57:03 [FAILURE] kiva1 Timed out, Killed by signal 9

(Coloured “Screenshots” done with ANSI HTML Adapter from the package aha.)

You can easily see on which hosts the command failed and partially also why: On kiva6 hobbit-client is not installed and therefore the init.d script is not present. kiva1 is currently offline so the ssh connection timed out.
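
If you want to act on the failures, e.g. to retry them later, the host names can be pulled out of the summary lines. This is a sketch based purely on the output format shown above; failed_hosts is a hypothetical name, and it assumes that the output is uncoloured when it goes into a pipe.

```shell
# failed_hosts: hypothetical helper, not part of the pssh package.
# Reads parallel-ssh's summary lines on standard input and prints the
# host name of every line marked [FAILURE].
failed_hosts() {
    awk '$3 == "[FAILURE]" { print $4 }'
}
```

Since the result has one host per line, it could be written to a file and used as the host list of a later run.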

If you want to see the output of the commands, you have two choices. Which one to choose depends on the expected amount of output:

If you don’t expect a lot of output, the -i (or --inline) option for inline aggregated output is probably the right choice:

$ parallel-ssh -h kiva.txt -l root -t 10 -i uptime
[1] 20:30:20 [SUCCESS] kiva
 20:30:20 up 7 days,  5:51,  0 users,  load average: 0.12, 0.08, 0.06
[2] 20:30:20 [SUCCESS] kiva2
 20:30:20 up 7 days,  5:50,  0 users,  load average: 0.19, 0.08, 0.02
[3] 20:30:20 [SUCCESS] kiva3
 20:30:20 up 7 days,  5:49,  0 users,  load average: 0.10, 0.06, 0.06
[4] 20:30:20 [SUCCESS] kiva4
 20:30:20 up 7 days,  5:49,  0 users,  load average: 0.25, 0.17, 0.14
[5] 20:30:20 [SUCCESS] kiva6
 20:30:20 up 7 days,  5:49, 10 users,  load average: 0.16, 0.08, 0.02
[6] 20:30:21 [SUCCESS] kiva5
 20:30:21 up 7 days,  5:49,  0 users,  load average: 3.11, 3.36, 3.06
[7] 20:30:29 [FAILURE] kiva1 Timed out, Killed by signal 9

If you expect a lot of output you can give directories with the -o (or --outdir) and -e (or --errdir) options:

$ parallel-ssh -h kiva.txt -l root -t 20 -o kiva-output lsb_release -a
[1] 20:36:51 [SUCCESS] kiva
[2] 20:36:51 [SUCCESS] kiva2
[3] 20:36:51 [SUCCESS] kiva3
[4] 20:36:51 [SUCCESS] kiva4
[5] 20:36:53 [SUCCESS] kiva6
[6] 20:36:54 [SUCCESS] kiva5
[7] 20:37:10 [FAILURE] kiva1 Timed out, Killed by signal 9
$ ls -l kiva-output
total 24
-rw-r--r-- 1 abe abe  98 Aug 28 20:36 kiva
-rw-r--r-- 1 abe abe   0 Aug 28 20:36 kiva1
-rw-r--r-- 1 abe abe  98 Aug 28 20:36 kiva2
-rw-r--r-- 1 abe abe  98 Aug 28 20:36 kiva3
-rw-r--r-- 1 abe abe  98 Aug 28 20:36 kiva4
-rw-r--r-- 1 abe abe 102 Aug 28 20:36 kiva5
-rw-r--r-- 1 abe abe 100 Aug 28 20:36 kiva6
$ cat kiva-output/kiva5
Distributor ID:	Debian
Description:	Debian GNU/Linux 6.0.2 (squeeze)
Release:	6.0.2
Codename:	squeeze

The only annoying thing IMHO is that the host list needs to be in a file. With zsh, bash and the original ksh (but neither tcsh, pdksh nor mksh), you can circumvent this restriction with one of the following command lines:

$ parallel-ssh -h <(printf "host1\nhost2\nhost3\n…") -l root uptime
[…]
$ parallel-ssh -h <(echo host1 host2 host3 … | xargs -n1) -l root uptime
[…]

And in zsh there’s an even easier way to type this:

$ parallel-ssh -h <(print -l host1 host2 host3 …) -l root uptime
[…]

In addition to parallel-ssh the pssh package also contains some more ssh based tools:

  • parallel-scp and parallel-rsync for parallel copying files onto a set of hosts.
  • parallel-slurp for fetching files in parallel from a list of hosts.
  • parallel-nuke to kill a bunch of processes in parallel on a set of machines.

I think, though, that parallel-ssh is by far the most useful tool from the pssh package. (Probably no wonder as it’s the most generic one. :-)

Thursday·01·September·2011

Useful but Unknown Unix Tools: How wdiff and colordiff help to choose the right Swiss Army Knife //at 12:18 //by abe

from the colorful-diffs dept.

In light of the fact that it seems possible to fit the plastic caps of a Debian branded Swiss Army Knife (Last orders today!) on an existing Swiss Army Knife (German written howto as PDF), I started to think about which Victorinox Cybertool would be the best fitting for me.

And because the Victorinox comparison page doesn’t really show diffs, just columns with floating text which are not very helpful for generating diffs in your head, I used command line tools for that purpose:

wdiff

Because the floating texts are not line- but just whitespace-based, the tool of choice is not diff but wdiff, a word-based diff. It encloses additions and removals in {+…+} and [-…-] blocks. (No, those aren’t Japanese smileys although they look a lot like some. ^^).

The easiest and clearest way is to copy and paste the texts from Victorinox’ comparison page into some text files and compare them with wdiff:

$ wdiff cybertool34.txt cybertool41.txt
{+Schraubendreher 2.5mm,+} Pinzette, Nähahle mit Nadelöhr, {+Holzsäge,+} Bit-Schlüssel( 5 mm Innensechskant für die D-SUB Steckverbinder, 4 mm Innensechskant für Bits, Bit Phillips 0, Bit Phillips 1, Bit-Schlitzschrauben 4 mm, Bit Phillips 2, Bit Hex 4 mm, Bit Torx 8, Bit Torx 10, Bit Torx 15 ), Kombizange( Hülsenpresser, Drahtschneider ), Stech-Bohrahle, Kugelschreiber( auch zum DIP-Switch verstellen ), Mehrzweckhaken (Paketträger), {+Metallsäge( Metallfeile, Nagelfeile, Nagelreiniger ),+} Dosenöffner( kleiner Schraubendreher ), Kleine Klinge, Grosse Klinge, Ring, inox, Mini-Schraubendreher, Kapselheber( Schraubendreher, Drahtabisolierer ), {+Holzmeissel / Schaber,+} Bit-Halter, Stecknadel, inox, Schere, Korkenzieher, Zahnstocher

So this already extracted the information about which seven tools are in the Cybertool 41 but not in the Cybertool 34. Nevertheless, the diff is still not easily recognizable at first glance. There are several ways to help here.

First, wdiff has an option --no-common (the corresponding short option is -3) which just shows added and removed words:

$ wdiff -3 cybertool34.txt cybertool41.txt
======================================================================
{+Schraubendreher 2.5mm,+}
======================================================================
 {+Holzsäge,+}
======================================================================
 {+Metallsäge( Metallfeile, Nagelfeile, Nagelreiniger ),+}
======================================================================
 {+Holzmeissel / Schaber,+}
======================================================================

This is already way better to quickly recognize the actual differences.

But if you still also want to see the common tools of the two knives you need some visual help:

One option is to use wdiff’s --terminal (or short -t) option. Added words are then displayed inverse and removed words are shown underlined (background and foreground colors hardcoded as there is no “invert colors” style in CSS or HTML):

$ wdiff -t cybertool34.txt cybertool41.txt
Schraubendreher 2.5mm, Pinzette, Nähahle mit Nadelöhr, Holzsäge, Bit-Schlüssel( 5 mm Innensechskant für die D-SUB Steckverbinder, 4 mm Innensechskant für Bits, Bit Phillips 0, Bit Phillips 1, Bit-Schlitzschrauben 4 mm, Bit Phillips 2, Bit Hex 4 mm, Bit Torx 8, Bit Torx 10, Bit Torx 15 ), Kombizange( Hülsenpresser, Drahtschneider ), Stech-Bohrahle, Kugelschreiber( auch zum DIP-Switch verstellen ), Mehrzweckhaken (Paketträger), Metallsäge( Metallfeile, Nagelfeile, Nagelreiniger ), Dosenöffner( kleiner Schraubendreher ), Kleine Klinge, Druckkugelschreiber, Grosse Klinge, Ring, inox, Mini-Schraubendreher, Kapselheber( Schraubendreher, Drahtabisolierer ), Holzmeissel / Schaber, Bit-Halter, Stecknadel, inox, Schere, Korkenzieher, Zahnstocher

But some still like to use color instead of the contrast-rich inverse and the easily overlooked underlining. This is where colordiff comes into play:

colordiff

colordiff is like syntax highlighting for diffs on the command line. It works with classic and unified diffs as well as with wdiffs and debdiffs (the debdiff command is part of the devscripts package).

$ wdiff cybertool34.txt cybertool41.txt | colordiff
{+Schraubendreher 2.5mm,+} Pinzette, Nähahle mit Nadelöhr, {+Holzsäge,+} Bit-Schlüssel( 5 mm Innensechskant für die D-SUB Steckverbinder, 4 mm Innensechskant für Bits, Bit Phillips 0, Bit Phillips 1, Bit-Schlitzschrauben 4 mm, Bit Phillips 2, Bit Hex 4 mm, Bit Torx 8, Bit Torx 10, Bit Torx 15 ), Kombizange( Hülsenpresser, Drahtschneider ), Stech-Bohrahle, Kugelschreiber( auch zum DIP-Switch verstellen ), Mehrzweckhaken (Paketträger), {+Metallsäge( Metallfeile, Nagelfeile, Nagelreiniger ),+} Dosenöffner( kleiner Schraubendreher ), Kleine Klinge, Grosse Klinge, Ring, inox, Mini-Schraubendreher, Kapselheber( Schraubendreher, Drahtabisolierer ), {+Holzmeissel / Schaber,+} Bit-Halter, Stecknadel, inox, Schere, Korkenzieher, Zahnstocher

$ wdiff cybertool29.txt cybertool41.txt | colordiff
{+Schraubendreher 2.5mm,+} Pinzette, Nähahle mit Nadelöhr, {+Holzsäge,+} Bit-Schlüssel( 5 mm Innensechskant für die D-SUB Steckverbinder, 4 mm Innensechskant für Bits, Bit Phillips 0, Bit Phillips 1, Bit-Schlitzschrauben 4 mm, Bit Phillips 2, Bit Hex 4 mm, Bit Torx 8, Bit Torx 10, Bit Torx 15 ), {+Kombizange( Hülsenpresser, Drahtschneider ),+} Stech-Bohrahle, {+Kugelschreiber( auch zum DIP-Switch verstellen ), Mehrzweckhaken (Paketträger), Metallsäge( Metallfeile, Nagelfeile, Nagelreiniger ),+} Dosenöffner( kleiner Schraubendreher ), Kleine Klinge, [-Druckkugelschreiber,-] Grosse Klinge, Ring, inox, Mini-Schraubendreher, Kapselheber( Schraubendreher, Drahtabisolierer ), {+Holzmeissel / Schaber,+} Bit-Halter, Stecknadel, inox, {+Schere,+} Korkenzieher, Zahnstocher

(Coloured “Screenshots” done with ANSI HTML Adapter from the package aha.)

Some, especially those who are used to git, are probably confused by the default choice of diff colors. This is easily fixable by writing the following into your ~/.colordiffrc:

newtext=green
oldtext=red
diffstuff=darkblue
cvsstuff=darkyellow

(See also /etc/colordiffrc for the defaults and hints.)
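If you want to roll out these settings from a script, something like the following should do. It is only a sketch: the function name write_colordiffrc and the .bak backup suffix are made up for this example, and the four color settings are just the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper (name made up for this example): write the git-like
# color scheme to a colordiff config file, backing up an existing file first.
write_colordiffrc() {
    rc="${1:-$HOME/.colordiffrc}"   # target file, defaults to the per-user config
    if [ -e "$rc" ]; then
        cp -p "$rc" "$rc.bak"       # the .bak suffix is an arbitrary choice
    fi
    cat > "$rc" <<'EOF'
newtext=green
oldtext=red
diffstuff=darkblue
cvsstuff=darkyellow
EOF
}
```

Called without an argument, write_colordiffrc targets ~/.colordiffrc; pass a path to write somewhere else first and inspect the result.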

By the way, colordiff has two operating modes:

  • Without parameters, it reads diffs from standard input, as seen above.
  • With parameters, it works as a drop-in replacement for diff, including all diff options, as shown below.
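Both modes can be wrapped in a small shell function that still works on machines where colordiff is not installed. This is a sketch under assumptions: the name cdiff is made up, and the fallback simply produces uncolored output via cat or diff.

```shell
#!/bin/sh
# Hypothetical wrapper (the name cdiff is made up): use colordiff when it is
# installed and fall back to uncolored output otherwise.
cdiff() {
    if command -v colordiff >/dev/null 2>&1; then
        # colordiff picks the mode itself: without arguments it filters
        # a diff from standard input, with arguments it runs diff on them.
        colordiff "$@"
    elif [ "$#" -eq 0 ]; then
        cat             # filter mode without colordiff: pass the diff through
    else
        diff "$@"       # drop-in mode without colordiff: plain diff
    fi
}
```

With this in place, both wdiff old.txt new.txt | cdiff and cdiff -u old.txt new.txt should behave like their colordiff counterparts, just without colors where colordiff is missing.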

So now let us compare the Cybertool 29 with the Cybertool 34 in a normal diff with git-like colors (using the texts from above, with every comma replaced by a newline character):

$ colordiff cybertool29-lines.txt cybertool34-lines.txt
12a13,14
> Kombizange( Hülsenpresser
> Drahtschneider )
13a16,17
> Kugelschreiber( auch zum DIP-Switch verstellen )
> Mehrzweckhaken (Paketträger)
16d19
< Druckkugelschreiber
25a29
> Schere

Or as a unified diff with some context:

$ colordiff -u cybertool29-lines.txt cybertool34-lines.txt
--- cybertool29-lines.txt     2011-08-31 20:55:37.195546238 +0200
+++ cybertool34-lines.txt   2011-08-31 20:55:11.667710504 +0200
@@ -10,10 +10,13 @@
 Bit Torx 8
 Bit Torx 10
 Bit Torx 15 )
+Kombizange( Hülsenpresser
+Drahtschneider )
 Stech-Bohrahle
+Kugelschreiber( auch zum DIP-Switch verstellen )
+Mehrzweckhaken (Paketträger)
 Dosenöffner( kleiner Schraubendreher )
 Kleine Klinge
-Druckkugelschreiber
 Grosse Klinge
 Ring
 inox
@@ -23,5 +26,6 @@
 Bit-Halter
 Stecknadel
 inox
+Schere
 Korkenzieher
 Zahnstocher
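The one-item-per-line files used in both comparisons above can be generated from the comma-separated texts with standard tools. A minimal sketch, assuming the original lists are in files like cybertool29.txt; the function name commas_to_lines is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: turn a comma-separated list into one item per line.
# tr replaces every comma with a newline, sed then strips the whitespace
# that followed each comma.
commas_to_lines() {
    tr ',' '\n' | sed 's/^[[:space:]]*//'
}

# Example with inline data instead of the real cybertool*.txt files:
printf '%s\n' 'Pinzette, Ring, inox, Schere' | commas_to_lines
```

For the real files this would be called as commas_to_lines < cybertool29.txt > cybertool29-lines.txt. Note that it also splits at commas inside parentheses, which matches the diff output shown above.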

So if you want nicely colored diffs with Subversion, like you are used to from git, you can use svn diff | colordiff.

About...

This is the blog or weblog of Axel Stefan Beckert (aka abe or XTaran), who thought he would never start blogging... (He also once thought that there was no reason to switch to this new ugly Netscape thing because Mosaïc worked fine. That was about 1996.) Well, times change...

He was born in 1975 in Villingen-Schwenningen, took his Abitur in Schwäbisch Hall, studied Computer Science with a minor in Biology at the University of Saarland in Saarbrücken (Germany) and now lives in Zürich (Switzerland), working at the Network Security Group (NSG) of the Central IT Services (Informatikdienste) at ETH Zurich.

Links to internal pages are orange, links to related pages are blue, links to external resources are green and links to Wikipedia articles, Internet Movie Database (IMDb) entries or similar resources are bordeaux. Times are CET or CEST respectively (which means GMT +0100 or +0200).


This content is licensed under a Creative Commons License (SA 3.0 DE). Some rights reserved.