Saturday·17·November·2012
deepgrep: grep nested archives with one command //at 02:00 //by abe
Several months ago, I wrote about grep everything and listed grep-like tools which can grep through compressed files or specific data formats. The blog posting sparked several magazine articles and talks by Frank Hofmann and me.
Frank recently noticed that we had nevertheless missed one fairly mighty tool so far. We missed it because it’s mostly unknown, undocumented, and hidden behind a package name which doesn’t suggest a truly recursive “grep everything”:
deepgrep
deepgrep is part of the Debian package strigi-utils, a package which contains utilities related to the KDE desktop search Strigi.
deepgrep especially eases searching through tarballs, even nested ones, but can also search through zip files and OpenOffice.org/LibreOffice documents (which are actually zip files).
deepgrep seems to support at least the following archive and compression formats:
- tar
- ar, and hence deb
- rpm (but not cpio)
- gzip/gz
- bzip2/bz2
- zip, and hence jar/war and OpenOffice.org/LibreOffice documents
- MIME messages (i.e. files attached to e-mails)
A search in an archive which is deeply nested looks like this:
$ deepgrep bar foo.ar
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt:foobar
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt:bar
However, deepgrep seems to support neither any LZMA-based compression (lzma, xz, lzip, 7z) nor lzop, rzip, compress (.Z suffix), cab, cpio, xar, or rar.
Further current drawbacks of deepgrep:
- Almost no command-line options, and in particular none of the common grep options
- No man page or other documentation
- Exit code is unrelated to the search results; you have to check the output to see whether anything has been found
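The last drawback, and the lack of something like grep -l, can be worked around with thin shell wrappers. The following is only a sketch: the function names deepgrep_status and deepgrep_l are made up for this example, and it assumes that deepgrep prints one "path:matching line" entry per match, as in the example above. Paths which contain a colon themselves would confuse the second function.

```shell
# Hypothetical wrappers around deepgrep (names invented for this sketch).

# Print the matches like deepgrep does, but return 0 only if something
# was found, mimicking grep's exit status.
deepgrep_status() {
    matches=$(deepgrep "$@")
    if [ -n "$matches" ]; then
        printf '%s\n' "$matches"
        return 0
    fi
    return 1
}

# Rough equivalent of "grep -l": print only the nested paths with a match.
# Assumes the "path:matching line" output format and no ':' inside paths.
deepgrep_l() {
    deepgrep "$@" | cut -d: -f1 | sort -u
}
```

With these definitions loaded, something like deepgrep_status bar foo.ar >/dev/null && echo found can be used in scripts.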
deepfind
If you just need the names of the files in nested archives, the package also contains the tool deepfind, which does nothing but list all files and directories in a given set of archives or directories:
$ deepfind foo.ar
foo.ar
foo.ar/foo.tar
foo.ar/foo.tar/foo.tar.gz
foo.ar/foo.tar/foo.tar.gz/foo.zip
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz
foo.ar/foo.tar/foo.tar.gz/foo.zip/foo.tar.bz2/foo.txt.gz/foo.txt
As with deepgrep, deepfind does not implement any of the common options of its regular sister tool find.
[The following part has been added on 17-Nov-2012]
Like deepgrep, it doesn’t seem to support any of the more modern or more exotic compression formats either; e.g. it fails on modern Debian binary packages which use xz compression for the data part:
$ deepfind xulrunner-18.0_18.0\~a2+20121109042012-1_amd64.deb
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/debian-binary
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/triggers
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/preinst
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/md5sums
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/postinst
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/control.tar.gz/control
xulrunner-18.0_18.0~a2+20121109042012-1_amd64.deb/data.tar.xz
[End of part added at 17-Nov-2012]
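For such packages, the contents of the data part can still be reached via dpkg-deb, whose --fsys-tarfile option writes the data part to standard output as a plain tar stream. A minimal sketch, assuming dpkg-deb and a tar which understands -O are installed; the function names deb_data_list and deb_data_grep are made up for this example:

```shell
# Hypothetical helpers (names invented for this sketch).

# List the files in the data part of a .deb, whatever its compression.
deb_data_list() {
    dpkg-deb --fsys-tarfile "$1" | tar -tf -
}

# Grep through the contents of all files in the data part.
# Note: the output contains only matching lines, no file names.
deb_data_grep() {
    pattern=$1
    dpkg-deb --fsys-tarfile "$2" | tar -xOf - | grep -e "$pattern"
}
```

This does not descend into archives inside the package, so it is no replacement for deepfind or deepgrep on nested archives.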
Dependencies
The package strigi-utils doesn’t pull in the complete Strigi framework (i.e. no daemon), just a few libraries (libstreams, libstreamanalyzer, and libclucene). On Wheezy it also pulls in some audio/video decoding libraries which may make some server administrators less happy.
Conclusion
Both tools are limited to a few basic use cases, but can be worth a fortune if you have to work with nested archives. Nevertheless, the claim in the Debian package description of strigi-utils that they’re “enhanced” versions of their well-known counterparts is IMHO exaggerated.
Most of the missing features and documentation can be explained by the primary purpose of these tools: being a backend for desktop search. I guess there wasn’t much need for proper command-line usage yet. Until now. ;-)
42.zip
And yes, I was curious enough to let deepfind have a look at 42.zip (the one from SecurityFocus; unzip seems unable to unpack the 42.zip from unforgettable.dk due to a missing version compatibility). Since it just traverses the archive sequentially, it has no problem with that, needing just about 5 MB of RAM and a lot of time:
[…]
42.zip/lib f.zip/book f.zip/chapter f.zip/doc f.zip/page e.zip
42.zip/lib f.zip/book f.zip/chapter f.zip/doc f.zip/page e.zip/0.dll
42.zip/lib f.zip/book f.zip/chapter f.zip/doc f.zip/page f.zip
42.zip/lib f.zip/book f.zip/chapter f.zip/doc f.zip/page f.zip/0.dll
deepfind 42.zip 11644.12s user 303.89s system 97% cpu 3:24:02.46 total
I won’t try deepgrep on 42.zip, though. ;-)
Tagged as: 42.zip, ar, bzip2, CLI, CLucene, deb, deepfind, deepgrep, efho, find, grep, gzip, jar, KDE, LibreOffice, Lucene, odt, OpenOffice.org, Rant, rpm, strigi, tar, UUUT, war, zip
Thursday·15·November·2012
Tools to handle archives conveniently //at 01:42 //by abe
TL;DR: There’s a summary at the end of the article.
Today I wanted to see why a dependency in a .deb package from an external APT repository changed so that it became uninstallable. While dpkg-deb --info foobar.deb easily shows the control information, the changelog is in the filesystem part of the package.
I could extract that one with dpkg-deb, too, but I’d have to either extract it to some temporary directory or pipe it into tar, which can then extract a single file from the archive and send it to STDOUT:
dpkg-deb --fsys-tarfile foobar.deb | tar xOf - ./usr/share/doc/foobar/changelog.Debian.gz | zless
But that’s tedious to type. The following command is clearly less to type and way easier to remember:
acat foobar.deb ./usr/share/doc/foobar/changelog.Debian.gz | zless
acat stands for “archive cat” and is part of the atool suite of commands:
- als
- lists files in an archive.
$ als foobar.tgz
drwxr-xr-x abe/abe 0 2012-11-15 00:19 foobar/
-rw-r--r-- abe/abe 13 2012-11-15 00:20 foobar/bar
-rw-r--r-- abe/abe 13 2012-11-15 00:20 foobar/foo
- acat
- extracts files in an archive to standard out.
$ acat foobar.tgz foobar/foo foobar/bar
foobar/bar
bar contents
foobar/foo
foo contents
- adiff
- generates a diff between two archives using diff(1).
$ als quux.zip
Archive:  quux.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2012-11-15 00:23   quux/
       16  2012-11-15 00:22   quux/foo
       13  2012-11-15 00:20   quux/bar
---------                     -------
       29                     3 files
$ adiff foobar.tgz quux.zip
diff -ru Unpack-3594/foobar/foo Unpack-7862/quux/foo
--- Unpack-3594/foobar/foo 2012-11-15 00:20:46.000000000 +0100
+++ Unpack-7862/quux/foo 2012-11-15 00:22:56.000000000 +0100
@@ -1 +1 @@
-foo contents
+foobar contents
- arepack
- repacks archives to a different format. It does this by first extracting all files of the old archive into a temporary directory, then packing all files extracted to that directory to the new archive. Use the --each (-e) option in combination with --format (-F) to repack multiple archives using a single invocation of atool. Note that arepack will not remove the old archive.
$ arepack foobar.tgz foobar.txz
foobar.tgz: extracted to `Unpack-7121/foobar'
foobar.txz: grew 36 bytes
- apack
- creates archives (or compresses files). If no file arguments are specified, filenames to add are read from standard in.
- aunpack
- extracts files from an archive. Often one wants to extract all files in an archive to a single subdirectory. However, some archives contain multiple files in their root directories. The aunpack program overcomes this problem by first extracting files to a unique (temporary) directory, and then moving its contents back if possible. This also prevents local files from being overwritten by mistake.
(atool subcommand descriptions from the atool man page which is licensed under GPLv3+. Examples by me.)
What I miss, though, is an agrep subcommand. Guess why?
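Until something like that exists, it can be approximated with aunpack and grep -r. This is only a sketch and not part of atool: the function name agrep_sketch is invented for this example, it relies on aunpack's -X (--extract-to) option to choose the target directory, and it does not descend into archives inside the archive.

```shell
# Hypothetical "agrep" stand-in (not part of atool): unpack the archive
# into a temporary directory and grep through the result.
agrep_sketch() {
    pattern=$1
    archive=$2
    tmpdir=$(mktemp -d) || return 2
    # -X DIR (--extract-to) tells aunpack where to put the extracted files
    aunpack -X "$tmpdir" "$archive" >/dev/null 2>&1
    # Run grep inside the temporary directory so the reported paths
    # are relative to the archive root
    (cd "$tmpdir" && grep -r -e "$pattern" .)
    status=$?
    rm -rf "$tmpdir"
    return $status
}
```

Called as agrep_sketch 'foo contents' foobar.tgz, it prints grep-style "path:line" matches and passes on grep's exit status.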
atool supports a wealth of archive types: tar (gzip-, bzip-, bzip2-, compress-/Z-, lzip-, lzop-, xz-, and 7zip-compressed), zip, jar/war, rar, lha/lzh, 7zip, alzip/alz, ace, ar, arj, arc, rpm, deb, cab, gzip, bzip, bzip2, compress/Z, lzip, lzop, xz, rzip, lrzip and cpio. (Not all subcommands support all archive types.)
Similar Utilities
There are some utilities which cover parts of what atool does, too:
Tools from the mtools package
Yes, they come from the “handle MS-DOS floppy disks tool” package, don’t ask me why. :-)
- uz
- gunzips and extracts a gzip’d tar’d archive
  - Advantage over aunpack: Less to type. :-)
  - Disadvantage compared to aunpack: Supports only one archive format.
- lz
- gunzips and shows a listing of a gzip’d tar’d archive
  - Advantage over als: One character less to type. :-)
  - Disadvantage compared to als: Supports only one archive format.
unp
unp extracts one or more files given as arguments on the command line.
$ unp -s
Known archive formats and tools:
7z: p7zip or p7zip-full
ace: unace
ar,deb: binutils
arj: arj
bz2: bzip2
cab: cabextract
chm: libchm-bin or archmage
cpio,afio: cpio or afio
dat: tnef
dms: xdms
exe: maybe orange or unzip or unrar or unarj or lha
gz: gzip
hqx: macutils
lha,lzh: lha
lz: lzip
lzma: xz-utils or lzma
lzo: lzop
lzx: unlzx
mbox: formail and mpack
pmd: ppmd
rar: rar or unrar or unrar-free
rpm: rpm2cpio and cpio
sea,sea.bin: macutils
shar: sharutils
tar: tar
tar.bz2,tbz2: tar with bzip2
tar.lzip: tar with lzip
tar.lzop,tzo: tar with lzop
tar.xz,txz: tar with xz-utils
tar.z: tar with compress
tgz,tar.gz: tar with gzip
uu: sharutils
xz: xz-utils
zip,cbz,cbr,jar,war,ear,xpi,adf: unzip
zoo: zoo
So it’s very similar to aunpack, just a shorter command, and it supports some more exotic archive formats which atool doesn’t support.
Also part of the unp package is ucat, which does more or less the same as acat, just with unp as backend.
dtrx
From the man page of dtrx:
In addition to providing one command to extract many different archive types, dtrx also aids the user by extracting contents consistently. By default, everything will be written to a dedicated directory that’s named after the archive. dtrx will also change the permissions to ensure that the owner can read and write all those files.
Supported archive formats: tar, zip (including self-extracting .exe files), cpio, rpm, deb, gem, 7z, cab, rar, and InstallShield. It can also decompress files compressed with gzip, bzip2, lzma, or compress.
dtrx -l lists the contents of an archive, i.e. works like als or lz.
dtrx has two features not present in the other tools mentioned so far:
- It can extract metadata instead of the normal contents from .deb and .gem files.
- It can extract archives recursively, i.e. can extract archives inside of archives.
Unfortunately you can’t mix those two features. But you can use the following tool for that purpose:
deepfind
deepfind is a command from the package strigi-utils and recursively lists files in archives, including archives in archives. I’ve already written a detailed blog-posting about deepfind and its friend deepgrep.
tardiff
tardiff was written to check what changed in source code tarballs from one release to another. By default it just lists the differences in the file lists, not in the files’ contents, and hence works differently from adiff.
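To illustrate the difference to adiff: comparing only the file lists of two tarballs boils down to something like the following. This is a sketch of the idea, not tardiff's actual implementation; the function name tar_list_diff is made up for this example, and it compares paths literally, so a versioned top-level directory (foo-1.0/ vs. foo-1.1/) would make every entry differ.

```shell
# Hypothetical helper (name invented for this sketch): show which paths
# were added or removed between two tarballs, ignoring file contents.
tar_list_diff() {
    old_list=$(mktemp) || return 2
    new_list=$(mktemp) || return 2
    tar -tf "$1" | sort > "$old_list"
    tar -tf "$2" | sort > "$new_list"
    # diff exits with 0 if both lists are identical and with 1 if they differ
    diff "$old_list" "$new_list"
    status=$?
    rm -f "$old_list" "$new_list"
    return $status
}
```

Lines starting with "<" are paths only present in the first tarball, lines starting with ">" are paths only present in the second one.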
Summary
atool and friends are probably the first choice when it comes to DWIM archive handling, also because they have an easy-to-remember subcommand scheme.
uz and lz are the shortest way to extract or list the contents of a .tar.gz file. But nothing more. And you have to install mtools even if you don’t have a floppy drive.
unp comes in handy for exotic archive formats atool doesn’t support. And it’s way easier to remember and type than aunpack.
dtrx is neat if you want to extract archives in archives or if you want to extract metadata from some package files with just a few keystrokes.
For listing all files in recursive archives, use deepfind.
Tagged as: 7zip, acat, adiff, als, apack, archives, atool, aunpack, bzip, bzip2, deb, deepfind, dtrx, floppy, gem, grep, gzip, lha, lrzip, lz, lzip, lzop, MS-DOS, mtools, rar, rpm, rzip, strigi-utils, tar, tardiff, ucat, unp, UUUT, uz, xz, zip
Thursday·27·October·2011
Daily Snapshot .debs of Conkeror //at 22:57 //by abe
Keeping up with packaging software which is under heavy development can be time-consuming. I noticed this while packaging Conkeror, because there was quite a demand for up-to-date packages, especially from upstream themselves.
So recently on the IRC channel #conkeror the idea of automatically built Debian packages came up. After a few hours of experimenting and a few days of steadily optimizing, I can proudly present daily built snapshot packages of Conkeror, currently for Lenny and Sid, ready to be included in your sources.list:
deb http://noone.org/conkeror-nightly-debs lenny main
deb-src http://noone.org/conkeror-nightly-debs lenny main
deb http://noone.org/conkeror-nightly-debs sid main
deb-src http://noone.org/conkeror-nightly-debs sid main
The binary package conkeror-spawn-process-helper is currently only built for the i386 architecture, but other architectures may follow.
The packages probably also work on any other Debian-based distribution (e.g. Ubuntu) which includes XULRunner version 1.9.
They are certainly not of the usual Debian quality, but they should suffice for staying up to date with Conkeror development just by using your favourite APT frontend.
The script which generates those packages is also available in the Conkeror git repository at repo.or.cz.
The APTable archive is generated with reprepro. Packages and the repository are signed with the passphrase-less GnuPG key 373B76B4 which is used only for the Conkeror nightly builds. (If anyone knows a better solution for automatic builds than a passphrase-less key, please tell me. :-)
P.S.: I really like the new keybindings “<<”, “>>” and “G”. :-)
Tagged as: APT, Browser, build, Conkeror, daily, deb, Debian, git, GnuPG, gpg, i386, IRC, keybindings, Lenny, nightly, packaging, pgp, repo.or.cz, repository, reprepro, Sid, signing, snapshot, Ubuntu, XULRunner
Sunday·08·March·2009
Sedating irssi’s nick highlight for microblogging messages //at 17:13 //by abe
My favourite IRC client is irssi. I like it so much that I even use it for all my instant messaging needs. The gateway of choice between irssi and mostly Jabber is Bitlbee.
I also microblog on identi.ca, a free (free as in AGPL) microblogging service based on laconi.ca. In comparison to the non-free, proprietary Twitter, identi.ca has all the features which Twitter has already turned off again.
For me the most important feature of Twitter was tweeting via XMPP (aka Jabber). Since Twitter turned off that feature, Twitter quickly became less and less important to me. Identi.ca still has this feature and cultivates it further. So I usually don’t visit the identi.ca website that often anymore but get the microblogging stream of my friends via XMPP and Bitlbee directly into my irssi.
Although this is very convenient, it has one big disadvantage: In comparison to an IRC channel, not only notices directed to me personally but every incoming notice beeps, because Bitlbee sends them either as /MSG or prepends my nick name. For normal IRC communication /MSG should beep, and you can’t make exceptions for that so easily in irssi.
I asked on #bitlbee (OFTC) and on #irssi (IRCNet). On #irssi funnily the first answer was “I tried that yesterday, no success” from Shrike. — So I’m not alone, although Shrike uses Jaiku and not identi.ca. Then I had the idea to get Bitlbee to not prepend my nick name for all those identi.ca notices which go into the &bitlbee channel — but I didn’t find a way to configure this in Bitlbee. But Shrike found a way to do this with already existing irssi plugins:
The trigger.pl plugin (available e.g. in Debian’s irssi-scripts package or on scripts.irssi.org) can add triggers which replace parts of the message. So the following three lines helped me to reduce the noise microblogging causes in my irssi:
/script load trigger
/trigger add -publics -masks 'identica!update@identi.ca' -channels '&bitlbee' -regexp "^XTaran: " -replace ''
/trigger save
And on the command line I just needed a symlink to automatically start the trigger plugin on irssi startup:
ln -vis /usr/share/irssi/scripts/trigger.pl ~/.irssi/scripts/autorun/
So now again only the important messages beep. :-)
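What the -regexp/-replace pair of the trigger does to an incoming message can also be tried outside of irssi. The sketch below applies the same substitution with sed; the function name strip_nick_prefix and the sample message are made up for illustration, and a nick containing regular expression metacharacters would need escaping first.

```shell
# Hypothetical illustration (not part of irssi or trigger.pl): strip a
# leading "NICK: " prefix from each line read from standard input, like
# -regexp "^XTaran: " -replace '' does for messages inside irssi.
strip_nick_prefix() {
    sed "s/^$1: //"
}
```

For example, echo 'XTaran: some notice' | strip_nick_prefix XTaran prints the message without the prefix, while lines which merely contain the nick somewhere else are left untouched.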
Tagged as: #bitlbee, #irssi, AGPL, Bitlbee, deb, highlight, identi.ca, IM, IRC, IRCNet, irssi, Jabber, jaiku, laconi.ca, microblogging, OFTC, Perl, plugin, Shrike, trigger, Twitter, XMPP
Thursday·22·January·2009
Tablet Amora runs on the OpenMoko FreeRunner (updated) //at 00:13 //by abe
Amora (“A MObile Remote Assistant”) is a client/server suite which allows you to remote control an X desktop using a bluetooth enabled mobile phone. Initially there was only a Symbian client (running e.g. on nearly all Nokia E and N series phones), but J2ME clients are under development, too.
Then there is Tablet Amora (aka Tamora), an Amora “proof of concept” client for the Maemo platform which runs on internet tablets such as the Linux-based Nokia N770, N800, and N810. Since Maemo isn’t that far away from what runs on the OpenMoko, getting Tamora to work on the OpenMoko, too, was the obvious next step.
Maemo seems to use the deb package format, too, just slightly extended (e.g. by package icons), so it wasn’t even that hard to adapt the existing Maemo packaging to build, install and run on Debian, too.
So that’s how Tamora looks on the OpenMoko:
The packaging is still far away from Debian standards (it throws tons of lintian warnings and the source package generation is b0rked), so there are no prebuilt debs available yet, but you can check out amora-client from the Subversion repository and build the package from there:
$ svn checkout http://amora.googlecode.com/svn/trunk/amora-client/maemo/ amora-client
$ cd amora-client
$ debuild -uc -us
$ cd ..
# dpkg -i amora-client_0.1-2maemo+openmoko_all.deb
For installing and running Tamora you need packages from the pkg-fso APT repository on alioth. And to build it, you need the libedje-bin package, which is available from the pkg-fso repository for at least the armel architecture, or else from Debian experimental. You can add these repositories to your sources.list as follows:
# PKG FSO repository
deb http://pkg-fso.alioth.debian.org/debian unstable main
deb-src http://pkg-fso.alioth.debian.org/debian unstable main
# Debian Experimental
deb http://ftp.ch.debian.org/debian experimental main
deb-src http://ftp.ch.debian.org/debian experimental main
Since Tamora is so far only a “proof of concept” client, currently only the following remote functions are available:
- pressing arrow key right/left
- pressing F5 (fullscreen for the OpenOffice.org Presenter)
This should at least suffice for a presentation with the OpenOffice.org Presenter, though.
To use Tamora to remote control your Debian laptop, you need a bluetooth dongle (or builtin bluetooth support) and amora-server installed, just as with the Symbian S60 (3rd Edition) Amora client.
Update, 23:51
libedje-bin seems not to be available in the pkg-fso repository for every architecture. You’ll also find it in Debian experimental. Updated the sources.list section above accordingly. Thanks to Sebastian Montini for pointing out this problem.
Tagged as: Amora, bluetooth, deb, Debian, experimental, FreeRunner, FSO, internet tablet, Linux, Maemo, N770, N800, N810, Nokia, OpenMoko, OpenOffice.org, packaging, PoC, Python, S60, Sid, Symbian, Tamora
Thursday·28·September·2006
wApua 0.06 released //at 03:33 //by abe
Today I released version 0.06 of my WAP browser wApua (Release announcement at Freshmeat).
The one big new thing is user-friendly documentation: wApua and wbmp2xbm (which has been renamed from wbmp2xbm.pl) now have POD documentation and therefore also man pages. Besides that, a lot of minor bugfixes and enhancements complete the new version.
The other big new thing is that there now is a Debian package of wApua. The package should work fine on Debian Woody (3.0), Sarge (3.1) and Etch (upcoming 4.0) and probably also works on other Debian-based distributions like Ubuntu.
Thanks to sponsoring by Christoph “Myon” Berg, the Debian package is also in the Debian New Queue and hopefully will be included in Debian Etch.
Tagged as: Bugfix, deb, Documentation, Etch, Freshmeat, Hacks, Open Source, Perl, POD, Sarge, Tk, Ubuntu, WAP, wApua, WML, Woody