Jump to menu and information about this site.


Automatically hardlinking duplicate files under /usr/share/doc with APT //at 20:43 //by abe

from the no-space-left-on-device dept.

On my everyday netbook (a very reliable first generation ASUS EeePC 701 4G) the disk (4 GB as the product name suggests :-) is nearly always close to full.

TL;DWTR? Jump directly to the HowTo. :-)

So I came up with a few techniques to save some more disk space. Installing localepurge was one of the earliest. Another one was to implement aptitude filters to do interactively what deborphan does non-interactively. Yet another one is to use du and friends a lot – ncdu is definitely my favourite du-like tool in the meantime.

Using du and friends I often noticed how much disk space /usr/share/doc takes up. But since I value the contents of /usr/share/doc a lot, I condemn how Nokia solved that on the N900: They let APT delete all files and directories under /usr/share/doc (including the copyright files!) via some package named docpurge. I also dislike Ubuntu’s “solution” to truncate the shipped changelog files (you can still get the remainder of the files on the web somewhere) as they’re an important source of information for me.

So when aptitude showed me that some package suddenly wanted to use up quite some more disk space, I noticed that the new package version included the upstream changelog twice. So I started searching for duplicate files under /usr/share/doc.

There are quite some tools to find duplicate files in Debian. hardlink seemed most appropriate for this case.

First I just looked for duplicate files per package, which even on that less than four gigabytes installation on my EeePC found nine packages which shipped at least one file twice.

As recommended I rather opted for an according Lintian check (see bugs. Niels Thykier kindly implemented such a check in Lintian and its findings are as reported as tags “duplicate-changelog-files” (Severity: normal, from Lintian 2.5.2 on) and “duplicate-files” (Severity: minor, experimental, from Lintian 2.5.0 on).

Nevertheless, some source packages generate several binary packages and all of them (of course) ship the same, in some cases quite large (Debian) changelog file. So I found myself running hardlink /usr/share/doc now and then to gain some more free disk space. But as I run Sid and package upgrades happen more than daily, I came to the conclusion that I should run this command more or less after each aptitude run, i.e. automatically.

Having taken localepurge’s APT hook as example, I added the following content as /etc/apt/apt.conf.d/98-hardlink-doc to my system:

// Hardlink identical docs, changelogs, copyrights, examples, etc

Post-Invoke {"if [ -x /usr/bin/hardlink ]; then /usr/bin/hardlink -t /usr/share/doc; else exit 0; fi";};

So now installing a package which contains duplicate files looks like this:

~ # aptitude install perl-tk
The following NEW packages will be installed:
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 2,522 kB of archives. After unpacking 6,783 kB will be used.
Get: 1 http://ftp.ch.debian.org/debian/ sid/main perl-tk i386 1:804.029-1.2 [2,522 kB]
Fetched 2,522 kB in 1s (1,287 kB/s)  
Selecting previously unselected package perl-tk.
(Reading database ... 121849 files and directories currently installed.)
Unpacking perl-tk (from .../perl-tk_1%3a804.029-1.2_i386.deb) ...
Processing triggers for man-db ...
Setting up perl-tk (1:804.029-1.2) ...
Mode:     real
Files:    15423
Linked:   3 files
Compared: 14724 files
Saved:    7.29 KiB
Duration: 4.03 seconds
localepurge: Disk space freed in /usr/share/locale: 0 KiB
localepurge: Disk space freed in /usr/share/man: 0 KiB
localepurge: Disk space freed in /usr/share/gnome/help: 0 KiB
localepurge: Disk space freed in /usr/share/omf: 0 KiB

Total disk space freed by localepurge: 0 KiB

Sure, that wasn’t the most space saving example, but on some installations I saved around 100 MB of disk space that way – and I still haven’t found a case where this caused unwanted damage. (Use of this advice on your own risk, though. Pointers to potential problems welcome. :-)

Tag Cloud

2CV, aha, Apache, APT, aptitude, ASUS, Automobiles, autossh, Berlin, bijou, Blogging, Blosxom, Blosxom Plugin, Browser, BSD, CDU, Chemnitz, Citroën, CLI, CLT, Conkeror, CSS, CX, deb, Debian, Doofe Parteien, E-Mail, eBay, EeePC, Emacs, Epiphany, Etch, ETH Zürich, Events, Experimental, Firefox, Fläsch, FreeBSD, Freitagstexter, FVWM, Galeon, Gecko, git, GitHub, GNOME, GNU, GNU Coreutils, GNU Screen, Google, GPL, grep, grml, gzip, Hackerfunk, Hacks, Hardware, Heise, HTML, identi.ca, IRC, irssi, Jabber, JavaShit, Kazehakase, Lenny, Liferea, Linux, LinuxTag, LUGS, Lynx, maol, Meme, Microsoft, Mozilla, Music, mutt, Myon, München, nemo, Nokia, nuggets, Open Source, OpenSSH, Opera, packaging, Pentium I, Perl, Planet Debian, Planet Symlink, Quiz, Rant, ratpoison, Religion, RIP, Sarcasm, Sarge, Schweiz, screen, Shell, Sid, Spam, Squeeze, SSH, Stoeckchen, Stöckchen, SuSE, Symlink, Symlink-Artikel, Tagging, Talk, taz, Text Mode, ThinkPad, Ubuntu, USA, USB, UUUCO, UUUT, VCFe, Ventilator, Vintage, Wahlen, WAP, Wheezy, Wikipedia, Windows, WML, Woody, WTF, X, Xen, zsh, Zürich, ÖPNV


Mo Tu We Th Fr Sa Su

Tattletale Statistics

Blog postings by posting time
Blog posting times this month


Advanced Search


Recent Postings

13 most recent of 289 postings total shown.

Recent Comments

Hackergotchi of Axel Beckert


This is the blog or weblog of Axel Stefan Beckert (aka abe or XTaran) who thought, he would never start blogging... (He also once thought, that there is no reason to switch to this new ugly Netscape thing because Mosaïc works fine. That was about 1996.) Well, times change...

He was born 1975 at Villingen-Schwenningen, made his Abitur at Schwäbisch Hall, studied Computer Science with minor Biology at University of Saarland at Saarbrücken (Germany) and now lives in Zürich (Switzerland), working at the Network Security Group (NSG) of the Central IT Services (Informatikdienste) at ETH Zurich.

Links to internal pages are orange, links to related pages are blue, links to external resources are green and links to Wikipedia articles, Internet Movie Database (IMDb) entries or similar resources are bordeaux. Times are CET respective CEST (which means GMT +0100 respective +0200).

RSS Feeds

Identity Archipelago

Picture Gallery

Button Futility

Valid XHTML Valid CSS
Valid RSS Any Browser
This content is licensed under a Creative Commons License (SA 3.0 DE). Some rights reserved. Hacker Emblem
Get Mozilla Firefox! Powered by Linux!
Typed with GNU Emacs Listed at Tux Mobil
XFN Friendly Button Maker


People I know personally

Other blogs I like or read

Independent News

Interesting Planets

Web comics I like and read

Stalled Web comics I liked

Blogging Software

Blosxom Plugins I use

Bedside Reading

Just read

  • Bastian Sick: Der Dativ ist dem Genitiv sein Tod (Teile 1-3)
  • Neil Gaiman and Terry Pratchett: Good Omens (borrowed from Ermel)

Currently Reading

  • Douglas R. Hofstadter: Gödel, Escher, Bach
  • Neil Gaiman: Keine Panik (borrowed from Ermel)

Yet to read

  • Neil Stephenson: Cryptonomicon (borrowed from Ermel)

Always a good snack

  • Wolfgang Stoffels: Lokomotivbau und Dampftechnik (borrowed from Ermel)
  • Beverly Cole: Trains — The Early Years (getty images)