Wednesday·02·October·2013
How to make wget honour Content-Disposition headers //at 16:12 //by abe
Download links often point to CGI scripts which generate (or just fetch, i.e. proxy) the actual file to be downloaded, e.g. URLs like http://www.example.com/download.cgi?file=foobar.txt.
Most such CGI scripts send the real file name in the Content-Disposition header, as specified in the MIME specification.
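Such a header typically looks like Content-Disposition: attachment; filename="foobar.txt". Just to illustrate where it comes from, a minimal download.cgi might look roughly like this (the script and the file path are made up for this sketch):

#!/bin/sh
# Hypothetical minimal download.cgi: send the HTTP headers, then the file.
# /srv/files/foobar.txt is a made-up path, just for illustration.
printf 'Content-Type: text/plain\r\n'
printf 'Content-Disposition: attachment; filename="foobar.txt"\r\n'
printf '\r\n'
cat /srv/files/foobar.txt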
All browsers I know (well, at least those I use regularly :-) handle that perfectly and propose the file name sent in the Content-Disposition header as the file name for saving the downloaded file, which is usually exactly what I want.
All browsers do that … just not my favourite commandline download tool, GNU Wget. Downloading the above URL with wget and default settings looks like this:
$ wget 'http://www.example.com/download.cgi?file=foobar.txt'
--2013-10-02 16:04:16--  http://www.example.com/download.cgi?file=foobar.txt
Resolving www.example.com (www.example.com)... 93.184.216.119, 2606:2800:220:6d:26bf:1447:1097:aa7
Connecting to www.example.com (www.example.com)|2606:2800:220:6d:26bf:1447:1097:aa7|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2020 (2.0K) [text/plain]
Saving to: `download.cgi?file=foobar.txt'

100%[============================================>] 2,020       --.-K/s   in 0s

2013-10-02 16:04:24 (12.5 MB/s) - `download.cgi?file=foobar.txt' saved [2020/2020]
Meh!
But luckily Wget can do that, it’s just not enabled by default — because it’s an experimental and possibly buggy feature, at least according to the man page. Well, works for me! :-)
You can easily enable it by default, either for your user or for the whole system, by placing the following line in your ~/.wgetrc or in /etc/wgetrc:
content-disposition = on
Provided the CGI script sends an appropriate Content-Disposition header, the above output now looks like this:
$ wget 'http://www.example.com/download.cgi?file=foobar.txt'
--2013-10-02 16:04:16--  http://www.example.com/download.cgi?file=foobar.txt
Resolving www.example.com (www.example.com)... 93.184.216.119, 2606:2800:220:6d:26bf:1447:1097:aa7
Connecting to www.example.com (www.example.com)|2606:2800:220:6d:26bf:1447:1097:aa7|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2020 (2.0K) [text/plain]
Saving to: `foobar.txt'

100%[============================================>] 2,020       --.-K/s   in 0s

2013-10-02 16:04:24 (12.5 MB/s) - `foobar.txt' saved [2020/2020]
Now Wget does what I mean!
You can also set this as a flag on the commandline, but typing wget --content-disposition … every time is surely not what I want. ;-)
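By the way, if you only want this for your interactive shell rather than as a global default, a simple alias in your shell startup file (e.g. ~/.bashrc) would do, too; just a sketch:

# Let interactive wget calls honour Content-Disposition by default
alias wget='wget --content-disposition'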
Tagged as: CGI, CLI, Content-Disposition, download, howto, HTTP, Shell, UUUCO, wget
Related stories
Sunday·25·May·2008
Google Open Source Jam and Webtuesday Hackday //at 22:45 //by abe
I was at two geek events in Zurich this week: at the Google Open Source Jam Zurich on Thursday evening and at the first Webtuesday Hackday on Saturday.
Somehow I expected both events to be quite similar, but they weren’t.
Google Open Source Jam
When I read “Jam” or “Jam Session” I think of jazz musicians spontaneously playing together. So to me “Open Source Jam” sounded like a hack session where some spontaneous coding is done. But there was no spontaneous collaboration at the Open Source Jam at all, just (more or less spontaneous) talks about different topics and chatting. So I was quite disappointed by that event.
There were quite a lot of people I knew, though, e.g. from Webtuesday, Chaostreff or Debian. I even met some people I had only known from IRC until then.
Half of the talks were pure propaganda talks though, e.g. for the Webtuesday Hackday, OpenExpo, and soaring as a geek sport. Not really out of place, but not what I expected from talks at an Open Source Jam.
The few rooms and floors I saw reminded me very much of an IKEA children’s paradise, just even more motley. Still, it all felt sterile and was by far not as cool as I had expected after what I had read elsewhere about Google offices.
I also think that several of the Google employees showed some contrived friendliness. And questions I asked, e.g. why I have to give them my e-mail address and employer’s name (what do unemployed or self-employed people do?), got answers I do not really believe, like “for security”. A leopard doesn’t change its spots. A data squid probably doesn’t either, not even at events labeled with OSS and said to be for the community.
I suspect that finding new employees is one of the reasons behind such events at Google. But even after my first visit to one of their locations, this company still makes me feel uncomfortable. And I’m even more sure than before that I wouldn’t want to work there.
Not sure if I’ll attend the Google Open Source Jam a second time.
Webtuesday Hackday
The Webtuesday Hackday was also not what I expected, but it came closer to my expectations: the Webtuesday crowd gathers for hacking instead of listening to long talks. :-)
There were surprisingly many people from outside Zurich: from Munich and Belgium, from Lake Constance and Lausanne, and not only the usual suspects (who were there anyway ;-).
The event took place at Liip’s new offices. They still look a little bit empty and sterile, but all the toys (mini rugby balls, a Wii, plush figures on floor lamps) and the people around made them feel very alive. And they have very cool lamps in the shape of their company logo in the office. They surely have a good interior designer. :-)
Although most participants found time to do some hacking, many found less time than they had expected. So we hope that next time we can group the talks a little closer together timing-wise, to cause fewer interruptions of the hacking.
The food was better at the Hackday, too, but mostly because we ate outside. ;-) For lunch we went to Lily’s Stomach Supply at Langstrasse (highly recommendable!) and in the evening to Pizzeria Grottino 79 near Helvetiaplatz, where I had a Pizza Vesuvio with Gruyère cheese.
The Hackday also had a surprise for me: when I entered the event’s IRC channel, there was someone in it I didn’t expect there: tklauser aka Tobias Klauser aka tuxedo. Even more surprisingly, he had read about my project idea for the Hackday, a semantic feed cache proxy, and liked it, so he decided to come over to Zurich and join the project.
We didn’t get that far before Tobias had to leave again, but the programming language and partially also the libraries had been nailed down: Ruby and its WEBrick framework. After the Hackday I worked on it for a few more hours and it now already saves feeds to a cache. The Mercurial repository is at http://noone.org/hg/sfc-proxy.
There were several reasons which spoke for using Ruby instead of Perl (my favourite programming language and the one I’m most experienced in): Ruby already ships HTTP and RSS support in its standard classes, and Tobias is more experienced in Ruby than in Perl. I had started to learn Ruby a few years ago, to look beyond my own nose and to get my hands dirty on some nice object-oriented programming language, but I hadn’t found an appropriate project until now, so this was one more reason not to do it in Perl.
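To illustrate what “HTTP and RSS support in the standard classes” means in practice, here is a minimal sketch (not taken from the actual sfc-proxy code; the feed URL and port are made up) that fetches a feed and serves its item titles via WEBrick:

require 'webrick'
require 'open-uri'
require 'rss'

# Fetch a feed with open-uri and parse it with Ruby's bundled RSS parser.
# The URL is a placeholder, just for illustration.
feed = RSS::Parser.parse(open('http://www.example.com/feed.rss').read, false)

# Serve the parsed item titles through a tiny WEBrick HTTP server.
server = WEBrick::HTTPServer.new(:Port => 8000)
server.mount_proc('/') do |request, response|
  response['Content-Type'] = 'text/plain'
  response.body = feed.items.map { |item| item.title }.join("\n")
end
trap('INT') { server.shutdown }
server.start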
I also worked on my Debian package of Conkeror during the Hackday. It’s already usable, and I now use Conkeror as the primary web browser on my EeePC, but e.g. the man page is still missing. As soon as I have the necessary minimum of documentation ready, I’ll have it uploaded to Debian Experimental (since its dependency XULRunner 1.9 is also only in Debian Experimental yet). The Mercurial repository for the Debian packaging of Conkeror is at http://noone.org/hg/conkeror/debian
Those who were still at the Hackday in the evening decided that the Webtuesday Hackday should become a regular institution and should take place approximately every two months, but stay a one-day event (for now). I’m already looking forward to the next Webtuesday Hackday.
Tagged as: Atom, Conkeror, data squid, Debian Experimental, Die Welt ist klein, Events, Freenode, Google, Hackday, Hacks, hg, HTTP, IRC, liip, Mercurial, NDA, Open Source, Open Source Jam, Other Blogs, Perl, Planet Webtuesday, proxy, RDF, RSS, Ruby, SFC, tuxedo, WEBrick, Webtuesday, XULRunner, Zürich
Related stories
Thursday·26·April·2007
FTP and port 80? //at 14:25 //by abe
Hmmm, I never thought that a URL could look somewhat schizophrenic or paradoxical, but this one truly does: ftp://ftp.port80.se/.
(Found in ftp://ftp.*.debian.org/debian/README.non-US.)
Tagged as: FTP, HTTP, Made my day
Monday·18·September·2006
Fixing server bugs on client side //at 15:35 //by abe
On my new job at ETH Zurich I stumbled over a lot of HTTP requests in the web server log file, obviously trying to fetch the automatic proxy configuration file (usually called proxy.pac) but requesting it with the last character missing and therefore requesting the nonexistent file proxy.pa:
195.176.XX.AB - - [16/May/2006:11:12:56 +0200] "GET /proxy.pa HTTP/1.1" 404 5261 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"
195.176.YY.CD - - [16/May/2006:11:16:32 +0200] "GET /proxy.pa HTTP/1.0" 404 5235 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"
195.176.ZZ.EF - - [16/May/2006:11:18:38 +0200] "GET /proxy.pa HTTP/1.0" 404 5235 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"
195.176.YY.CD - - [16/May/2006:11:24:16 +0200] "GET /proxy.pa HTTP/1.0" 404 5235 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"
195.176.ZZ.GHI - - [16/May/2006:11:31:44 +0200] "GET /proxy.pa HTTP/1.0" 404 5235 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"
195.176.XX.J - - [16/May/2006:11:33:35 +0200] "GET /proxy.pa HTTP/1.1" 404 5261 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"
195.176.ZZ.LMN - - [16/May/2006:11:35:18 +0200] "GET /proxy.pa HTTP/1.1" 404 5261 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Win32)"
WTF happened here? When I found a bunch of those requests from a single host last night, I expected a local cut-and-waste typo on a single box. But during the day I got the same sort of defective requests from over 30 hosts in our network. So we looked at our dhcpd.conf, but every occurrence of “proxy.pac” had its “c” in the right place.
WTF is happening here? After googling for a moment I found this mail on the squid users mailing list, stating the following:
WPAD worked reasonably well for WindowsNT and Windows2000; however, there was a problem with the file name in Windows2000 and the initial release of WindowsXP. The Microsoft DHCP Service returned the wrong byte count for the string returned for option 252. The DHCP Client compensated for this by decrementing the string length. This resulted in the file name being truncated when the ISC DHCP daemon was used. The solution was to define a symlink proxy.pa -> proxy.pac.
So in other words: Microsoft worked around an off-by-one bug in their own DHCP server by patching their DHCP client to expect the length field to always be off by one, i.e. to parse faulty configurations, and obviously only faulty configurations. *hrrrrng*
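For context: the WPAD URL is handed out as DHCP option 252, which the ISC DHCP daemon does not predefine, so dhcpd.conf has to declare it by hand, roughly like this (the option name and URL are made up for this sketch):

# Declare DHCP option 252 (WPAD), which ISC dhcpd does not know by default
option wpad-url code 252 = text;

# Hand out the URL of the proxy auto-configuration file
option wpad-url "http://proxy.example.com/proxy.pac";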
Our solution, BTW, was to insert an appropriate Alias directive into the configuration of the Apache web server hosting the file.
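For illustration, such a directive might look roughly like this (the filesystem path is made up):

# Answer the truncated /proxy.pa requests with the real proxy.pac file
Alias /proxy.pa /var/www/proxy.pac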
Tagged as: Admin, Apache, Bugs, DHCP, ETH Zürich, HTTP, ISC, Microsoft, MSIE, Proxy, proxy.pac, Rant, WTF