bash - How to properly handle a gzipped page when using curl?

Question

Welcome To Ask or Share your Answers For Others

bash - How to properly handle a gzipped page when using curl?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

bash - How to properly handle a gzipped page when using curl?

I wrote a bash script that gets output from a website using curl and does a bunch of string manipulation on the html output. The problem is when I run it against a site that is returning its output gzipped. Going to the site in a browser works fine.

When I run curl by hand, I get gzipped output:

$ curl "http://example.com"

Here's the header from that particular site:

HTTP/1.1 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
X-Powered-By: PHP/5.2.17
Last-Modified: Sat, 03 Dec 2011 00:07:57 GMT
ETag: "6c38e1154f32dbd9ba211db8ad189b27"
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: must-revalidate
Content-Encoding: gzip
Content-Length: 7796
Date: Sat, 03 Dec 2011 00:46:22 GMT
X-Varnish: 1509870407 1509810501
Age: 504
Via: 1.1 varnish
Connection: keep-alive
X-Cache-Svr: p2137050.pubip.peer1.net
X-Cache: HIT
X-Cache-Hits: 425

I know the returned data is gzipped, because this returns html, as expected:

$ curl "http://example.com" | gunzip

I don't want to pipe the output through gunzip, because the script works as-is on other sites, and piping through gzip would break that functionality.

What I've tried

changing the user-agent (I tried the same string my browser sends, "Mozilla/4.0", etc)
man curl
google search
searching stackoverflow

Everything came up empty

Any ideas?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-16T23:30:16+0000

curl will automatically decompress the response if you set the --compressed flag:

curl --compressed "http://example.com"

--compressed (HTTP) Request a compressed response using one of the algorithms libcurl supports, and save the uncompressed document. If this option is used and the server sends an unsupported encoding, curl will report an error.

gzip is most likely supported, but you can check this by running curl -V and looking for libz somewhere in the "Features" line:

$ curl -V
...
Protocols: ...
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz

Note that it's really the website in question that is at fault here. If curl did not pass an Accept-Encoding: gzip request header, the server should not have sent a compressed response.

Categories

bash - How to properly handle a gzipped page when using curl?

bash - How to properly handle a gzipped page when using curl?

What I've tried

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags