How to Use Curl to Check if a Remote Resource is Available

Do you want to check if a file is available on a remote server before attempting to download it, in a short and effective way? If so, but you're not sure how, then you'll learn how in this post.

Recently, I was writing a shell script as part of a new Docker container setup. While I was pretty happy with the script, it wasn't as defensive as I'd have liked it to be. This was because it didn't check if a remote tarball was available before attempting to download and use it.

Sure, it's a personal project that only I use, so no one else would ever get burned if the resource wasn't available. However, you never know, one day, it may be publicly available, and I want it to be as good as I can make it.

So, how should it check if the resource is available before attempting to download it? The first idea that came to mind was to make a HEAD request with Curl, by passing the -I or --head flag, and see if the status code returned was an HTTP 200.

Here are the contents of the man page for those options:

-I, --head (HTTP FTP FILE) Fetch the headers only! HTTP-servers feature the command HEAD which this uses to get nothing but the header of a document. When used on an FTP or FILE file, curl displays the file size and last modification time only.

And here's an example of the output that you might expect to see by making a HEAD request:

HTTP/1.1 200 OK
X-Powered-By: React/alpha
Content-Type: image/png
Transfer-Encoding: chunked

There, at the top, you see that the server's using HTTP 1.1 and that the resource is there, as the status code is a 200 OK. So far, so good.

Now, how do I efficiently extract just the status code and nothing else? As the response headers are all text, many of the standard Linux shell commands, e.g., awk, sed, and grep, could be used. After a bit of experimenting, I came up with the following Bash one-liner (formatted for readability).

(($(curl --silent -I https://example.com/my.remote.tarball.gz \
    | grep -E "^HTTP" \
    | awk -F " " '{print $2}') == 200)) \
    && echo "file exists"

It starts with Curl making a HEAD request and then pipes the returned headers through to grep. Grep, using a regular expression, extracts the status line and pipes it to awk. Awk then extracts the second field, the status code itself. Finally, if the status code is 200, then "file exists" is echoed to the terminal.
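If you want a script to branch on the result rather than just echo a message, the same pipe-based check can be wrapped in a small function. This is only a sketch; the function name and the usage example are my own placeholders, not part of the original script:

```shell
# Hypothetical helper: print the status code returned by a HEAD request
# against the URL in "$1" (e.g. "200", "404").
http_status() {
    curl --silent -I "$1" \
        | grep -E "^HTTP" \
        | awk -F " " '{print $2}'
}

# Usage (placeholder URL):
#   if [ "$(http_status https://example.com/my.remote.tarball.gz)" = "200" ]; then
#       echo "file exists"
#   fi
```

Comparing the code as a string ("= 200" inside [ ]) rather than arithmetically also avoids a shell error if curl returns nothing at all.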

You could argue that I should be happy at this point. It works. Use it. Move on. Nope! No can do. Something seems wrong here.

Surely, I think to myself, if I can make a HEAD request with Curl, then Curl can also filter out the status code itself, without having to pass the response headers through a pipe. So, I go back through the Curl man page, looking for an option to use.

Nothing stands out, at least at first. Then, after a bit of reading, along with suggestions from the Twitter-verse, I uncover a new option: -w, --write-out. Here are the option's essential details from Curl's man page:

-w, --write-out Make curl display information on stdout after a completed transfer. The variables present in the output format will be substituted by the value or text that Curl thinks fit, as described below. All variables are specified as %{variable_name}...

Looks like a winner! After looking through the list of available variables, I hit pay dirt with http_code. Here's what it does:

http_code The numerical response code that was found in the last retrieved HTTP(S) or FTP(S) transfer. In 7.18.2 the alias response_code was added to show the same info.

By using this option, there'd be no need for regular expressions or piping the output through further text-processing commands. Now that's efficient! So, what does the revised command look like? Here it is:

curl --silent --head --write-out '%{http_code}' \
    https://example.com/my.remote.tarball.gz

Running it, the output is now changed to:

HTTP/1.1 200 OK
X-Powered-By: React/alpha
Content-Type: image/png
Transfer-Encoding: chunked

200%

Hmmm, almost there, but not quite: the 200 code is displayed, albeit last, and the other headers are still present. I don't want all of the header data, just the extracted code. To get rid of the headers, I used the -o, --output option, which writes output to a file instead of stdout. As I'm not interested in the content, I decided to write it to /dev/null. So, here's the final version:

curl -o /dev/null --silent -Iw '%{http_code}' \
    https://example.com/my.remote.tarball.gz

The output is now just the HTTP status code, as follows (the trailing % is my Zsh prompt marking output without a trailing newline, not part of Curl's output):

200%

Perfect!
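Circling back to the original motivation, here's a sketch of how the final command might guard a download step in a defensive shell script. The function names, the variable, and the error handling are mine, not taken from the script described above:

```shell
# Hypothetical guard: succeed (exit status 0) only if a HEAD request
# against the URL in "$1" comes back with an HTTP 200.
remote_exists() {
    [ "$(curl -o /dev/null --silent --head --write-out '%{http_code}' "$1")" = "200" ]
}

# Download the resource only if it's actually available.
download_if_available() {
    local url="$1"
    if remote_exists "$url"; then
        curl --silent --remote-name "$url"
    else
        echo "Remote resource not available: $url" >&2
        return 1
    fi
}

# Usage (placeholder URL):
#   download_if_available https://example.com/my.remote.tarball.gz
```

Because remote_exists reports via its exit status, it slots straight into if statements, && chains, or a `set -e` script without any extra parsing.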

In Conclusion

And that's how to check if a remote resource, whether it's an image, a tarball (or other compressed file), a text file, or whatever you're after, is available before attempting to download it.

The moral of the story is that man pages contain a wealth of information; search, and you shall find. It can take some searching (at times) to find what you're looking for, and the right command (or command combination) may not always be evident at first.

However, it's worth spending the time to read man pages in-depth. If, after doing so, you don't find an option that helps, then start building a pipe. But don't build one as a first resort.

Finally, a big thank you for the feedback I received on Twitter — especially to Dan Allen, who pointed out that I wasn't as clear as I should have been with my initial question, and Thomas Boerger, for sharing a link to an example script that he had, which did almost exactly what I was after. Thanks, everyone!


Matthew Setter. Ethical Hacker, Online Privacy Advocate, and a Software Engineer.
