October 21, 2016

Send yourself latest XKCD every morning (And learn a bit of Bash scripting along the way)

You’ve probably heard of XKCD, a popular webcomic about physics, math, computer science and plenty of other subjects. Over the years, its author Randall Munroe has released some pretty awesome pieces such as this, this or this. Wouldn’t it be great if new XKCD comics automatically appeared in our inbox?

Breaking It Down

We want to send ourselves every new XKCD comic as an email. We want to do this with a Bash script triggered by cron.

For this, we need to get the latest comic data. We also need to keep track of what we’ve sent, so we don’t send the same comic again and again. Once we determine there’s a new comic that we haven’t sent to our email address yet, we’ll send it.

After a little research on the website, I realized that XKCD serves a JSON endpoint here, which describes the latest comic: its number, the image URL, the alt text; basically everything we need.

{
    "month": "10",
    "num": 1749,
    "link": "",
    "year": "2016",
    "news": "",
    "safe_title": "Mushrooms",
    "transcript": "",
    "alt": "Evolutionarily speaking, mushrooms are technically a type of ghost.",
    "img": "http:\/\/imgs.xkcd.com\/comics\/mushrooms.png",
    "title": "Mushrooms",
    "day": "21"
}

We can use the num field, which works like an ID, to store what we sent last time and to detect whether a new comic has been released.

Broken down into independent tasks, these are the seven things we need to be able to do in Bash to finish this project:

  1. Check if a file exists.
  2. Read from a file and put it into a variable.
  3. Download a file from a remote source and read it into a variable.
  4. Parse JSON data and extract a specific value by its key.
  5. Compare two values.
  6. Create a string from multiple variables (string concatenation).
  7. Send an email with HTML content.

I’ll assume you’re familiar with Unix concepts like piping, redirecting and appending.

Check If a File Exists

In Bash, this is a basic if structure:

if [ $someVar = $anotherVar ]; then
    # do something
else
    # do something else
fi

We could also use two equals signs (==), as in ‘regular’ languages. Assignment, on the other hand, must be written with a single = and no spaces around it, like this: pi=3. Spaces are extra important in Bash.

We’ll use the -f expression, one of the many expressions that test accepts, to see if our log file exists. The square-bracket syntax ([ -f fileName ]) is equivalent to test -f fileName because [ is basically a synonym for test that takes ] as its last parameter. To see all available expressions, run man test.

lastSent=0
if [ -f ./xkcd-last-sent.log ]; then
    #read it into lastSent
fi
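Here is a small sketch of the check in isolation, showing that the bracket form and test give the same answer. The temporary file created with mktemp is only a stand-in for our log file; it is an assumption made for the demo.

```shell
# Hypothetical stand-in for ./xkcd-last-sent.log, created only for this demo.
logFile=$(mktemp)

# Square-bracket form of the check.
if [ -f "$logFile" ]; then
    existsBefore=yes
else
    existsBefore=no
fi

rm -f "$logFile"

# `test -f` is the same check written without the brackets.
if test -f "$logFile"; then
    existsAfter=yes
else
    existsAfter=no
fi

echo "before removal: $existsBefore, after removal: $existsAfter"
```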

Reading From a Text File

Since basically everything is a file in Unix, there are lots of tools to work with files. Our log file will only contain one line and no spaces, so to read this file content into a variable, we can just do this:

varName=`cat fileName`

lastSent=`cat ./xkcd-last-sent.log`
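Putting the existence check and the read together might look like the sketch below. It uses $(<file), a Bash shorthand that gives the same result as $(cat file). The temporary file and the number 1748 are made up for the example.

```shell
logFile=$(mktemp)          # hypothetical stand-in for ./xkcd-last-sent.log
echo 1748 > "$logFile"     # pretend comic 1748 was the last one we sent

lastSent=0
if [ -f "$logFile" ]; then
    # Same result as $(cat "$logFile"), without starting a cat process.
    lastSent=$(<"$logFile")
fi

echo "last sent: $lastSent"
rm -f "$logFile"
```

If the file does not exist, lastSent simply keeps its default of 0, so the first run always treats the latest comic as new.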

Getting JSON From a Remote Host

In Unix-like systems, the most common way to download a file from a remote host is to use wget or curl. While curl is a very sophisticated network program, wget is basically a downloader. We can use either, but let’s go with wget:

jsonStr=$(wget -qO- https://xkcd.com/info.0.json)

The -q option tells wget to be quiet, meaning no command output will be printed, and -O- (a capital letter O followed by a dash) tells it to write the downloaded content to standard output instead of a file.
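If you prefer curl, or are not sure which of the two is installed, a small wrapper along these lines could work. The function name fetch_json is my own invention for this sketch, not something either tool provides.

```shell
# Hypothetical helper: write the content of a URL to standard output,
# using curl when it is installed and falling back to wget otherwise.
fetch_json() {
    local url=$1
    if command -v curl >/dev/null 2>&1; then
        # -f: fail on HTTP errors, -s: silent, -S: still show errors, -L: follow redirects
        curl -fsSL "$url"
    else
        # -q: quiet, -O-: write the download to standard output
        wget -qO- "$url"
    fi
}

# Usage (needs network access):
# jsonStr=$(fetch_json https://xkcd.com/info.0.json)
```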

At this point, we have our JSON content as a string in jsonStr. If the download failed, it should be empty. Let’s check that before going further:

if [ "$jsonStr" == "" ]
then
    echo 'Failed pulling JSON data.'
    # a non-zero exit status signals the failure to the caller
    exit 1
fi

Why are we putting $jsonStr inside quotes? Because variables are expanded before the command runs; if it’s empty, Bash will try to execute [ == "" ], which fails with an error. By wrapping our variable in quotes, we’re making sure that the executed expression is still well-formed when the string is empty. This will also protect us from errors caused by spaces and other special characters.

An alternative is to use double square brackets. Inside double square brackets, empty strings or strings with spaces won’t cause an error:

if [[ $jsonStr == "" ]]
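To see the difference for yourself, here is a small sketch that runs the same emptiness check four ways on an empty variable. The result variable names are just for the demo.

```shell
jsonStr=""   # simulate a failed download

# Quoted: [ receives an empty argument and the comparison works.
if [ "$jsonStr" == "" ]; then quoted=empty; else quoted=nonempty; fi

# Double brackets: no word splitting, so the unquoted variable is safe.
if [[ $jsonStr == "" ]]; then double=empty; else double=nonempty; fi

# -z is a dedicated "string has zero length" test.
if [ -z "$jsonStr" ]; then zero=empty; else zero=nonempty; fi

# Unquoted in single brackets: the variable vanishes, [ sees only == "" and
# reports an error (silenced here), so the else branch runs.
if [ $jsonStr == "" ] 2>/dev/null; then unquoted=empty; else unquoted=error; fi

echo "$quoted $double $zero $unquoted"   # empty empty empty error
```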

Parsing JSON

Now we need to parse the string and get the values. The following snippet from this answer on StackOverflow looks quite compact and easy to use:

function getJsonVal() {
    python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)$1))"; 
}

This little function lets us get the value of a key with this syntax:

value=`echo "$jsonStr" | getJsonVal "['keyName']"`

We’ll modify this function a little bit. In this form, you need to specify keys with square brackets, like this: "['key']". The author probably did it this way because reaching a value at the second level or deeper requires chained lookups such as ['firstLevel']['secondLevel'], and passing the brackets in keeps the function to a single line. But we don’t need to go deeper than one level, as our JSON data is flat.

function getJsonVal() {
    python -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)['$1']))"; 
}

Now we can do this:

echo "$jsonStr" | getJsonVal keyName

So we’re not doing this with standard Unix tools, and just calling Python instead? Well, apparently there is no simple way to parse JSON using standard Unix tools. A lot of people recommend jq, and you should definitely use it if you frequently work with JSON at the command line. Other solutions exist using grep, perl or awk, but I found them a bit confusing for parsing JSON. One suggested solution using grep and perl, which I find complicated, is below:

echo "$jsonStr" | grep -Po '"keyName":.*?[^\\]",' | perl -pe 's/"keyName"://; s/^"//; s/",$//'

On the other hand, Python will most probably be available in every Unix-friendly environment that you’ll use, so I think it’s better to use it than to add a dependency, and better than the regex approach because a real JSON parser is more robust than a hand-written pattern.

We can use the same syntax to extract four values we need from the JSON string, whose key names are num, title, img and alt.

currentNum=`echo "$jsonStr" | getJsonVal num`
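To try the helper without touching the network, here is a sketch that feeds it a trimmed-down copy of the sample response. It calls python3 explicitly, which is an assumption about what is installed on your system; use python if that is the name of your interpreter.

```shell
# Same helper as above, but calling python3 explicitly (an assumption about the system).
getJsonVal() {
    python3 -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)['$1']))"
}

# Trimmed-down sample of the endpoint's response.
jsonStr='{"num": 1749, "title": "Mushrooms", "img": "http:\/\/imgs.xkcd.com\/comics\/mushrooms.png"}'

num=$(printf '%s' "$jsonStr" | getJsonVal num)
title=$(printf '%s' "$jsonStr" | getJsonVal title)
img=$(printf '%s' "$jsonStr" | getJsonVal img)

echo "$num"     # 1749
echo "$title"   # "Mushrooms" (string values keep their JSON double quotes)
echo "$img"     # "http://imgs.xkcd.com/comics/mushrooms.png"
```

Numbers come back bare, while strings come back wrapped in double quotes, because json.dumps re-encodes each value as JSON.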

Comparing Values

At this point, we need to check if the current num is greater than the value we keep in our little log file. If not, it means we’ve already sent the latest comic to ourselves and there’s no need to send it again.

We’ve already done a comparison above by checking if our json string is empty. This time we will compare two variables, and use -gt (greater than) with similar syntax.

if [[ $currentNum -gt $lastSent ]]
then
    title=`echo "$jsonStr" | getJsonVal title`
    img=`echo "$jsonStr" | getJsonVal img`
    alt=`echo "$jsonStr" | getJsonVal alt`
    # create html string by concatenating the values above
    # send mail
fi
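It is worth using -gt rather than > here. A small sketch with made-up numbers shows why: inside double brackets, > compares the values as strings, which goes wrong as soon as the number of digits differs.

```shell
lastSent=9
currentNum=10

# -gt compares the values as integers.
if [[ $currentNum -gt $lastSent ]]; then numeric=newer; else numeric=not-newer; fi

# > inside [[ ]] compares as strings, so "10" sorts before "9".
if [[ $currentNum > $lastSent ]]; then lexical=newer; else lexical=not-newer; fi

echo "numeric: $numeric, string: $lexical"   # numeric: newer, string: not-newer
```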

String Concatenation

In Bash, you can pretty much write variables one after another to concatenate them. This is valid syntax:

aAndB=$a$b

We can also add text directly before, after or in between variables, with the help of quotes:

concatStr=$a" and some text here "$b" a little more text here"

This is all we need to produce a simple HTML string for our email content. We have a title, an image and some alt text. So something like this could work:

htmlContent='<h2>'$title'</h2><br><img src="'$img'"><br><p>'$alt'</p>'

You’ll notice that the three strings we extracted from the JSON are wrapped in double quotes. To get rid of them, we’ll use a special Bash syntax that works like the substring functions you may know from languages like PHP or JavaScript. It takes a starting offset and a length:

subString=${string:offset:length}

In Bash 4.2 and later, a negative length is treated as a position counted from the end of the string. Remember, we want to get rid of the first and last characters and keep the rest:

title=${title:1:-1}
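The negative length needs Bash 4.2 or newer, and some systems (the stock Bash on macOS, for example) ship an older version. As a sketch, here are two alternatives that should also work on older versions; the sample value is made up.

```shell
title='"Mushrooms"'   # sample value, as returned by json.dumps

# Option 1: remove one leading and one trailing double quote, if present.
stripped=${title#\"}
stripped=${stripped%\"}

# Option 2: substring with an explicit length (total length minus the two quotes).
viaLength=${title:1:${#title}-2}

echo "$stripped"    # Mushrooms
echo "$viaLength"   # Mushrooms
```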

Sending Email

If you’re going to run this on a real server with a domain attached, you probably already have postfix and mailutils installed and can send email.

But if you want to run this on your personal computer, you’ll probably need to install an MTA (mail transfer agent) and do some configuration.

If you’re using OSX, you have postfix by default. You can configure it to use your personal email address. This gist shows step by step how to do it.

If you’re not on OSX and don’t have an MTA, I would suggest ssmtp. It’s quite easy to install and configure. On Ubuntu, you can install it with:

sudo apt-get install ssmtp

After installation, you need to edit two configuration files, /etc/ssmtp/ssmtp.conf and /etc/ssmtp/revaliases. I’ll share with you the content of my configuration files for Gmail:

`/etc/ssmtp/ssmtp.conf`

mailhub=smtp.gmail.com:587
rewriteDomain=gmail.com
hostname=localhost
FromLineOverride=YES
UseTLS=YES
UseSTARTTLS=YES
AuthUser=username@gmail.com
AuthPass=password

`/etc/ssmtp/revaliases`

local_user_name:gmail_user_name@gmail.com:smtp.gmail.com:587

To test email using ssmtp:

echo "Test message" | sudo ssmtp -vvv your_email@example.com

To test email using mail:

echo "Test message" | mail -s "Subject" your_email@example.com

Hopefully, our test message made it to our inbox, so we can continue.

We just sent ourselves a text-based email. To send HTML, we only need to set the MIME-Version and Content-Type headers. A minimal HTML email looks like this:

MIME-Version: 1.0
Content-type: text/html; charset=utf-8

our html content
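As a sketch, the message can be assembled with printf and then piped to the mailer. The function name build_html_mail, the subject line and the sample HTML are my own placeholders. The commented-out sendmail -t call assumes a sendmail-compatible command is installed, which postfix provides and ssmtp typically does too.

```shell
# Hypothetical helper: print a complete HTML message (headers, blank line, body).
build_html_mail() {
    local to=$1 subject=$2 html=$3
    printf 'To: %s\n' "$to"
    printf 'Subject: %s\n' "$subject"
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: text/html; charset=utf-8\n'
    printf '\n'                 # the blank line separates headers from the body
    printf '%s\n' "$html"
}

message=$(build_html_mail "your_email@example.com" "XKCD: Mushrooms" "<h2>Mushrooms</h2>")
echo "$message"

# To actually send it, pipe the message to a sendmail-compatible command, e.g.:
# build_html_mail "your_email@example.com" "XKCD: Mushrooms" "<h2>Mushrooms</h2>" | sendmail -t
```

With -t, the mailer reads the recipient from the To: header, so the address only has to be written once.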

Putting it All Together

The only step left is to update our log file with the new ID. We can easily do this by redirecting the ID to our log file:

echo $currentNum > /our/log/file

This will overwrite the content of the log file with the new ID.
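A quick sketch of the difference between > and >>, using a temporary file as a stand-in for the log file and made-up comic numbers:

```shell
logFile=$(mktemp)   # hypothetical stand-in for the real log file

echo 1748 > "$logFile"
echo 1749 > "$logFile"     # > truncates the file first, so only 1749 remains
overwritten=$(<"$logFile")

echo 1750 >> "$logFile"    # >> appends, so the file now has two lines
lineCount=$(wc -l < "$logFile")

echo "after >: $overwritten, lines after >>: $((lineCount))"
rm -f "$logFile"
```

Since we only ever want the latest number in the file, > is the right choice here.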

This is what our complete script looks like:
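As a minimal sketch (not necessarily identical to the original script), the steps above can be assembled like this. The function names, taking the recipient as the first argument, calling python3 and piping to sendmail -t are all assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Sketch of the whole flow. Function names, the recipient argument and the
# use of python3 and sendmail are assumptions made for this example.

logFile=./xkcd-last-sent.log
jsonUrl=https://xkcd.com/info.0.json

# Print the JSON-encoded value stored under key $1 of the JSON read from stdin.
getJsonVal() {
    python3 -c "import json,sys;sys.stdout.write(json.dumps(json.load(sys.stdin)['$1']))"
}

# Remove one leading and one trailing double quote from $1.
stripQuotes() {
    local s=${1#\"}
    printf '%s' "${s%\"}"
}

# Print a complete HTML message: headers, a blank line, then the body.
buildMail() {
    local to=$1 title=$2 img=$3 alt=$4
    printf 'To: %s\nSubject: XKCD: %s\n' "$to" "$title"
    printf 'MIME-Version: 1.0\nContent-Type: text/html; charset=utf-8\n\n'
    printf '<h2>%s</h2><br><img src="%s"><br><p>%s</p>\n' "$title" "$img" "$alt"
}

main() {
    local recipient=$1 lastSent=0 jsonStr currentNum title img alt

    # Read the number of the last comic we sent, if we have sent one before.
    if [ -f "$logFile" ]; then
        lastSent=$(<"$logFile")
    fi

    jsonStr=$(wget -qO- "$jsonUrl")
    if [ -z "$jsonStr" ]; then
        echo 'Failed pulling JSON data.'
        return 1
    fi

    currentNum=$(printf '%s' "$jsonStr" | getJsonVal num)
    if [[ $currentNum -gt $lastSent ]]; then
        title=$(stripQuotes "$(printf '%s' "$jsonStr" | getJsonVal title)")
        img=$(stripQuotes "$(printf '%s' "$jsonStr" | getJsonVal img)")
        alt=$(stripQuotes "$(printf '%s' "$jsonStr" | getJsonVal alt)")

        # Update the log only after the mailer accepted the message.
        buildMail "$recipient" "$title" "$img" "$alt" | sendmail -t || return 1
        echo "$currentNum" > "$logFile"
        echo "Sent comic $currentNum."
    else
        echo "No new comic (latest is $currentNum)."
    fi
}

# Run only when a recipient address is given as the first argument.
if [[ $# -gt 0 ]]; then
    main "$1"
fi
```

Saved as an executable file, this sketch would be run as /path/to/our/script your_email@example.com, so the address would also go at the end of the crontab entries. One limitation of stripping the quotes by hand is that values containing double quotes or non-ASCII characters stay JSON-escaped.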

A view from my inbox. Everything in its right place. Note: gazorbazorb [at] gmail is not my email address.

Setting Crontab

To make this run on a regular basis, we will set up crontab. The XKCD website says new comics are released every Monday, Wednesday and Friday. So we can check every Monday, Wednesday and Friday night at 23:00 with this:

0 23 * * 1,3,5 /path/to/our/script

I prefer to check every morning instead, at 8:45, as I like to read them on the commute to work:

45 8 * * * /path/to/our/script

One last useful trick: we’ll append the output of our script to another log file, preferably somewhere like /var/log/cron/. Everything our script echoes will be written to this file. So if we need to debug our program, this file will come in handy, since the script echoes a line every time it runs, including its decision about sending an email.

45 8 * * * /path/to/our/script >> /var/log/cron/xkcd.log
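A log like this is easier to read when every line carries a timestamp. Here is a sketch of a tiny helper the script could use instead of plain echo; the name log and the date format are my own choices.

```shell
# Hypothetical helper: prefix every message with the current date and time.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

line=$(log "No new comic.")
echo "$line"
```

Note that >> only captures standard output; adding 2>&1 at the end of the crontab line would send error messages to the same file.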

Conclusion

With crontab set, we’ve made sure that we’ll never miss a new XKCD comic.

And last but not least, my two favorite XKCD comics; I couldn’t pick one over the other, so I’m putting them both here:

Exploits of a Mom

Incident

https://linuxacademy.com/blog/linux/conditions-in-bash-scripting-if-statements/
http://mywiki.wooledge.org/BashFAQ/031

© Ahmet Kun 2017