Travis tools: Restart failed jobs and clean caches

While working on a .travis.yml file for the nextcloudpi project (read more about this here), I wrote a few helper tools that I think are worth sharing.

Restart failed jobs

A recurring issue I ran into with Travis was the unstable behavior of the armhf image, which would sometimes fail. Usually, restarting the job was enough to make it pass. Travis machines seem to behave like this on and off, so restarting jobs is unavoidable. Since I had to do it often, I decided to automate it.

Travis offers a command line client, travis-cli, which lets you run from the terminal the actions you would otherwise perform through the Travis web page.

However, the travis-cli package seems to have some issues on the distro I’m using (by the way… I’m using Arch 😎), so I decided to use a travis-cli Docker image in my automation scripts. Docker containers are portable, so the scripts should run on any machine with Docker installed.

The automated script I wrote to restart jobs is the following:

#!/usr/bin/env python3

"""

Automatically restart failed jobs on Travis

This script acts as an agent on the host that
uses travis-cli to monitor the status of the
most recent build and restart any failed jobs.
Before running this script, generate a token on
the GitHub page (https://github.com/settings/tokens)
and export it on the host machine as an env var
named GITHUB_TOKEN
(export GITHUB_TOKEN=<github token>)
The script will return when the build passes.

    python restart_failed_jobs.py

"""

import os
import signal
import subprocess
import sys

# Signal handler for termination signals (SIGTERM, SIGINT)
def termination_signal_handler(signalNumber, frame):
    print('\nReceived signal no ', signalNumber, '\nKilling travis-cli container...')
    docker_kill()
    raise SystemExit(1)

signal.signal(signal.SIGTERM, termination_signal_handler)
signal.signal(signal.SIGINT, termination_signal_handler)

# Killing the running container of travis-cli
def docker_kill():
    subprocess.run("docker kill travis-cli", shell=True)
    return

def main():

    # Travis cli configuration

    # Build the travis cli docker image from the Dockerfile in .travis/travis-cli
    subprocess.run("docker build -t travis-cli .travis/travis-cli", shell=True)

    # Restarts need to be made interactively so that travis login is verified
    subprocess.run("docker run --name travis-cli --rm -t -d -v $(pwd):/project --entrypoint=/bin/sh travis-cli", shell=True)

    # Get github token env var
    gh_token = os.environ['GITHUB_TOKEN']

    # Enter the running container with docker exec and login to travis.
    # Passing an argument list (no shell) keeps the token out of shell parsing.
    command_docker = ["docker", "exec", "travis-cli", "travis", "login",
                      "--pro", "--org", "--github-token", gh_token]
    subprocess.run(command_docker)

    # Travis Build

    build_state = ''
    restart_attempt = 0
    # Keep polling and restarting failed jobs until the build is successful
    while (build_state != 'passed'):

        flag_attempt = 1

        # Run travis show to get info about the build state and its jobs
        travis_show = subprocess.run("docker exec travis-cli travis show", shell=True, encoding='utf-8', stdout=subprocess.PIPE)
        travis_show = travis_show.stdout.split('\n')
        
        subprocess.run("sleep 5", shell=True)

        # Extract status and number of current build
        build_state = travis_show[1].split()[1]
        build_num = travis_show[0].split()[1].lstrip('#').rstrip(':')

        # Extract info about jobs
        jobs = []
        for line in travis_show:
            if line.startswith('#'):
                jobs.append(line)

        for job in jobs:
            if any(status in job for status in ['failed:', 'errored:', 'canceled:']):
                num = job.split()[0].split('.')[1]
                restart_job = 'docker exec travis-cli travis restart '
                restart_job += build_num + '.' + num
                if flag_attempt:
                    restart_attempt+=1
                    print ('\n===Restart attempt no ' + str(restart_attempt) + '===')
                print ('Job ' + build_num + '.' + num + ' has failed. Restarting...')
                subprocess.run(restart_job, shell=True)
                flag_attempt = 0
    
    # Kill travis-cli docker container
    docker_kill()

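if __name__ == '__main__':
    try:
        main()
    except SystemExit:
        # Raised by the signal handler, which has already killed the container
        print('Exiting gracefully...')
    except Exception as e:
        # Any other error: clean up the container before reporting it
        print('Caught error. Killing travis-cli container...')
        docker_kill()
        print('Error:', repr(e))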

Usage: As soon as you push the commit that triggers a Travis build, run the script. If any job fails, the script detects it, restarts it through the travis-cli container, and keeps doing so until the build passes. Remember to export your GitHub token before running the script.

Under the hood, the script parses the output of travis show in a loop and, whenever a job has the status failed, errored, or canceled, restarts it with travis restart.
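The parsing step can be tried in isolation, without Docker or a Travis account. The sketch below is only an illustration: parse_travis_show is a hypothetical helper (not part of the script above) and SAMPLE is made-up text laid out the way the script expects travis show output to look, with the build number on the first line, the state on the second, and job lines starting with '#'.

```python
# Statuses that the restart script treats as "needs a restart"
FAILED_STATES = ("failed:", "errored:", "canceled:")


def parse_travis_show(output):
    """Return (build_number, build_state, failed_job_ids) from `travis show` text.

    Hypothetical helper: it mirrors the string handling of the restart script.
    """
    lines = output.split("\n")
    # First line is assumed to look like "Build #42:  <commit message>"
    build_num = lines[0].split()[1].lstrip("#").rstrip(":")
    # Second line is assumed to look like "State:  failed"
    build_state = lines[1].split()[1]
    failed = []
    for line in lines:
        # Job lines are assumed to start with "#<build>.<job>"
        if line.startswith("#") and any(s in line for s in FAILED_STATES):
            failed.append(line.split()[0].lstrip("#"))
    return build_num, build_state, failed


# Made-up sample output, used only to exercise the parser
SAMPLE = """Build #42:  Update travis config
State:         failed
Type:          push

#42.1 passed:   2 min 10 sec  os: linux
#42.2 failed:   5 min 3 sec   os: linux
#42.3 errored:  1 min 0 sec   os: linux"""

print(parse_travis_show(SAMPLE))
```

Each id in the returned list is what the script passes to travis restart. If the real output of your travis-cli version is laid out differently, the indices used here would need adjusting.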

Clean caches

When I first started using caching in Travis, I noticed that caches are not cleaned after a build. So when I started a new build and a cache with the same name existed, Travis reused it. If the previous build was successful, this speeds up the new build. But if it failed, the stale cache may cause problems in the new build as well. For this reason, I prefer to clean the cache before each build.

I created the following script to do this automatically:

#!/usr/bin/env python3

"""

Automatically clear the Travis cache

Before each build in Travis the cache
should be cleared, so that the build starts
from a clean environment.
Before running this script, generate a
token on the GitHub page
(https://github.com/settings/tokens)
and export it on the host machine as an
env var named GITHUB_TOKEN
(export GITHUB_TOKEN=<github token>)

    python clear_travis_cache.py

"""

import os
import subprocess
import sys

# Killing the running container of travis-cli
def docker_kill():
    subprocess.run("docker kill travis-cli", shell=True)
    return

def main():

    # Travis cli configuration

    # Build the travis cli docker image
    subprocess.run("docker build -t travis-cli .travis/travis-cli", shell=True)

    # Clearing cache needs to be made interactively so that travis login is verified
    subprocess.run("docker run --name travis-cli --rm -t -d -v $(pwd):/project --entrypoint=/bin/sh travis-cli", shell=True)

    # Get github token env var
    gh_token = os.environ['GITHUB_TOKEN']

    # Enter the running container with docker exec and login to travis.
    # Passing an argument list (no shell) keeps the token out of shell parsing.
    command_docker = ["docker", "exec", "travis-cli", "travis", "login",
                      "--pro", "--org", "--github-token", gh_token]
    subprocess.run(command_docker)

    # Run travis cache to delete all caches of the repo
    subprocess.run("docker exec travis-cli travis cache -f --delete", shell=True)

    # Kill travis-cli docker container
    docker_kill()

if __name__ == '__main__':
    try:
        main()
    except (Exception, KeyboardInterrupt) as e:
        # On any error or Ctrl-C, clean up the container before reporting
        print('Caught error. Killing travis-cli container...')
        docker_kill()
        print('Error:', repr(e))

Usage: Run this script before you push your commit, so that the caches are clear for the new build. Remember to export your GitHub token before running the script.

GitHub token

As the usage notes inside the scripts say, you need to export your GitHub token before running either of them. The reason is that travis login has to verify your credentials, and it normally does this interactively, which does not work from an automated script.

The automation scripts start the travis-cli container in the background and then use docker exec to run each travis-cli command inside it.

The travis login command, however, has to be given your GitHub token explicitly through the --github-token flag.
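The docker exec pattern used by both scripts can be captured in one small function. This is a minimal sketch: travis_command is a hypothetical helper, not something the scripts define, and the container name travis-cli is the one the scripts assign with docker run --name. It only builds and prints the argument lists, so it runs without Docker.

```python
import shlex

# Container name given by the scripts via `docker run --name travis-cli`
CONTAINER = "travis-cli"


def travis_command(*args):
    """Build the argv that runs `travis <args>` inside the running container."""
    return ["docker", "exec", CONTAINER, "travis", *args]


# Placeholder token; the scripts read the real one from GITHUB_TOKEN
login = travis_command("login", "--github-token", "<github token>")
restart = travis_command("restart", "42.2")

# shlex.join shows the equivalent shell command line (Python 3.8+)
print(shlex.join(login))
print(shlex.join(restart))
```

A list built this way can be handed straight to subprocess.run without shell=True, so the token is never interpreted by a shell.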

To be able to publish these scripts (without handing my GitHub token to everyone 😆), I read the token from an environment variable named GITHUB_TOKEN. So if you intend to use the scripts, run the following command first:

$ export GITHUB_TOKEN=<github_token>

If you don’t have a GitHub token yet, you can create one at https://github.com/settings/tokens.
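As written, the scripts fail with a bare KeyError when GITHUB_TOKEN is missing. A friendlier check could look like the sketch below. The helper name get_github_token is hypothetical, and taking the environment as a parameter is only a convenience so that it can be tried without touching the real environment.

```python
import os


def get_github_token(environ=os.environ):
    """Return the GitHub token, or exit with a hint if it is not exported."""
    token = environ.get("GITHUB_TOKEN", "").strip()
    if not token:
        # SystemExit with a message prints it to stderr and exits with status 1
        raise SystemExit(
            "GITHUB_TOKEN is not set. Run: export GITHUB_TOKEN=<github token>"
        )
    return token


# Demo with a stand-in environment and a placeholder value
print(get_github_token({"GITHUB_TOKEN": "example-token"}))
```

In the scripts, the line gh_token = os.environ['GITHUB_TOKEN'] could then become gh_token = get_github_token().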

Dockerfile

The Dockerfile I used for the travis-cli Docker image is this one.

If you use the automation scripts, copy this Dockerfile into the .travis/travis-cli directory of your repository.

The scripts build it every time they run, to make sure the Docker image exists locally.
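Rebuilding on every run is the simplest way to guarantee the image is there. An alternative, sketched here and not what the scripts above do, is to build only when the image is missing, using the exit status of docker image inspect. The helpers image_exists and ensure_image are hypothetical, and the run parameter exists only so the logic can be exercised without Docker.

```python
import subprocess


def image_exists(name, run=subprocess.run):
    """True if `docker image inspect <name>` exits with status 0."""
    result = run(
        ["docker", "image", "inspect", name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ensure_image(name, context_dir, run=subprocess.run):
    """Build the image from context_dir only if it is missing.

    Returns True if a build was triggered, False if the image already existed.
    """
    if image_exists(name, run):
        return False
    run(["docker", "build", "-t", name, context_dir], check=True)
    return True
```

With Docker available, ensure_image("travis-cli", ".travis/travis-cli") would stand in for the build step. The trade-off is that changes to the Dockerfile are not picked up until the old image is removed, which is a good reason to keep the unconditional build.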