Lately I've been working on a lot of automation and monitoring projects, and a big part of these projects is taking existing scripts and modifying them to be useful for automation and monitoring tools. One thing I have noticed is that sometimes scripts use exit codes and sometimes they don't. It seems like exit codes are easy for people to forget, but they are an incredibly important part of any script, especially if that script is used on the command line.
On Unix and Linux systems, programs can pass a value to their parent process while terminating. This value is referred to as an exit code or exit status. On POSIX systems the standard convention is for the program to pass 0 for successful executions and 1 or higher for failed executions.
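As a quick illustration of this convention, the sketch below runs the standard `true` and `false` utilities and records what each one reports. The variable names are only for illustration.

```shell
#!/bin/bash
# `true` always succeeds and `false` always fails.
true
success_status=$?    # exit code of `true`

false
failure_status=$?    # exit code of `false` (some non-zero value)

echo "true exited with $success_status"
echo "false exited with $failure_status"
```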
Why is this important? If you look at exit codes in the context of scripts written for the command line, the answer is very simple. Any script that is useful in some fashion will inevitably be either used in another script or wrapped with a bash one-liner. This becomes especially true if the script is used with automation tools like SaltStack or monitoring tools like Nagios. These programs execute scripts and check the exit code to determine whether the script was successful or not.
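To make that concrete, here is a minimal sketch of what a calling program does with an exit code. The `run_check` helper is hypothetical and is not how SaltStack or Nagios are actually implemented; it only shows the general idea of running a command and branching on its status.

```shell
#!/bin/bash
# run_check is a hypothetical helper: it runs whatever command it is given
# and reports success or failure based purely on the exit code.
run_check() {
    local status
    if "$@" > /dev/null 2>&1
    then
        echo "OK: $*"
    else
        status=$?    # in the else branch, $? is the status of the failed command
        echo "FAILED (exit code $status): $*"
    fi
}

ok_report=$(run_check true)
bad_report=$(run_check false)

echo "$ok_report"
echo "$bad_report"
```

A script that always exits 0 would always land in the first branch, no matter what actually happened inside it.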
On top of those reasons, exit codes exist within your scripts even if you don't define them. By not defining proper exit codes you could be falsely reporting successful executions, which can cause issues depending on what the script does.
In Linux any script run from the command line has an exit code. With Bash scripts, if the exit code is not specified in the script itself the exit code used will be the exit code of the last command run. To help explain exit codes a little better we are going to use a quick sample script.
    #!/bin/bash
    touch /root/test
    echo created file
The above sample script will execute both the touch command and the echo command. When we execute this script (as a non-root user) the touch command will fail. Ideally, since the touch command failed, we would want the script's exit code to indicate failure. To check the exit code we can simply print the $? special variable in bash. This variable holds the exit code of the last command run.
    $ ./tmp.sh
    touch: cannot touch ‘/root/test’: Permission denied
    created file
    $ echo $?
    0
As you can see, after running the ./tmp.sh command the exit code was 0, which indicates success, even though the touch command failed. The sample script runs two commands, touch and echo. Since we did not specify an exit code, the script exits with the exit code of the last command run. In this case, the last command run is the echo command, which did execute successfully.
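The same behaviour can be reproduced without creating a script file. This is a small sketch that uses `bash -c` to run two tiny inline scripts, with `false` standing in for the failing touch and `true` standing in for the successful echo.

```shell
#!/bin/bash
# A failing command followed by a succeeding one: the failure is masked.
bash -c 'false; true'
masked_status=$?

# The same two commands in the opposite order: the failure is the last
# command run, so it becomes the exit code of the inline script.
bash -c 'true; false'
exposed_status=$?

echo "failure first, success last: $masked_status"
echo "success first, failure last: $exposed_status"
```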
If we remove the echo command from the script we should see the exit code of the touch command.

    #!/bin/bash
    touch /root/test

    $ ./tmp.sh
    touch: cannot touch ‘/root/test’: Permission denied
    $ echo $?
    1
As you can see, since the last command run was touch, the exit code reflects the true status of the script: failed.
While removing the echo command from our sample script worked to provide an exit code, what happens when we want to perform one action if the touch was successful and another if it was not, such as printing to stdout on success and stderr on failure?
Earlier we used the $? special variable to print the exit code of the script. We can also use this variable within our script to test if the touch command was successful or not.
    #!/bin/bash
    touch /root/test 2> /dev/null
    if [ $? -eq 0 ]
    then
        echo "Successfully created file"
    else
        echo "Could not create file" >&2
    fi
In the above revision of our sample script, if the exit code for touch is 0 the script will echo a success message. If the exit code is anything other than 0, this indicates failure and the script will echo a failure message to stderr.

    $ ./tmp.sh
    Could not create file
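Testing `$?` works, but bash's `if` can also test a command's exit code directly, which avoids the separate check. The sketch below is an alternative form rather than the version used in this article. The `target` path is hypothetical and is assumed to point into a directory that does not exist, so that touch fails for any user.

```shell
#!/bin/bash
# Hypothetical path: the parent directory is assumed not to exist,
# so touch fails here regardless of who runs the script.
target="/nonexistent-dir-for-demo/test"

# `if` runs the command and branches on its exit code:
# 0 takes the `then` branch, anything else takes the `else` branch.
if touch "$target" 2> /dev/null
then
    message="Successfully created file"
else
    message="Could not create file"
fi

echo "$message"
```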
While the above revision will provide an error message if the touch command fails, it still provides a 0 exit code indicating success.
    $ ./tmp.sh
    Could not create file
    $ echo $?
    0
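The reason is that every command, including echo, replaces the value of `$?`. A minimal sketch of that effect, with `false` standing in for the failed touch and illustrative variable names:

```shell
#!/bin/bash
false
echo "reporting the failure" > /dev/null    # this echo succeeds and resets $?
late_status=$?                              # status of echo, not of false

false
saved_status=$?                             # captured immediately after false

echo "checked late: $late_status"
echo "checked immediately: $saved_status"
```

If the original status is needed after other commands have run, it has to be saved in a variable straight away.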
Since the script failed, it would not be a good idea to pass a successful exit code to any other program executing this script. To add our own exit code to this script, we can simply use the exit command.

    #!/bin/bash
    touch /root/test 2> /dev/null
    if [ $? -eq 0 ]
    then
        echo "Successfully created file"
        exit 0
    else
        echo "Could not create file" >&2
        exit 1
    fi
With the exit command in this script, we will exit with a success message and a 0 exit code if the touch command is successful. If the touch command fails, however, we will print a failure message to stderr and exit with a 1 value, which indicates failure.
    $ ./tmp.sh
    Could not create file
    $ echo $?
    1
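To see both exit codes without needing root, the sketch below writes a variant of the script to a temporary file and runs it twice. Taking the file path as the first argument (`$1`) is an assumption made for this demonstration and is not part of the original script. The failing path is hypothetical and assumed not to exist.

```shell
#!/bin/bash
# Write a variant of the sample script to a temporary file.
# Unlike the original, it takes the file to create as its first argument.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
if touch "$1" 2> /dev/null
then
    echo "Successfully created file"
    exit 0
else
    echo "Could not create file" >&2
    exit 1
fi
EOF

# Failure case: the parent directory is assumed not to exist.
bash "$script" /nonexistent-dir-for-demo/test 2> /dev/null
fail_status=$?

# Success case: a temporary file we are allowed to touch.
writable=$(mktemp)
bash "$script" "$writable" > /dev/null
ok_status=$?

rm -f "$script" "$writable"
echo "failure case: $fail_status, success case: $ok_status"
```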
Now that our script is able to tell both users and programs whether it finished successfully or unsuccessfully, we can use this script with other administration tools or simply use it with bash one-liners.
Bash One Liner:
    $ ./tmp.sh && echo "bam" || (sudo ./tmp.sh && echo "bam" || echo "fail")
    Could not create file
    Successfully created file
    bam
The above grouping of commands uses what are called list constructs in bash. List constructs allow you to chain commands together with simple && for and and || for or conditions. The above command will execute the ./tmp.sh script, and if the exit code is 0 the command echo "bam" will be executed. If the exit code is 1, however, the commands within the parentheses will be executed next. Within the parentheses the commands are chained together using the && and || constructs again.
The list constructs use exit codes to understand whether a command has successfully executed or not. If scripts do not properly use exit codes, any user of those scripts who uses more advanced commands such as list constructs will get unexpected results on failures.
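Here is a stripped-down sketch of the same pattern. The `succeed` and `fail` functions are hypothetical stand-ins for a script that exits 0 and a script that exits 1.

```shell
#!/bin/bash
# Hypothetical stand-ins for a script that succeeds and one that fails.
succeed() { return 0; }
fail() { return 1; }

# && runs the next command only on exit code 0;
# || runs the next command only on a non-zero exit code.
result_ok=$(succeed && echo "bam" || echo "fail")
result_bad=$(fail && echo "bam" || echo "fail")

echo "succeed: $result_ok"
echo "fail: $result_bad"
```

One caveat worth knowing: in `A && B || C`, the command C also runs when A succeeds but B fails, so this is not an exact replacement for if/else.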
The exit command in bash accepts integers from 0 to 255. In most cases 1 will suffice; however, there are other reserved exit codes that can be used for more specific errors. The Linux Documentation Project has a pretty good table of reserved exit codes and what they are used for.
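A short sketch of how the range behaves in practice, using `bash -c` to run inline scripts. The code 3 is an arbitrary example, and the command name in the last case is made up so that bash cannot find it.

```shell
#!/bin/bash
# Any value from 0 to 255 is passed through unchanged.
bash -c 'exit 3'
specific_status=$?

# Values outside the range wrap around modulo 256, so 256 becomes 0
# and would be read as success by the caller.
bash -c 'exit 256'
wrapped_status=$?

# 127 is one of the reserved codes: bash uses it for "command not found".
bash -c 'definitely_not_a_real_command_xyz' 2> /dev/null
missing_status=$?

echo "exit 3 -> $specific_status"
echo "exit 256 -> $wrapped_status"
echo "unknown command -> $missing_status"
```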