How-Tos

An Ultrasonic Range Sensor, Linux, Ruby, and an Arduino

1. Intro

Recently, I needed to count open-close machine cycles for a customer. We couldn’t trust the machine readouts, so we needed an external device to count the cycles accurately. This device also needed to be built quickly and easily moved from machine to machine. Those requirements pushed my thoughts toward using an ultrasonic sensor and an Arduino. Although Ruby is not the first language I normally think of for the client-side of something like this, the customer uses Ruby extensively in their enterprise. I knew they’d be able to support a Ruby app without any trouble. Below is a trimmed down version of what my associate Tom and I came up with.

2. Hardware

The device is very simple, consisting of three main components:

  1. Radio Shack Ultrasonic Range Finder Unit
  2. Arduino Uno Rev 3
  3. RadioShack Project Enclosure

Two holes were drilled into one side of the box for the Arduino’s USB port and power connector. However, we ended up powering the Arduino through just the USB connection. Two other holes were drilled in one end of the box for the ultrasonic sensor’s emitter and receiver modules.

Holes Cut in Project Box

Following the how-to in the resources section below, the pins listed in the table below were connected between the ultrasonic sensor and the Arduino with some spare computer connectors. Using something like SchmartBoard jumpers would make the connections even easier.

Ultrasonic Pin    Arduino Pin
GND               GND
VCC               5V
SIG               Digital 7

The Arduino and ultrasonic sensor were then mounted into the project box using double sided tape and hot melt glue.

Mounted Arduino and Sensor

Next the box was mounted on a magnetic base and arm, and the unit was ready for software.

Finished Unit with Base on Side

Finished Unit Standing Up On Base

3. Software

The only thing that gave me trouble in Ruby was the read_timeout setting for the serial port. gets and readline wouldn’t work properly without setting this value much higher than I would have expected. With the timeout set too low, gets would return before it had read a full line from the Arduino. This would throw the cycle counting code off when a number like 68 was read as a 6 followed by an 8 in a separate read.

All of the code below is available on GitHub.

require 'serialport'

# Make sure we got the right number of arguments
if ARGV.size < 2
  puts "Usage: ruby client.rb [serial_port] [inches_to_target]"
  puts "Example: ruby client.rb /dev/ttyACM0 40"

  exit
end

# Latch variables so we only trigger once on a close or open
is_open = false
is_closed = false

# Keeps track of the number of open-close cycles
cycle_count = 0

# Parameters used to set up the serial port
port_str  = ARGV[0] # The serial port is grabbed from the command line arguments
baud_rate = 9600
data_bits = 8
stop_bits = 1
parity = SerialPort::NONE
 
# Set up the serial port with the settings from above
sp = SerialPort.new(port_str, baud_rate, data_bits, stop_bits, parity)

# We have to set the read timeout to a very high value or we may get partial reads
sp.read_timeout = 1000

# Grab the distance to the target from the command line arguments
inches_to_target = ARGV[1].to_i

# Wait to make sure the serial port is initialized
sleep(2)

# Loop forever reading from the serial port
while true do
	# Grab the next string from the serial port
	value = sp.gets.chomp
	
	# Check to see if we have a closed condition within a +/- 3 inch range (adjust as needed)
	if not value.nil? and value.to_i > inches_to_target - 3 and value.to_i < inches_to_target + 3
		# Make sure the target wasn't already closed
		if not is_closed
			#puts "Closed"

			# If the target was previously open we want to increment the cycle count
			if is_open
				# Keep track of the cycle count
				cycle_count += 1

				# Let the user know what the current cycle count is
				puts cycle_count
			end			

			# Flip the latch bits so that we only enter here once
			is_closed = true
			is_open = false
		end
	# We're outside the range that defines the closed condition
	else
		# Make sure the target wasn't already open
		if not is_open
			#puts "Open"

			# Flip the latch bits so that we only enter here once
			is_open = true
			is_closed = false
		end
	end
end
  

The require 'serialport' line and the serial port setup code (from the port_str assignment through the sleep call) allow us to establish a connection to the Arduino. You'll need to make sure the serialport gem is installed as shown below in the Usage section. If you're not interested in the logic that determines open versus closed, focus on the sp.gets.chomp statement and ignore everything below it except the nil check.

I modified existing code for the Arduino, and the original source file is listed in the resources section under "Support Files". I changed the main loop to send only the range value (in inches) that I was interested in. See the original source file for an example of how to use centimeters.

// Main program loop
void loop()
{
  long rangeInInches; //The distance to the target in inches
  
  // Get the current signal time
  ultrasonic.DistanceMeasure();
  
  // Convert the time to inches
  rangeInInches = ultrasonic.microsecondsToInches();
  
  // Send the number of inches to the target to the client
  Serial.println(rangeInInches);
}

4. Usage

The device is set up by aiming the ultrasonic's emitter and receiver at the object that opens and closes. This could be something like a sliding door, a machine's parting line, or a robot arm that always returns to the same place during a cycle. An important thing to remember is that this sensor plays by different rules than an optical sensor. The ultrasonic sensor will register against clear things like Plexiglas. The best way to get good readings is by shooting against a hard surface that's perpendicular to the line of sight of the ultrasonic sensor. A measurement should be taken of the distance from the ultrasonic sensor to the target object. This will be used when starting the Ruby application.

This application has only been tested with Ruby 2.0.0, but should work fine with 1.9.3. If you have any questions on how to install Ruby on Linux, have a look at the RVM and Ruby parts of Ryan Bigg's blog post here. I would highly discourage you from installing Ruby from most Linux distribution repositories, especially Ubuntu's.

In order to get the Ruby app to run, the serialport gem has to be installed first.


$ gem install serialport

If the program is run without any arguments it will display a usage message.


$ ruby client.rb
Usage: ruby client.rb [serial_port] [inches_to_target]
Example: ruby client.rb /dev/ttyACM0 40

The serial port normally shows up as /dev/ttyACMx on my Ubuntu-based laptop, where x is usually a number between 0 and 3. In some cases your Arduino might show up as something like /dev/ttyUSBx. The inches_to_target argument is the distance at which the ultrasonic sensor should see the target (the closed condition). If the application sees anything outside of a +/- range around this distance, it counts that as an open condition. When it sees something within this range again (closed), it counts that as a cycle. At the end of each cycle the application outputs a line showing the cycle count. You could easily add code that displays the present time, the last time a cycle completed, and the difference between the two, which gives you the cycle time of the machine.
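
If you're not sure which device node your Arduino was assigned, a quick sketch for finding it from a terminal follows; the device name patterns are the ones mentioned above and will vary from system to system. The first command lists candidate serial devices, and the second shows recent kernel messages about tty devices (run it right after plugging the board in).

$ ls /dev/ttyACM* /dev/ttyUSB* 2>/dev/null
$ dmesg | grep -i tty | tail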

5. Conclusion

In practice this system has been fairly intuitive and easy to use, although the ultrasonic sensor's reliable range is far less than the vendor's spec of 157 inches. Again, an important thing to remember when trying to get reliable readings is to shoot against a hard surface, and keep the "beam" of the ultrasonic sensor as perpendicular (90 degrees) to the face of the target object as possible.

Still have questions? Have suggestions that will make the hardware or software better? Please let us know in the comments section.

6. Resources

  1. Ultrasonic Range Finder - Radio Shack
  2. Ultrasonic Range Finder User's Guide
  3. Ultrasonic Range Finder How-To - Radio Shack Blog
  4. Ultrasonic Range Finder Support Files
  5. Arduino Uno Rev 3
  6. ruby-serialport Ruby Serial Library
  7. Example of Using Ruby getc to Read From Arduino
  8. Using Ruby with the Arduino - Arduino Forums

Collaborative Design with Open Design Engine

1. Intro

Pulling together hardware designers from all over the Internet presents several challenges. Among these is the task of keeping things like project time lines, roadmaps, forums, wikis, and issue tracking organized and cohesive. There may also be the need to collect and organize design files such as CAD, which don’t fit well into the traditional paradigm used by source code management systems like git and CVS. The main reason is that CAD files tend to be binary, and are hard to diff or merge in any meaningful way.

Addressing these challenges and many more is what the Redmine-based Open Design Engine project (Figure 1.1) seeks to do. Open Design Engine, or ODE for short, is an open source web application and an initiative of Mach 30, an organization committed to fostering a space-faring future for the human race through safe, routine, reliable, and sustainable access to space. Mach 30’s mission is a lofty one, but the organization has laid a solid foundation, which includes the implementation of ODE, on which to move toward this future.

Figure 1.1 – Open Design Engine (ODE) Banner

Mach 30 likes to “think of Open Design Engine as the Source Forge for hardware”1, and I would say that ODE is well positioned to fill that role. You can download ODE and self-host it if you like, or you can simply use the Mach 30 hosted version by going to opendesignengine.net.

I thought I would put together a quick start guide that covers a few of ODE’s basic features. If there’s anything I miss, please feel free to ask questions in the comments section, or have a look at the ODE forums.

2. Getting Started

The first thing that you’ll need to do is register. The Open Design Engine registration process is very similar to Redmine’s (Figure 2.1) except that ODE requires you to accept its Terms of Service (ToS) before completing the registration.

Figure 2.1 – Redmine’s Registration Form

Whenever you need to sign in to ODE again after registration, look for the Sign in link in the upper right-hand corner of any ODE page. Once you’re signed in, the Sign in link changes to Sign out, and you’ll see your user name and a My account link added (Figure 2.2). With registration completed, you can start experimenting with some ODE features and evaluating any existing projects you think you’d like to be involved in. We’ll take a look at the Open Design Engine project entry (Figure 2.2) to help you get familiar with the layout.

Figure 2.2 – The ODE Project Tabs on OpenDesignEngine.net

The first thing to notice is that there’s a Help link toward the upper left-hand corner of every ODE page. This takes you directly to the Redmine guide, which is a good place to start if you’re having trouble with an aspect of ODE’s functionality. The next link over is Projects, which will take you to a list of all the projects currently hosted on ODE. After that comes My page, which leads to a customizable page that summarizes your involvement in any ODE projects that list you as a developer or manager. To customize this page, click the Personalize this page link (Figure 2.3).

Figure 2.3 – The “Personalize this page” Link on “My page”

3. Modules

The tabs in Figure 2.2 represent Redmine modules chosen by the project manager (Figure 3.1), and will vary from project to project. Several of the tabs/modules are self-explanatory, but others warrant a closer look. For instance, the Issues and New issue tabs are interesting because “Issues” are the mechanism by which you can add items to a project’s time line. Without the issue tracking system, items will never show up on the Gantt chart and calendar, even if those modules are enabled.

Figure 3.1 – Modules Available in ODE

Adding a new issue is pretty straight-forward. Using the New issue tab (Figure 3.2), you first need to set the issue type via the Tracker pull-down. Your choices are Bug, Feature, and Support, which are a bug report, a feature request, and a request for support respectively. You can also set the Priority, who the issue is assigned to (Assignee), and which Target version from the project’s roadmap the issue applies to. The Start date, Due date, and Estimated time fields are where you start setting what shows up in the calendar and Gantt chart. Even if you’re not a member of a project’s team, you can still file an issue as long as you’re signed in to ODE.

Figure 3.2 – The “New issue” Tab

The Activity and News tabs are similar in that they’re used to keep track of what’s happening in a project. The items on the Activity tab are made up of things like wiki edits, messages, forum postings, and file uploads, and you can filter these updates using the check boxes in the right hand column (Figure 3.3).

Figure 3.3 – Activity Filter Check Boxes

The News module acts like a blog for the project, and news posts will show up under Latest news if you added that block when customizing My page. The Wiki and Forums tabs are what you would expect, and I found them fairly intuitive to use. In my experience these are where the bulk of the planning and design collaboration happens.

The last thing I’m going to cover is ODE’s ability to handle files. There are three file handling modules that I feel are the most useful, and those are Files, Repository, and DMSF, although DMSF is technically a plug-in. The Files module is useful for distributing project packages. For instance, the Files tab for the Open Design Engine project is where you would look to download ODE so that you could host it yourself. The Repository module in ODE allows you to use a Subversion (and eventually a git) repository with your project. If you have any source code to manage in addition to the rest of your project’s files, this is an invaluable tool. That brings us to DMSF (Figure 3.4).

Figure 3.4 – The “DMSF” Tab

The DMSF documentation states that it aims to replace Redmine’s current Documents module, which is used for technical and user documentation. In my view the DMSF plug-in, especially when coupled with the Wiki module, does make the Documents module feel unneeded. Some of the features of DMSF include document versioning and locking, allowing multiple downloads at once via a zip file, and optional full-text searching through text-based documents. There are several other features, and they can be found in the DMSF documentation.

The operation of this plug-in is pretty straight-forward, but there are a few buttons that we’ll take a quick look at (Figure 3.5).

Figure 3.5 – The “DMSF” Module Buttons

Buttons 1 and 2 operate on the directory level, while buttons 3 through 6 are for the sub-directory and file levels. Button 1 allows you to edit the meta data (name, description, etc) for the directory level that you’re currently at. If you were to click the button at the level shown in Figure 3.4, you would end up editing the meta data for the Documents directory. Button 2 creates a new directory at the current level. If you clicked it in Figure 3.4 you would create a new folder directly under Documents. Button 3 gives you the ability to edit the meta data for a file in the same way that button 1 does for parent directories. Button 4 is only visible when you’re working with files and allows you to lock a file so that it can’t be changed by other project members. Be careful when you use this feature so that you don’t needlessly keep other contributors from getting the access that they need to push the project forward. Button 5 (the red ‘X’) deletes a file, and button 6 allows you to set whether or not modifications to a file show up in the project’s Activity tab.

Uploading a file using DMSF requires only a few button clicks. First, click the Add Files button, which brings up a standard open file dialog. Once you’ve selected the file, click the Start Upload button, which brings up the form shown in Figure 3.6.

Figure 3.6 – The “DMSF” Upload Meta Data Setting

This form allows you to set the meta data for the file you’re uploading. If the file already exists in the current directory, ODE will automatically increment the minor version number for you. You can then set other attributes like whether the file has been approved or is awaiting approval, and even attach a comment that will stay with the file. Once these things are set you can click Commit to upload the file.

There’s much more that can be done when handling files in ODE, but that will hopefully give you a good start.

4. Conclusion

This post is essentially an ODE crash course since it would be much too large if I tried to cover everything that ODE has to offer. The best way to learn the other features that are available is to set up an account and start looking around. There’s a Sandbox project where you can start the learning process in a safe environment where your actions won’t hinder other users.

Once you get comfortable with contributing to projects on ODE you’ll be in a great position to start a project of your own. If you have a project (hardware, software, mechanical, medical, whatever) that you’ve always wanted to work on but haven’t, why not start it on ODE where you have the chance of getting input from designers all over the world?

Special thanks go to J. Simmons, president of the Mach 30 foundation, for his help in writing this post. The efforts of J and the rest of the Mach 30 board have brought Open Design Engine to life and made it available to all of us, and I look forward to seeing the planned improvements implemented over time. Open Design Engine has the potential to become an invaluable tool for hardware developers and designers in general, and I’ve enjoyed my time working with it.

Please feel free to leave any comments or questions below, and have a look at innovationsts.com for other projects, tips, how-tos, and service offerings available from Innovations Technology Solutions. Thanks for reading.

Resources

  1. Open Design Engine
  2. Open Design Engine Forums
  3. ODE’s Sandbox Project
  4. Mach30 Website
  5. Redmine Website
  6. Redmine User’s Guide
  7. DMSF Documentation

Make Self-Extracting Archives with makeself.sh

Intro

When making your custom scripts or software available to someone else, it’s a good idea to make that content as easy to extract and install as possible. You could just create a compressed archive, but then the end user has to manually extract the archive and decide where to place the files. Another option is creating packages (.deb, .rpm, etc) for the user to install, but then you’re more locked into a specific distribution. A solution that I like to use is to create a self-extracting archive file with the makeself.sh script. This type of archive can be treated as a shell script and will extract itself, running a scripted set of installation tasks when it’s executed. The reason this works is that the archive is essentially a binary payload with a script stub at the beginning. This stub handles the archive verification and extraction process and then runs any predefined commands via a script specified at the time the archive is created. This model offers you a lot of flexibility, and can be used not only for installing scripts and software but also for things like documentation.

Video

Audio

Download

Installation

The makeself.sh script is itself packaged as a self-extracting archive when you download it. You can extract the script and its support files by running the makeself.run installer with a Bourne compatible shell (Listing 1).

Listing 1

$ sh makeself.run
Creating directory makeself-2.1.5
Verifying archive integrity... All good.
Uncompressing Makeself 2.1.5........
Makeself has extracted itself.
$ ls makeself*
makeself.run

makeself-2.1.5:
COPYING  makeself.1  makeself-header.sh  makeself.lsm  makeself.sh  README  TODO

You can see from the output that I’m working with version 2.1.5 of makeself.sh for this post. To make things easier, you can install makeself.sh in your ~/bin directory, and then make sure $HOME/bin is in your PATH environment variable. You also need to ensure that makeself.sh and makeself-header.sh are in the same directory, unless you’re going to specify the location of makeself-header.sh with the --header option (Listing 3).
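
As a rough sketch of that installation (the version-numbered directory comes from the extraction in Listing 1; adjust the paths to suit your own system and shell startup file):

$ mkdir -p ~/bin
$ cp makeself-2.1.5/makeself.sh makeself-2.1.5/makeself-header.sh ~/bin/
$ echo 'export PATH="$HOME/bin:$PATH"' >> ~/.profile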

General Usage

Listing 2 shows the usage syntax for makeself.sh.

Listing 2

makeself.sh [OPTIONS] archive_dir file_name label startup_script [SCRIPT_ARGS]

After the OPTIONS, you need to supply the path and name of the directory that you want to include in the archive. The next argument is the file name of the self-extracting archive that will be created. You can choose any name you want, but for consistency and clarity it’s recommended that the file have a .run or .sh file name extension. Next, you can specify a label that will act as a short description of the archive and will be displayed during extraction. The final argument to makeself.sh is the name of the script that you want to have run after extraction is complete. In turn, this script can have arguments passed to it that are represented by [SCRIPT_ARGS] in Listing 2. It’s important not to get the arguments to the startup script confused with the arguments to makeself.sh.

Listing 3 shows some of the options for use with makeself.sh. You can find a comprehensive list on the makeself.sh webpage, but in my own experience I’m usually only concerned with the options listed here.

Listing 3

--gzip   : Use gzip for compression (default setting)
--bzip2  : Use bzip2 for better compression. Use the '.bz2.run' file name
           extension to avoid confusion on the compression type.
--header : By default it's assumed that the "makeself-header.sh" header script
           is stored in the same location as makeself.sh. This option can be
           used to specify a different location if it's stored somewhere else.
--nocomp : Do not use any compression, which results in an uncompressed TAR file.
--nomd5  : Disable the creation of an MD5 checksum for the archive which speeds
           up the extraction process if you don't need integrity checking.
--nocrc  : Same as --nomd5 but disables the CRC checksum instead.

In addition to the options passed to makeself.sh when creating the archive, there are options that you can pass to the archive itself to influence what happens during and after the extraction process. Listing 4 shows some of these options, but again please have a look at the makeself.sh webpage for a full list.

Listing 4

--keep       : Do not automatically delete any files that were extracted to a
               temporary directory.
--target DIR : Set the directory (DIR) to extract the archive to.
--info       : Print general information about the archive without extracting it.
--list       : List the files in the archive.
--check      : Check the archive for integrity.
--noexec     : Do not run the embedded script after extraction.
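
For example, if you wanted to unpack an archive into a specific directory for inspection without running its startup script, something along these lines should work (the archive name here is only a placeholder):

$ sh myarchive.run --noexec --target ./inspect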

Example

Let’s go through a practical example using some of the information above. If you had a directory named myprogram within your home directory and you wanted to package it, you could create the archive with the command line at the top of Listing 5.

Listing 5

$ makeself.sh --bzip2 myprogram/ myprogram.bz2.run "The Installer For myprogram" ./post_extract.sh
Header is 402 lines long
About to compress 20 KB of data...
Adding files to archive named "myprogram.bz2.run"...
./
./myprogram.c
./post_extract.sh
./myprogram
CRC: 955035546
MD5: 7b74c31f31589ee236dea535cbc11fe4
Self-extractible archive "myprogram.bz2.run" successfully created.

Notice that I used bzip2 compression via the --bzip2 option rather than using the default of gzip. I couple this with setting the file name extension to .bz2.run so that the end user will have a way of knowing that I used bzip2 compression. After the compression option, I pass an argument requesting that the myprogram directory, which contains a simple C program also called myprogram, be added to the archive. After the file name specification (with the .bz2.run extension), we come to the description label for the archive. This can be a string of your choosing and will be displayed with the output from the extraction process. The last argument is the “startup script” that will be run when the archive is extracted. Listing 6 shows the contents of my simple startup script (post_extract.sh) that installs the myprogram binary in the user’s bin directory, but only if they have one.

Listing 6

#!/bin/sh

# Install to ~/bin if it exists
if [ -d $HOME/bin ]
then
    cp myprogram $HOME/bin/
fi

Notice that when specifying the startup script, I used the path of ./ which points to the current directory. This is a reference to the directory after the extraction, not the directory where the script resides when you’re creating the archive. Your startup script should be inside the directory that you’re adding to the archive. One other thing to note about the startup script is that you will need to set its execute bit before creating the archive. Otherwise you’ll get a Permission denied error when the makeself-header script stub tries to execute the script.
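
In other words, before running the makeself.sh command from Listing 5, a quick check along these lines is worthwhile (the file names follow the example above):

$ chmod +x myprogram/post_extract.sh
$ ls -l myprogram/post_extract.sh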

Now we transition to the end-user viewpoint, where the self-extracting archive has been downloaded and we’re getting ready to run it. You can set the execute bit of the archive and run it directly, or execute it with a Bourne compatible shell the way the makeself.run installer was (sh makeself.run). Before we extract the archive though, let’s verify its integrity and have a look at the contents (Listing 7).

Listing 7

$ sh myprogram.bz2.run --check
Verifying archive integrity... MD5 checksums are OK. All good.
$
$ sh myprogram.bz2.run --list
Target directory: myprogram
drwxr-xr-x jwright/jwright    0 2011-12-20 13:49 ./
-rw-r--r-- jwright/jwright   66 2011-12-20 11:45 ./myprogram.c
-rw-r--r-- jwright/jwright   99 2011-12-20 11:49 ./post_extract.sh
-rwxr-xr-x jwright/jwright 7135 2011-12-20 11:45 ./myprogram

We can see from the first command that the archive is intact and that there are no errors. The second command shows us that the archive contains 3 files. The first is the source file myprogram.c, which I left in the archive directory so that I could have the option of giving the user the source code. The next file is the startup script that will be run after extraction. The last file, of course, is the binary that our end user wants to install. Let’s go ahead and install myprogram by setting the execute bit on the archive and running it directly (Listing 8).

Listing 8

$ chmod u+x myprogram.bz2.run
$ ./myprogram.bz2.run
Verifying archive integrity... All good.
Uncompressing The Installer For myprogram....

Now, to test that the installation worked, we can try to run myprogram (Listing 9).

Listing 9

$ myprogram
Hello world!

I can see that the program is present and did exactly what I expected it to do. Keep in mind that if ~/bin is not in your PATH variable you’ll have to supply the full path to the myprogram binary.

Conclusion

This has been a quick overview of what makeself.sh can do. I’ve found it to be a very useful script that is also very dependable and easy to use. Through the use of the startup script, along with the full complement of options, makeself.sh offers you a lot of flexibility when creating installers. You can create this type of self-extracting archive manually, but makeself.sh makes it much easier and adds great features like checksum validation.

Please feel free to leave any comments or questions below, and have a look at innovationsts.com for other projects, tips, how-tos, and service offerings available from Innovations Technology Solutions. Thanks for reading.

Resources

  1. makeself.sh Homepage
  2. makeself.sh GitHub Page
  3. Linux Journal Article on How to Make Self-Extracting Archives Manually

Bodhi Linux On a Touchscreen Device

Intro

Welcome, in this blog post we’re going to set Bodhi Linux up on a touchscreen device. Since the last post covered touchscreen calibration, I thought I would go one step beyond that by choosing and configuring a distribution to make the touchscreen easy to use (on-screen keyboard, finger scrolling, etc.). This post won’t be an exhaustive run-through of everything that you can do with Bodhi on a touchscreen system, but my hope is to give you a good start. Please feel free to talk about your own customizations and ways of doing things in the comments section. We’ll be focusing on desktop touchscreens and Intel-based tablets here, but Bodhi also has an ARM version that’s currently in alpha. The ARM version of Bodhi will officially support Archos Gen 8 tablets initially, and then expand support out from there. I’m using Bodhi because it has a nice Enlightenment Tablet profile that I think makes using a touchscreen system fairly natural and intuitive. You could of course also use another distro like Ubuntu (Unity) or Fedora (Gnome Shell) with your touchscreen, but, as I mentioned, I’m partial to Bodhi for this use.

Video

Audio

Download

The Software

For this post I installed Bodhi 1.2.0 (i386) and used xinput-calibrator as the touchscreen calibration utility. I wrote a Tech Tip on xinput-calibrator last month that you can find here. If your touchscreen doesn’t work correctly out of the box, I would suggest following the instructions in that blog post before moving on. If you’re new to Bodhi Linux, you might want to have a look at their wiki. I’ve also found Lead Bodhi Developer Jeff Hoogland’s blog to be very informative, especially when I was setting Bodhi up for this post. Jeff and the other users on the Bodhi forum are very nice and helpful if you want to ask questions too.

The Hardware

My test machine was an Intel-based Lenovo T60 laptop with an attached Elo TouchSystems 1515L 15″ Desktop Touchmonitor. Even if you’re working with Bodhi Linux on an ARM device, you’ll still be able to take a lot of tips away from this post.

Installation

I put a standard installation of Bodhi on the Lenovo T60 by simply following the on-screen instructions. Once I had it installed, I booted the system and ended up at the initial Profile selection screen.


The Bodhi Linux Profile Selection Screen

Since Bodhi uses Enlightenment as its desktop manager, this profile selection gives you an easy way to customize the Enlightenment UI for the way you’ll use it. In this case we’ll be interacting with Bodhi via a touchscreen, so we want to choose the Tablet profile. The next screen is theme selection, and for our purposes it doesn’t matter which theme you choose.

Once you’ve chosen a theme you should be presented with the Bodhi tablet desktop. The first thing that I notice on my machine is that the Y-axis of the touchscreen is inverted. When I touch the bottom of the screen the cursor jumps to the top, and vice versa. In order to fix that we need to get the machine on a network so that we can download and install the screen calibration utility. Bodhi’s network manager applet is easy to find on the right hand side of the taskbar. After clicking on that and setting up my local wireless network, I’m ready to download and install my preferred screen calibration utility – xinput-calibrator. As I mentioned, I wrote a blog post about xinput-calibrator last month.

Customization

Now we can start on the customizations that will make our touchscreen system easier to use. The first thing that I did was install Firefox. If you’re running on a lower power device you might want to stick with Midori, which is Bodhi’s default browser. If you use Firefox, there’s a nice add-on called Grab and Drag that allows you to do drag and momentum scrolling. As you’ll see the first time you run it, Grab and Drag has quite a few settings and I think it’s worth the time to look through them. One other thing that I like to do with Firefox on a touchscreen device is hide the menu bar, but that’s just my personal preference.

If you’re going to run Midori, you’re not out of luck on touch and drag scrolling. You can add the environment variable declaration export MIDORI_TOUCHSCREEN=1 somewhere like ~/.profile to enable touch scrolling. The drawback is that touch scrolling in Midori is not all that easy to use because it doesn’t distinguish between a touch to scroll, and a touch to drag an image or select text. I’ve also found that setting the MIDORI_TOUCHSCREEN variable on Bodhi 1.2.0 can be a little finicky, so if all else fails you can prepend MIDORI_TOUCHSCREEN=1 to the command in the Exec line of Midori’s .desktop file. In version 1.2.0, a search for midori.desktop finds this file.
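
As a concrete sketch of those two approaches (the .desktop search path is an assumption for a typical Ubuntu-based layout, so adjust it if the file lives elsewhere on your system):

# Enable touch scrolling for your whole session (takes effect at the next login)
echo 'export MIDORI_TOUCHSCREEN=1' >> ~/.profile

# Or locate Midori's launcher so you can prepend the variable to its Exec line
find /usr/share/applications -name 'midori*.desktop'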

Xournal is an application that allows you to write notes and sketch directly on the touchscreen. If you want to take notes on your touchscreen device, this is an application that you’ll want to check out. If you want to see Xournal in action, you can watch the videos below that have sections showing Jeff Hoogland using Xournal and Bodhi’s Tablet profile. In the videos you’ll see that Jeff uses his finger which worked okay for me, but to get nicer looking notes on the 1515L I had to switch to a stylus. If you want to install Xournal, just look for references to the xournal package in your package manager or download the latest version from the Xournal website.

Another customization that I make is to set the file manager up to respond to single clicks. Bodhi 1.2.0 uses PCManFM 0.9.9 as its default file manager, so to do this open it and click Edit -> Preferences in the menu. On the General tab make sure that the Open files with single click box is checked. Alternatively, you can use the less complete but more touch friendly EFM (Enlightenment File Manager). To use EFM, you’ll need to load the EFM (Starter) module under Modules -> Files. Once you’ve loaded the module, you can launch it by touching the Bodhi menu on the left hand side of the taskbar and then Files -> Home. The first time you use EFM you’ll need to add the navigation controls by right clicking on the toolbar, clicking toolbar -> Set Toolbar Contents, and then clicking on EFM Navigation followed by a click of the Add Gadget button. Please keep in mind that EFM is a work in progress, so it’s not feature-complete.


The Enlightenment File Manager (EFM)

I’ve got PDF copies of two of the Linux magazines I normally read, so another addition I make is to install Acrobat Reader or an open source PDF reader. It’s best if you choose a reader with drag to scroll capability like Adobe Reader. If you do use Adobe Reader, make sure that you have the Hand tool selected and use a continuous page view for the easiest scrolling.

If you’re going to view images on your touchscreen system, you may want to install Ephoto which is a simple image viewer for Enlightenment. On a Bodhi/Ubuntu/Debian based system a search for the ephoto package should find what you need to install.


The Ephoto Image Viewer For Enlightenment
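
Since Bodhi is Ubuntu-based, the install is usually a one-liner (this assumes the package is simply named ephoto in your repositories, as noted above):

$ sudo apt-get install ephoto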

General Usage

Below are a few tips for when you’re using your newly set up touchscreen system. So that you can see what’s possible when running Bodhi’s Tablet profile, I’ve included the two embedded videos below from Jeff Hoogland.

  • There is an applications menu button on the right side of the quick launch bar (bottom of the screen). Clicking this button will bring up a set of Applications along with Enlightenment Widgets, and Bodhi 1.2.0 seems to have a placeholder for a Config subset. There is also a more traditional applications menu button on the left end of the taskbar.
  • You can touch and hold down on an icon (launcher) in the applications menu until it lets you drag it. You can then drag the launcher to the desktop or the quick launch bar.
  • If you touch and hold the desktop, its icons and the icons in the quick launch bar will start to swing and will have red X’s beside them. If you click on one of the red X’s you’ll remove that launcher. Click on the big red X in the lower right-hand corner of the screen to exit this mode.
  • To change to another workspace, simply drag your finger from right to left across the screen. There is a set of dots just above the quick launch bar that shows you which workspace you’re in. Each of the workspace desktops can be customized with their own set of icons, but the taskbar and quick launch bar stay the same.
  • You can touch the Scale windows button on the left of the task bar to get a composited window list. Once you have this list, you can close windows simply by touching and dragging them off the screen.


The Scale Windows Button On The Tablet Profile Taskbar

Bodhi Linux Tablet Usage Videos


Jeff Hoogland Showing Bodhi Linux On A Dell Duo


Jeff Hoogland Demonstrating Bodhi Linux On An ARM Device

Possible Issues

Below is a list of things that might cause you some trouble and/or confusion.

  • In my experience when the GUI asked for an administrator password, I couldn’t enter it because the dialog was modal and didn’t allow me to get to the on-screen keyboard button. A good example of this happens when I try to launch the Synaptic Package Manager.
  • If you have trouble closing a window with the Bodhi close button (far right side of the taskbar), try touching the window first to make sure it’s in focus.
  • The on-screen keyboard is not context sensitive and does not do auto-completion. I wasn’t personally bothered by this, but some avid users of other tablet and smartphone platforms might be.
  • Support for screen rotation (from portrait to landscape) will be hit and miss, and depends almost exclusively on community support. Unfortunately, many devices have closed specs so reverse engineering becomes the only solution.

Conclusion

That concludes this quick Project. Please feel free to leave any comments or questions below. Before signing off, I’d like to thank Jeff Hoogland for being so helpful in answering my questions while I was writing this post. A great community has gathered around Bodhi, and I’m looking forward to seeing where Jeff and his team take the distro in the future. If you haven’t tried Bodhi yet, I highly encourage you to head over to their website and have a look. Also, have a look at innovationsts.com for other projects, tips, how-tos, and service offerings available from Innovations Technology Solutions. Thanks for reading.

Resources

  1. Bodhi Linux for ARM Alpha 1 – Jeff Hoogland
  2. ARM Section of Bodhi Linux Forum
  3. Bodhi Linux Forum – Arm Version of Bodhi Discussion
  4. HOWTO: Linux on the Dell Inspiron Duo
  5. Bodhi Linux Website
  6. Lead Bodhi Developer Jeff Hoogland’s Blog
  7. xinput-calibrator freedesktop.org Page

Tech Tip – Touchscreen Calibration In Linux

Intro

Welcome, this is an Innovations Tech Tip. I recently did some work with an ELO TouchSystems 1515L 15″ LCD Desktop Touchmonitor. I was pleased with the touchmonitor’s hardware and performance, but in order to make it work properly in Linux I had to find a suitable calibration program. Out of the box on several distributions this touchscreen exhibits Y-axis inversion, where touching the top of the screen moves the cursor to the bottom and vice versa. xinput-calibrator is a freedesktop.org project that handled the calibration well and fixed the Y-axis inversion issue, and as a bonus it works with any standard Xorg touchscreen driver.

Video

Audio

Download

The Software

For this post I tested on Bodhi Linux 1.2.0 (based on Ubuntu 10.04 LTS), Fedora 15, and Ubuntu 11.04. xinput-calibrator, as I mentioned, was the screen calibration utility.

The Hardware

My test machine was an Intel based Lenovo T60 laptop with an attached ELO Touchsystems 1515L 15″ LCD Desktop Touchmonitor.

Installation

Click here to go to xinput-calibrator’s website and choose your package. Be aware that if you’re using the ARM version of Bodhi (in alpha at the time of this writing), it’s based on Debian, so you’ll want to grab the Debian testing package. You can also add a PPA if you’re running Ubuntu, but I had trouble getting that to work during my tests. Last but not least, you can grab the source and compile it yourself by downloading the tarball or using git.

Before you actually install xinput-calibrator on a freshly installed Debian based system (including Ubuntu and Bodhi), make sure to update your package management system or you’ll get failed dependencies. This is because the package management system doesn’t know what packages are available in the repositories yet. This isn’t a problem with Fedora since the package management index is updated every time you use YUM. Once you’ve ensured that the system is or will be updated, you’ll be ready to install xinput-calibrator via the package that you downloaded or the PPA.
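
As a sketch of that sequence on a Debian/Ubuntu/Bodhi system (the .deb file name is a placeholder for whatever package you downloaded), the last command pulls in any dependencies that dpkg reported as missing:

$ sudo apt-get update
$ sudo dpkg -i xinput-calibrator_*.deb
$ sudo apt-get -f install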

Calibration

Once xinput-calibrator is installed, it should show up in your application menu(s). Look for an item labeled “Calibrate Touchscreen”. If you don’t see it anywhere, you can launch it from the terminal with the xinput_calibrator (note the underscore) command.


Figure 1 – xinput_calibrator screenshot

Using It

The use of xinput-calibrator is very simple. You’re presented with a full-screen application that asks you to touch a series of 4 points. The instructions say that you can use a stylus to increase precision, but I find that using my finger works well for the ELO touchscreen. One of the nice features of xinput-calibrator is that it’s smart enough to know when it encounters an inverted axis. After I run through the calibration the Y-axis inversion problem is fixed, so I’m ready to start using the touchscreen.

Persistent Calibration

You’ll probably want your calibration to persist across reboots, so you’ll need to do a little more work now to make the settings permanent. First you’ll need to run the xinput_calibrator command from the terminal and then perform the calibration.

Listing 1

$ xinput_calibrator
Calibrating EVDEV driver for "EloTouchSystems,Inc Elo TouchSystems 2216 AccuTouch® USB Touchmonitor Interface" id=9
        current calibration values (from XInput): min_x=527, max_x=3579 and min_y=3478, max_y=603
Doing dynamic recalibration:
        Setting new calibration data: 527, 3577, 3465, 600

--> Making the calibration permanent <--
  copy the snippet below into '/etc/X11/xorg.conf.d/99-calibration.conf'
Section "InputClass"
        Identifier      "calibration"
        MatchProduct    "EloTouchSystems,Inc Elo TouchSystems 2216 AccuTouch® USB Touchmonitor Interface"
        Option  "Calibration"   "527 3577 3465 600"
EndSection

Toward the bottom of the output you can see instructions for "Making the calibration permanent". This section will vary depending on what xinput_calibrator detects about your system. In my case under Ubuntu the output was an xorg.conf.d snippet, which I then copied into the xorg.conf.d directory on my distribution. Be aware that even though the output says that xorg.conf.d should be located in /etc/X11, it might actually be located somewhere else, like /usr/share/X11, on your distribution. Once you've found the xorg.conf.d directory, you can use your favorite text editor (with root privileges) to create the 99-calibration.conf file inside of it. Now when you reboot, you should see that your calibration has stayed in effect.
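
A rough sketch of those steps follows; the two paths are the common locations mentioned above (nano is just an example editor), so check which directory exists on your distribution before creating the file and then paste in the Section ... EndSection snippet from your own xinput_calibrator output:

$ ls -d /etc/X11/xorg.conf.d /usr/share/X11/xorg.conf.d 2>/dev/null
$ sudo nano /etc/X11/xorg.conf.d/99-calibration.conf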

If you have a reason to avoid using an xorg.conf.d file to store your calibrations, you can run xinput_calibrator with the --output-type xinput option/argument combo.

Listing 2

$ xinput_calibrator --output-type xinput
Calibrating EVDEV driver for "EloTouchSystems,Inc Elo TouchSystems 2216 AccuTouch® USB Touchmonitor Interface" id=9
        current calibration values (from XInput): min_x=184, max_x=3932 and min_y=184, max_y=3932
Doing dynamic recalibration:
        Setting new calibration data: 524, 3581, 3482, 591

--> Making the calibration permanent <--
  Install the 'xinput' tool and copy the command(s) below in a script that starts with your X session
xinput set-int-prop "EloTouchSystems,Inc Elo TouchSystems 2216 AccuTouch® USB Touchmonitor Interface" "Evdev Axis Calibration" 32 524 3581 3482 591

At the bottom of this output you can see that there are instructions for using xinput to make your calibration persistent. If it's not already present, you'll need to install xinput and then copy the command line in xinput_calibrator's instructions into a script that starts with your X session. You can usually also add it to your desktop manager's startup programs via something like gnome-session-properties if you would prefer.
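
For example, assuming your display manager reads ~/.xprofile at login (many do, but this varies by distribution), you could install xinput and append the command from your own calibration run, something like the following. The device name and calibration values shown here are the ones from Listing 2; use the ones printed by your own run instead.

$ sudo apt-get install xinput
$ echo 'xinput set-int-prop "EloTouchSystems,Inc Elo TouchSystems 2216 AccuTouch® USB Touchmonitor Interface" "Evdev Axis Calibration" 32 524 3581 3482 591' >> ~/.xprofile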

Another option that might be of use to you is -v. The -v (--verbose) option displays extra output so that you can see more of what's going on behind the scenes. If you have any trouble getting your calibration to work, this would be a good place to start.

Your output will probably vary from what I have here depending on what type of hardware you have and which distribution you run. For instance, on Fedora 15 I get the xinput instructions by default instead of an xorg.conf.d snippet. Make sure that you run the above commands yourself, and don't copy the output from my listings.

If you have a desire or need to redo the calibration periodically, you might want to consider creating a wrapper script to automate the process of making the calibration permanent. Such a script might use sed to strip out the relevant code and then a simple echo statement to dump it into the correct xorg.conf.d file or startup script.
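
Here's one possible sketch of such a wrapper for the xorg.conf.d case. The sed range, which grabs the Section/EndSection block from the calibrator's output, and the target path are assumptions you'd adjust for your own system, and tee is used instead of a plain echo so that the write happens with root privileges:

#!/bin/sh
# Hypothetical helper: re-run the calibration and make the result permanent
CONF=/etc/X11/xorg.conf.d/99-calibration.conf   # adjust if xorg.conf.d lives elsewhere
xinput_calibrator | sed -n '/^Section "InputClass"/,/^EndSection/p' | sudo tee "$CONF" > /dev/null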

Wrapping Up

That concludes this Tech Tip. Have a look at innovationsts.com for other tips, projects, how-tos, and service offerings available from Innovations Technology Solutions. Thanks, and stay tuned for more from Innovations.


Resources

  1. xinput-calibrator Page
  2. xinput-calibrator On Github
  3. freedesktop.org Page
  4. Bodhi Linux Website
  5. Ubuntu Linux Website
  6. Fedora Linux Website

Video Tip – Finding Open IP Addresses

Intro

Welcome, this is an Innovations Tech Tip. In this tip we’re going to explore a couple of ways to find open IP (Internet Protocol) addresses on your network. You might need this information if you were going to temporarily set a static IP address for a host. Even after you’ve found an open IP though, you still need to take care to avoid IP conflicts if your network uses DHCP (Dynamic Host Configuration Protocol). Please also be aware that one of these techniques uses the nmap network scanning program, which may be against policy in some environments. Even if it’s not against corporate policy, the nmap man page states that “there are administrators who become upset and may complain when their system is scanned. Thus, it is often advisable to request permission before doing even a light scan of a network.”2

Video

Audio

Download

arping

The first technique that we’re going to cover is the use of the arping command to tell if a single address is in use. arping uses ARP (Address Resolution Protocol) instead of ICMP (Internet Control Message Protocol) packets. The reason this is significant is that many firewalls will block ICMP traffic as a security measure, so when using ICMP you’re never sure whether the host is really down or just blocking your pings. ARP pings will almost always work because ARP packets provide the critical network function of resolving IP addresses to MAC (Media Access Control) addresses. Hosts on an Ethernet network use these resolved MAC addresses to communicate instead of IPs. Be aware that one case in which ARP pings will not work is when you’re not on the same subnet as the host you’re trying to ping, because ARP packets are not routed. See Resource #3 below for more details.

arping has several options, but the three that we’ll be focusing on here are -I, -D, and -c. The -I option specifies the network interface that you want to use. In many cases you might use eth0 as your interface, but I’m using a laptop connected via wireless, so my interface is wlan0. The -D option checks the specified address in DAD (Duplicate Address Detection) mode. Let’s look at an example.

Listing 1

$ arping -I wlan0 -D 192.168.1.1
ARPING 192.168.1.1 from 0.0.0.0 wlan0
Unicast reply from 192.168.1.1 [D4:4D:D7:64:C6:5F] for 192.168.1.1 [D4:4D:D7:64:C6:5F] 2.094ms
Sent 1 probes (1 broadcast(s))
Received 1 response(s)

You can see that I’m pinging 192.168.1.1 (a known router) with the -D option. If no replies are received, DAD mode is considered to have succeeded, and you can be reasonably sure the address is free for use. Since we did get a reply here, 192.168.1.1 is clearly in use. Listing 2 shows an example of what you would see if the address is not in use.

Listing 2

$ arping -I wlan0 -c 5 -D 192.168.1.76
ARPING 192.168.1.76 from 0.0.0.0 wlan0
Sent 5 probes (5 broadcast(s))
Received 0 response(s)

Here I’ve picked a different network address that I knew would be unused. I’ve also added the -c option mentioned above so that I could have arping stop after sending 5 requests. Otherwise arping would keep trying until I interrupted it (possibly via the Ctrl-C key combo).

Armed with this information and a knowledge of any dynamic addressing scheme on my network, I can set a temporary static IP for a host. See Resource #1 for more information on arping.
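
For instance, once arping reports no replies for an address, a temporary static assignment might look like this (the interface and address follow the example above, and the setting does not survive a reboot):

$ sudo ip addr add 192.168.1.76/24 dev wlan0
$ ip addr show wlan0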

nmap

nmap, which stands for “Network MAPper”, was “designed to rapidly scan large networks…to determine what hosts are available on the network, what services (application name and version) those hosts are offering, what operating systems (and OS versions) they are running, what type of packet filters/firewalls are in use, and dozens of other characteristics.”2 We’ll be using this to find all of the currently used IP addresses on the network.

nmap has many options and is a very deep utility, and I highly suggest spending some time reading its man page. Of all these options, the only one that we’ll be dealing with in this quick tech tip is -e. The -e option allows you to specify the interface to use when scanning the network. This is similar to the -I option of arping. The example below shows a simple usage.

Listing 3

$ nmap -e wlan0 192.168.1.0/24

Starting Nmap 5.21 ( http://nmap.org ) at 2011-08-23 11:13 EDT
Nmap scan report for 192.168.1.1
Host is up (0.033s latency).
Not shown: 996 closed ports
PORT     STATE SERVICE
23/tcp   open  telnet
53/tcp   open  domain
80/tcp   open  http
5000/tcp open  upnp
Nmap scan report for 192.168.1.7
Host is up (0.00015s latency).
Not shown: 997 closed ports
PORT     STATE SERVICE
111/tcp  open  rpcbind
5900/tcp open  vnc
8080/tcp open  http-proxy
Nmap scan report for 192.168.1.10
Host is up (0.033s latency).
Not shown: 995 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
139/tcp  open  netbios-ssn
445/tcp  open  microsoft-ds
2049/tcp open  nfs

Nmap done: 256 IP addresses (3 hosts up) scanned in 4.22 seconds

The first thing to notice is the notation that I used to specify the subnet mask (/24). If you’re unfamiliar with this notation, please see Resource #5 below. The next thing to notice is that nmap gives us a lot more information than just which IPs are in use. nmap also shows us things like what ports are open on each host, and what service it thinks is running on each port. As a network administrator you can use this information to get a quick overview of your network, or you can dig deeper into nmap to perform in-depth network audits. In our case we’re just looking for an open IP address to use temporarily, so we can choose one that’s not listed. Again, care needs to be taken when statically setting IPs on a network with DHCP. Have a look at Resource #4 for a more comprehensive guide to using nmap.
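
If all you want is the list of live hosts without the port details, a lighter-weight variation is nmap's ping scan. The -sP flag shown here is the one used by the Nmap 5.x release above; newer releases call the same scan -sn:

$ nmap -e wlan0 -sP 192.168.1.0/24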

That concludes this Tech Tip. Have a look at innovationsts.com for other tips, tricks, how-tos, and service offerings available from Innovations Technology Solutions. Thanks, and stay tuned for more from Innovations.

Resources

  1. man arping
  2. man nmap
  3. Linux.com – Gerard Beekmans – Ping: ICMP vs. ARP
  4. Network Uptime – James Messer – Secrets of Network Cartography: A Comprehensive Guide to nmap
  5. About.com – Bradley Mitchell – CIDR – Classless Inter-Domain Routing

Video Tip – Using Pipes With The sudo Command

Summary

Welcome, this is an Innovations Tech Tip. In this tip we’re going to cover how to run a command sequence, such as a pipeline, using sudo, which is sometimes also pronounced “pseudo”. It may be tempting to think of the “su” in sudo as standing for “super user”, since you normally use sudo to execute things as root, especially if you’re an Ubuntu user. Something that may surprise you, though, is that you can use the -u option of sudo to specify the user to run the command as, assuming that you have the proper privileges. Have a look at the sudo man and info pages for more interesting options.
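
As a quick illustration of the -u option (backupuser is just a hypothetical account on your system):

$ sudo -u backupuser whoami
backupuser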

Video

Audio

Download

Now, if you’ve ever tried to use sudo to run a command sequence such as a pipeline, where each step required superuser privileges, you probably got a Permission denied error. This is because sudo only applies to the first command in the sequence and not the others. There are multiple ways to handle this, but two stand out to me. First, you can use sudo to start a shell (such as bash) with root privileges, and then give that shell the command string. This can be done using the -c option of bash. To illustrate how this works, I’ll start out using sudo to run cat on a file that I created in the /root directory that I normally wouldn’t have access to.

Listing 1

$ cat /root/example.txt
cat: /root/example.txt: Permission denied
$ sudo cat /root/example.txt
[sudo] password for jwright:
You won't see this text without sudo.

If I try to use sudo with a pipeline to make a compressed backup of the /root/example.txt file, I again get the Permission denied error.

Listing 2

$ sudo cat /root/example.txt | gzip > /root/example.gz
-bash: /root/example.gz: Permission denied

Notice that it’s the second command (the gzip command) in the pipeline that causes the error. That’s where our technique of using bash with the -c option comes in.

Listing 3

$ sudo bash -c 'cat /root/example.txt | gzip > /root/example.gz'
$ sudo ls /root/example.gz
/root/example.gz

We can see from the ls command’s output that the compressed file creation succeeded.

The second method is similar to the first in that we’re passing a command string to bash, but we’re doing it in a pipeline via sudo.

Listing 4

$ sudo rm /root/example.gz
$ echo "cat /root/example.txt | gzip > /root/example.gz" | sudo bash
$ sudo ls /root/example.gz
/root/example.gz

Either method works; which one to use is just a matter of personal preference.

That concludes this Tech Tip. Have a look at innovationsts.com for other tips, tricks, how-tos, and service offerings available from Innovations Technology Solutions. Thanks, and stay tuned for more quick tips from Innovations.

Resources

  1. man bash
  2. man sudo
  3. The Ink Wells – James Cook
  4. Linux Journal – Don Marti – Running Complex Commands with sudo
  5. bash Cookbook – Albing, Vossen, Newham

Writing Better Shell Scripts – Part 3

Quick Start

This post doesn’t really lend itself to being a quick read, but you can have a look at the How-To section of this post and skip the rest if you’re in a hurry. I would highly recommend reading everything though, since there’s a lot of information that may serve you well in the future. There is also a video attached to this post that may be a good quick reference for you. Don’t forget that the man and info pages of your Linux/Unix installation can be an invaluable resource as well when you’re trying to learn new concepts and solve problems.

Video

Audio

Download

Preface

To make things easier on you, all of the black command line and script areas are set up so that you can copy the text from them. This does make using the commands and scripts easier, but if you’re not already familiar with the concepts presented here, typing the commands/code yourself and working through why you’re typing them will help you learn more. If you hit problems along the way, take a look at the Troubleshooting section near the end of this post for help.

There are formatting conventions that are used throughout this post that you should be aware of. The following is a list outlining the color and font formats used.

Command Name or Directory Path
Warning or Error
Command Line Snippet With Commands/Options/Arguments
Command Options and Their Arguments Only
Hyperlink

Overview

There is no way for me to cover all of the issues surrounding shell script security in a single blog post. My goal with this post is to help you avoid some of the most common security holes that are often found in shell scripts. No script can be made un-crackable, but you can make the cracker’s task more challenging by following a few guidelines. A secondary goal with this post is to make you more savvy about the scripts that you obtain to run on your systems. Because scripts written for the BASH and SH shells are so portable in the Linux/Unix world, it can be easy for a cracker to write malware that will run on many different systems. Having some knowledge about the security issues surrounding shell scripts might just keep you from installing or running a malicious script such as a trojan, which gives the cracker a back door to your system. The Resources section holds books and links which will allow you to delve more deeply into this topic if you’re looking for more comprehensive knowledge. Listing 1 shows an example script that contains some of the security problems that we’ll talk about in this post.

Listing 1

#!/bin/bash
# A SUID root script that demonstrates various security problems

# Prepend the current path onto the PATH variable
PATH=.:${PATH}

#Count the number of lines in a listing of the current directory
ls | wc -l

# Get user input
read USR_INPUT

# Check to see if the user supplied the right password
if [ $USR_INPUT == "mypassword" ];then
    echo "User input was $USR_INPUT and should have matched the string 'mypassword'"
fi

# Create a temp file
touch /tmp/mytempfile

# Set the temp file so that only the owner can read/write/execute the contents
chmod 0700 /tmp/mytempfile

# Save the password that the user supplied to the temp file
echo $USR_INPUT > /tmp/mytempfile

Environment Variables

Your shell script has little to no chance of running securely if it trusts the environment that it runs in, and that environment has been compromised. You can help protect your script from unintended behavior by not trusting items like environment variables. Whenever possible, assume that input from the external environment has been designed to cause your script problems.

The PATH variable is a common source of security holes in scripts. Two of the most common issues are the inclusion of the current directory (via the . character) in the path, and using a PATH variable that’s been manipulated by a cracker. The reason that you don’t want the current directory included in the path is that a malicious version of a command like ls could have been placed in your current directory. For example, let’s say your current directory is /tmp, which is world writable. A cracker has written a script named ls and placed it in /tmp as well. Since you have the current directory at the front of your PATH variable in Listing 1, the malicious version of ls will be run instead of the normal system version. If the cracker wanted to help cover their tracks, they could run the real version of ls before or after running their own code. Listing 2 shows a very simple script that could replace the system’s ls command in this case.

Listing 2

#!/bin/bash
# Run the real ls with the original arguments/options to cover our tracks
/bin/ls "$@"

# Run whatever malicious code we want here
echo "Malicious code"

There’s a decent chance that any cracker who planted the fake ls would create it in such a way that it would look like ls was running normally. This is what I’ve done in Listing 2 by passing the @ variable to the real ls command so that the user doesn’t suspect anything. This brings up another point besides the use of the current directory in the path. Just because your script seems to be running fine from the user’s point-of-view doesn’t mean that it hasn’t been compromised. A good cracker knows how to cover their tracks, so if a security flaw has been exploited in your script the breach may go undetected for an indefinite period of time.

You can see in Listing 1 that the order of directories in the PATH variable makes a difference. This is important because if a cracker has write access to a directory that’s earlier in the search order, they can preempt the standard directories like /bin and /usr/bin that may be harder to gain access to. When you try to run the standard command, the malicious version will be found first and run instead. All the cracker has to do is insert a replacement command, like the one in Listing 2, earlier in the path search order.

The second main problem with the PATH environment variable is that it could have been manipulated by a cracker before, or as your script was run. If this happens, the cracker could point your script to a directory that they created which holds modified versions of the system utilities that your script relies on. Knowing this, it’s best if you add code to the top of your script to set the PATH variable to the minimal value your script needs to run. You can save the original PATH variable and restore it on exit. Listing 3 shows Listing 1 with the current directory removed from the PATH variable, and a minimal path set to lessen the chances of problems. Keep in mind though that a cracker could have compromised the actual system utilities that are in locations such as /bin and /sbin. Ways to detect and combat this occurrence fall more into the system security realm though and won’t be talked about in this post.

Listing 3

#!/bin/bash
# A SUID root script that demonstrates various security problems

# Save the current path variable to restore it later
OLDPATH=${PATH}

# Set a minimal path for our script to use
PATH=/bin:/usr/bin

#Count the number of lines
ls | wc -l

# Get user input
read USR_INPUT

# Check to see if the user supplied the right password
if [ $USR_INPUT == "mypassword" ];then
    echo "User input was $USR_INPUT and should have matched the string 'mypassword'"
fi

# Create a temp file
touch /tmp/mytempfile

# Set the temp file so that only we can read/write the contents
chmod 0700 /tmp/mytempfile

# Save the password that the user supplied to the temp file
echo $USR_INPUT > /tmp/mytempfile

# Reset the PATH variable to its original value
PATH="$OLDPATH"

In your own scripts it would probably be best to put the reset of the PATH variable inside of a trap on the exit condition. That way PATH gets reset to the original value even if your script is terminated early. I wrote about traps in the last post in this series on error handling.
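As a rough sketch of that idea (building on Listing 3; the EXIT trap is my own addition and not part of the original listing), the save-and-restore might look like this:

#!/bin/bash -
# Save the original PATH and switch to a minimal, known-good one
OLDPATH=${PATH}
PATH=/bin:/usr/bin

# Restore the original PATH no matter how the script terminates
trap 'PATH="$OLDPATH"' EXIT

# ... the rest of the script runs with the minimal PATH ...
ls | wc -l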

Another, less desirable way of avoiding malicious PATH exploits would be to use the full (absolute) path to the binary your script is trying to run. So, instead of just entering ls by itself, you would enter /bin/ls . This ensures that you’re running the binary that you want to, but it’s a more “brittle” approach. If your script is run on a system where the binary you are calling is in a different location, your script will break when the command is not found. One approach to help cut down on this drawback is to use the whereis command to locate the command for you. Caution needs to be applied with this approach too, but I’ve created an example in Listing 4 that shows how to do this. Remember that if the cracker has somehow compromised the system’s standard version of the command that you’re trying to run, this technique won’t help. That really starts being a system security problem rather than a script security problem at that point though.

Listing 4

#!/bin/bash -
#File: findcmd.sh

# Attempt to find the command with the whereis command
CMD=$(whereis $1 | cut -d " " -f 2)

# Check to make sure that the command was found
if [ -n "$CMD" ];then
    echo "$CMD"
fi

The script uses the command name to give the user the full path to the binary, if it can be found. There are of course numerous improvements that you could make to the script in Listing 4. My main suggestion would be to rewrite the script as a function, and then put that inside a script that you can source. That way you maximize code reuse throughout the rest of your scripts. I’ve done this in Listing 29 via the run_cmd function.

Another environment variable that can be problematic is IFS. IFS stands for “Internal Field Separator” and is the variable that the shell uses when it breaks strings down into fields, words, and so on. It can actually be a handy variable to manipulate when you’re doing things like using a for loop to deal with a string that has odd separator characters. If your shell inherits the IFS variable from its environment, a cracker can insert a character or characters that will make your script behave in an unexpected way. For example, suppose I have a few scripts in my ~/bin directory that I want to run together (or nearly together). The script in Listing 5 shows one very simple way of doing this.

Listing 5

#!/bin/bash -

BINS="/home/jwright/bin/bin1.sh /home/jwright/bin/bin2.sh"

for BIN in $BINS
do
    echo $($BIN)
done

When I run the script I get the output from bin1.sh and bin2.sh that I expect. In this case the scripts just output their name and exit. Everything is fine until a cracker comes along and sets the IFS variable to a forward slash (/). Now when I run my script I get the output in Listing 6.

Listing 6

$ ./ifscrack.sh
./ifscrack.sh: line 8: home: command not found
./ifscrack.sh: line 8: jwright: command not found
./ifscrack.sh: line 8: bin: command not found
./ifscrack.sh: line 8: bin1.sh : command not found
./ifscrack.sh: line 8: home: command not found
./ifscrack.sh: line 8: jwright: command not found
./ifscrack.sh: line 8: bin: command not found
bin2.sh executing

Notice that since the directory /home/jwright/bin is in my path, the bin1.sh call should have run. If you look closely though you’ll see that there is a space after the filename, which causes the command to not be found. The IFS variable change has not only broken my script, it has allowed the cracker to open up a significant security hole. If the cracker creates a program or script with any of the names like home, jwright, or bin anywhere in the directories in PATH, their code will be executed with the privileges of my script. Because of the privilege issue, this security hole is an even bigger problem with SUID root scripts.

On some Linux distributions, the IFS variable is not inherited by a script; a standard default value is used instead. You can still change the value of IFS within your script though. With this said, it’s still a good idea to set the IFS variable to a known value at the beginning of your script and restore it before your script exits, similar to the change we made in Listing 3 to store and reset the PATH variable. Even though the distribution that you’re developing your script on may not allow IFS inheritance, your script may be moved to another distribution that does. It’s best to be safe and always set IFS to a known value.
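A minimal sketch of that save-and-restore (the value used here — space, tab, newline — is bash’s usual default, but treat it as an assumption and adjust it to whatever splitting behavior your script actually needs):

#!/bin/bash -
# Save whatever IFS we inherited, then force a known value
OLD_IFS="$IFS"
IFS=$' \t\n'   # space, tab, newline

# ... work that depends on normal word splitting goes here ...

# Put the inherited value back before exiting
IFS="$OLD_IFS"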

Make sure that you never use the UID, USER, and HOME environment variables to do authentication. It’s too easy for a cracker to modify the values of these variables to give themselves elevated privileges. Now on the Fedora system that I’m using to write this blog post the UID variable is readonly, so I can’t change it. That doesn’t guarantee that every system that your script runs on will make UID readonly though. Err on the side of caution and use the id command or another mechanism to authenticate users instead of variables. The id command is very useful, and can give you information like effective user ID, real user ID, username, etc. Listing 7 is a quick reference of some of the id command’s options.

Listing 7

-g (--group)    Print only the effective group ID
-n (--name)     Print a name instead of a number, for -ugG
-r (--real)     Print the real ID instead of the effective ID, with -ugG
-u (--user)     Print only the effective user ID
-Z (--context)  Print only the security context of the current user (SELinux)

You’ll need to use the options -u and -g with some of the other options (-r and -n) so that the id command knows whether you want information on the user or group. For example you would use /usr/bin/id -u -n to get the name of the user instead of their user ID.

The fact that the UID variable is set to readonly on my system gives you a hint at how to protect some variables. There is actually a command named readonly that sets variables to a readonly state. This does protect variables from being changed, but it also keeps you as the “owner” of the variable from making any changes to it too. You can’t even unset a readonly variable. To make a variable readonly, you would issue a command line like readonly MYVAR . Make sure to carefully evaluate whether or not a variable will ever need to change or be unset before setting it to readonly.
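A quick interactive illustration of that behavior (the exact error wording may differ slightly between bash versions):

$ MYVAR="some value"
$ readonly MYVAR
$ MYVAR="new value"
bash: MYVAR: readonly variable
$ unset MYVAR
bash: unset: MYVAR: cannot unset: readonly variable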

There’s an IBM developerWorks article in the Resources section (#20) that mentions security implications for some other environment variables such as LD_LIBRARY_PATH and LD_PRELOAD. That would be a good place to start digging a little deeper on the security issues surrounding environment variables.

Symbolic Links

You should always check symbolic links to make sure that a cracker is not redirecting you to their modified code. Symbolic links are a transparent part of everyday life for Linux users. Chances are that when you run sh on your favorite Linux distribution, /bin/sh is actually a link to /bin/bash . Go ahead and run ls -l /bin/sh if you’ve never noticed this before. Symbolic link attacks can take a few different forms, one of which is redirection of sensitive data. In one situation, you may think that you’re caching sensitive data to a file you’ve created in /tmp with 0700 file permissions. Instead, by exploiting a race condition in your script (we’ll talk about race conditions later), a cracker first creates a symbolic link with the same filename that your script will be writing data into, causing your attempt to create the temporary file to throw an error. If your script doesn’t stop on this error, it will begin dumping data into the file at the end of the symbolic link. The endpoint of the link could be on a mounted remote filesystem where the cracker can get easier access to it. There were several mistakes made in this scenario that we’ll talk more about later, but before that let’s look at making sure we’re not writing data to a symbolic link.

Listing 8

#!/bin/bash -
#File: symlink_test.sh

# Poor method of temp file creation
touch /tmp/mytempfile

# Check the new temp file to see if it's a symbolic link
IS_LINK=$(file /tmp/mytempfile | grep -i "symbolic link")

# If the variable is not null, then we've detected a symbolic link
if [ -n "$IS_LINK" ];then
    echo "Possible symbolic link exploit detected. Exiting."
    exit 1 #Exit before we dump the sensitive data into the link
fi

# Dump our sensitive data into the temp file
echo "Sensitive Data" > /tmp/mytempfile

If our script sees the string “symbolic link” in the output from the file command, it assumes that it’s looking at an attempted symbolic link exploit. Rather than continuing on and possibly sending data to a cracker, the script chooses to warn the user and exit with an exit status indicating an error. Be aware though that this script doesn’t protect against the situation where a cracker creates your temp file in place with permissions to give themselves access to the data. In the case where you don’t expect the temp file to already be there, you would throw an error and exit. This brings up another problem though – DoS (Denial of Service) attacks. If the cracker simply wants your script to fail, all they have to do is make sure your temp file has already been created so that your script will throw an error and exit. You’re not handing over sensitive data, but your users are being denied the use of your script. The answer to this is to create temporary files with less-predictable file names.

“Safe” Temporary Files

In the header for this section, I put the word safe in quotes to denote that it’s very difficult to make anything completely safe. What you have to do is make things as safe as possible, and then keep an eye out for suspicious activity. In the last blog post I created a function named create_temp that used a simple but risky mechanism to create temp files. A snippet of the code from that listing is shown in Listing 9.

Listing 9

# Function to create "safe" temporary files
function create_temp {
    # Give preference to user tmp directory for security
    if [ -e "$HOME/tmp" ]
    then
        TEMP_DIR="$HOME/tmp"
    else
        TEMP_DIR="/tmp"
    fi

    # Construct a "safe" temp file name
    TEMP_FILE="$TEMP_DIR"/"$PROGNAME".$$.$RANDOM

    # Keep the file in an array to remove it later
    TEMPFILES+=( "$TEMP_FILE" )

    {
        touch $TEMP_FILE &> /dev/null
    } || fatal_err $LINENO "Could not create temp file $TEMP_FILE"
}

The problem with this function is that it uses a temporary file name with 2 elements that are easy to predict – the program name and the process ID. The fact that there is a random number on the end is only an inconvenience for the cracker, because all they have to do is create a file for each possible file name with an ending number between 0 and 32767. They can be sure that you’ll dump data into one of those files, and it’s easy to write a script to find out which file holds the data. A slightly better method would be to append multiple sets of random numbers onto the file name, separating each set with periods. This makes it much harder for the cracker to cover all the possible file names. A much better way to handle this situation is to use the mktemp command, which is available on most Linux systems.

The mktemp command takes a string template that you supply and creates a unique temporary file name. The form could be something like mktemp /tmp/test.XXXXXXXXXXXX which would print the random file name to standard out and create a file with that name and path. Running that command line on a Fedora 13 system once gave me the output /tmp/test.o0mTLAgSWTfX which of course will vary each time you run the command. The more X characters you add to the template, the harder it is for a cracker to predict the file name. From what I’ve read, 10 or so is the recommended minimum amount. Another nice thing about mktemp is that when it creates a temp file, it makes sure that only the owner has access to it. Some useful options for mktemp are shown in Listing 10. You should use mktemp in preference to commands like touch and echo to create temp files.

Listing 10

-d (--directory)  Create a directory, not a file.
-q (--quiet)      Suppress diagnostics about file/dir-creation failure.
--suffix=SUFF     Append SUFF to TEMPLATE. SUFF must not contain slash.
                  This option is implied if TEMPLATE does not end in X.
--tmpdir[=DIR]    Interpret TEMPLATE relative to DIR. If DIR is not specified,
                  use $TMPDIR if set, else /tmp. With this option, TEMPLATE
                  must not be an absolute name. Unlike with -t, TEMPLATE may
                  contain slashes, but mktemp creates only the final component.

There are just a few other miscellaneous facts about mktemp that I want to make sure you’re aware of.

  1. The man pages for mktemp on both Ubuntu 9.10 and Fedora 13 systems specify that the minimum number of X characters that you can have in a template is three. Even though you can go this low, I wouldn’t recommend it because it greatly increases the predictability of your file names. Ten or more random alpha-numeric characters is better.
  2. mktemp is commonly part of the coreutils package.
  3. The default number of X characters that you get when you don’t specify a template with mktemp is 10. This held true on the Fedora 13 and Ubuntu 9.10 systems that I tested.
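Putting those pieces together, here is a minimal sketch of mktemp-based temp file handling (the file name prefix, the ten X characters, and the EXIT trap are my own example choices):

#!/bin/bash -
# Prefer the user's own tmp directory if it exists
if [ -d "$HOME/tmp" ];then
    TEMP_DIR="$HOME/tmp"
else
    TEMP_DIR="/tmp"
fi

# mktemp creates the file atomically with a hard-to-guess name
# and owner-only permissions
TEMP_FILE=$(mktemp "$TEMP_DIR/myscript.XXXXXXXXXX") || exit 1

# Remove the file no matter how the script exits
trap 'rm -f "$TEMP_FILE"' EXIT

echo "Sensitive Data" > "$TEMP_FILE"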

So what happens if you don’t have mktemp on your system? The LinuxSecurity.com article in the Resources section (#17) gives a way to use mkdir to create a temporary directory that only the creator has access to. A script based on the examples in that article is found in Listing 11, but should not be used in preference to the mktemp command unless you have a compelling reason.

Listing 11

#!/bin/bash -
#File: safetmp.sh

# Give preference to user tmp directory for security
if [ -e "$HOME/tmp" ];then
    TEMP_DIR="$HOME/tmp"
else
    TEMP_DIR="/tmp"
fi

# Create somewhat secure directory name
TEMP_NAME=${TEMP_DIR}/$$.$RANDOM.$RANDOM.$RANDOM

# Create the directory while at the same time giving
# only the user access to it
(umask 077 && mkdir $TEMP_NAME) || {
    echo "Error creating the temporary directory."
    exit 1
}

Notice that this script does use multiple references of the RANDOM variable separated by periods to make the directory name harder to guess. Also, the umask is set to 077 just before the directory is created so that the end directory permissions are 700. That gives the owner full access to the directory, but none to anyone else. At the top of the script I have reused code from the create_temp function in Listing 9. This code gives preference to the user’s home directory over the system (/tmp) directory. If the temporary file or directory that you are creating can be placed in the user’s home directory, that’s just one more layer of protection from prying eyes. I would suggest using the user’s own tmp directory whenever possible.

Keep in mind that, as I mentioned above, even though you’ve protected the data in the temp files, a cracker can still launch a DoS (Denial of Service) attack against your script. In this case, since the cracker probably can’t guess the temporary file name, they might try to fill the /tmp directory so that there’s no more space for you to create your file. Things like user disk quotas can help mitigate this type of attack though.

Now that you know a little more about temp file safety, I’ll caution you not to overuse temporary files. When you store or use data in external files you are opening a door into your script that a knowledgeable individual may be able to exploit. Use temp files only when needed, and make sure to consistently follow safe guidelines for their use.

Race Conditions

A race condition occurs when a cracker has a window of opportunity to preempt and modify your script’s behavior, usually by exploiting a design flaw in the execution sequence of your script, or in its reliance on an external resource (like a lock file). The example that we’ve already talked about is creating a symbolic link or a file in place of the script’s temp file to capture data. The script that I’ve created in Listing 12 uses the sleep command to create a larger window for a race condition.

Listing 12

#!/bin/bash -
#File: race_cond.sh

TEMP_FILE=/tmp/predictable_temp

# Make sure that the temp file doesn't already exist
if [ ! -f $TEMP_FILE ];then
    # Do something here that takes 10 seconds. This
    # creates the race condition and is simulated by
    # the sleep command
    sleep 10

    # Create the temp file
    touch $TEMP_FILE

    # Make sure only the user can view the contents
    chmod 0700 $TEMP_FILE

    # Dump our sensitive data to the temp file
    echo "secretpassword" > $TEMP_FILE
fi

Once the script is run, the cracker has 10 seconds to create the temp file before the script does. The timing is rarely as simple as I have made it out to be in this example, but the 10 second gap between checking for the existence of the file and the creation of it illustrates the point. The two lines in Listing 13 can be entered as a different user. The touch command in Listing 12 will succeed because the planted file is world writable, but the chmod command will fail because the file is owned by a different user, and the script has another flaw in that it doesn’t check for that error before writing the data. Because of this the sensitive data is written into a file that is easy for the cracker to read. Checking for that error, and making sure that the file you want to create doesn’t already exist (or at least is owned by you with the correct permissions), would go a long way toward making this script more secure.

Listing 13

touch /tmp/predictable_temp
chmod 0777 /tmp/predictable_temp

When the 10 second delay expires in my script I get the error chmod: changing permissions of `/tmp/predictable_temp': Operation not permitted just before the data is written to the file. The temp file is accessible to the cracker using the cat command, and an ls -l of the temp file shows that it’s owned by the user name that the cracker used. There are other race condition exploits, but the moral of the story is to not leave gaps between critical sections of your script. Listing 11 shows a good example of closing the gap between operations. In that case the permissions are set as the directory is created by setting the umask before the call to mkdir. Race conditions are certainly something to keep in mind as you attempt to increase the security of your scripts.
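As one possible (hypothetical) rework of Listing 12 that closes the gap, you can let the redirection itself act as the existence check by turning on bash’s noclobber option and setting a restrictive umask up front. The predictable file name is still a weakness, so mktemp remains the better tool, but the sketch shows the idea of collapsing check-and-create into a single step:

#!/bin/bash -
#File: race_fixed.sh (hypothetical rework of Listing 12)

TEMP_FILE=/tmp/predictable_temp

# Anything we create gets owner-only permissions
umask 077

# Refuse to overwrite an existing file on redirection
set -o noclobber

# Simulate the slow work that used to widen the race window
sleep 10

# Create the file and write the data in one step; if the file already
# exists (planted by a cracker, for example) the redirection fails and
# we stop instead of writing the secret into someone else's file
if ! echo "secretpassword" > "$TEMP_FILE"; then
    echo "Temp file already exists - possible attack. Exiting." >&2
    exit 1
fi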

The Shebang Line

You may have noticed before that I put a dash and a space (a bare option) at the end of most of my script shebang lines. This is the same as the double dash (--) option and signals the end of all options for the shell. Any options that are tacked onto the end of the shebang line will be treated as arguments to the shell, and will most likely throw an error. The reason that this is important is that it prevents option spoofing. On some systems if the cracker can get the shebang line to effectively read #!/bin/sh -i they will get an interactive shell with the privileges of the script. It’s important to note that I was not able to get an interactive shell using a script on a Fedora 13 system, even when I entered the shebang line directly as having the -i option. Even so, you don’t always know which systems your script will run on, and it only takes a fraction of a second to add the dash (or double dash) at the end of your shebang line. That’s a very small price to pay for some added security.

User Input

As I discussed in the error handling post of this series, user input should be processed cautiously. Even when there is no malicious intent by a user very serious errors can result from incorrect input. At its worst, user input can give a cracker an open door into your system through things like injection attacks. Keeping this in mind, there are a few guidelines that you can follow to help keep user input from bringing your script down.

If you can avoid it, don’t pass user input to the eval command, or pipe the input into a shell binary. This is a script crash or security problem waiting to happen. Listing 14 shows the wrong way to handle user input when it’s captured with the read command.

Listing 14

#!/bin/bash -
#File: badinput.sh

# Get the input from the user
read USR_INPUT

# Don't use eval like this
eval $USR_INPUT

# Don't pipe input to a shell like this
echo $USR_INPUT | sh

It’s probably pretty easy to agree with me that the script in Listing 14 is a bad idea. The user can type any command string they want (including rm -rf /*) and it will be executed with the privileges of the script. Depending on how much the permissions of the script are elevated, this could do a lot of damage. Another scenario that may seem more harmless is the one in Listing 15.

Listing 15

#!/bin/bash -

read USR_INPUT

if [ $USR_INPUT == "test" ];then
    echo "You should only see this if you typed test."
fi

Everything works fine until a cracker enters the string random == random -o random and hits enter. What this effectively does is change the if statement so that it reads if [ random == random -o random == "test" ], where the -o is a logical OR. It tells the test that the overall expression is true if either comparison is true. Of course the first comparison (random == random) is true, so what’s inside the if statement executes even though the cracker didn’t type the correct word or phrase. Depending on what’s inside the if statement, that security hole could range from a minor to a major problem. The way to combat this is to quote your variables (i.e. "$USR_INPUT") so that they are tested as a whole string. In general quoting your variables is a good idea, as you’ll also head off problems with things like spaces that might otherwise cause your script trouble.
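For reference, here is the Listing 15 check with the variable quoted (otherwise unchanged); the injected string is now compared as a single whole string and the test fails as it should:

#!/bin/bash -
read USR_INPUT

# Quoting keeps the entire input together as one string
if [ "$USR_INPUT" == "test" ];then
    echo "You should only see this if you typed test."
fi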

This is an example of an injection attack where the cracker slips some extra information in with the input to trick your script into running unintended code. This is a very common attack “vector” for database and web servers where a cracker carefully crafts a request to cause arbitrary code to execute, or to bring down the web/database service. Another script that can be exploited by an injection attack is found in Listing 16.

Listing 16

#!/bin/bash -

read USR_INPUT

#This line contains the fatal security flaw
echo ls $USR_INPUT > test.sh

This script isn’t necessarily something that you would do in the real world, but it’s a simple way to demonstrate this injection attack. What the script does is takes a list of directories from the user and then builds a script using the ls command to list the contents of the directories. The injection attack comes when a cracker types && rm randomfile and you find that the resulting script (test.sh) contains a line that will delete files (Listing 17). The && rm randomfile line could have just as easily be && rm -rf /* if the cracker wanted.

Listing 17

ls && rm randomfile

The && operator runs the second command in the sequence if the first command runs successfully (without an error). The ls command is not likely to fail by itself as it just lists the contents of the current directory, so the rm command will most likely run and delete files. The method to deal with this type of attack is similar to the previous method of quoting, except that in this case you escape the quotes around the user input to make sure that it is properly contained. Listing 18 shows the corrected script.

Listing 18

#!/bin/bash -

read USR_INPUT

#This line uses escaped quotes to enclose the potentially dangerous input
echo ls \"$USR_INPUT\" >> test.sh

Along with quoting, it’s a good idea to search user input for unacceptable entries like meta or escape characters. You can search the user input for these undesirable characters and replace them with something harmless to your script like a blank character or underscore. When doing this, it may be easier to search for the characters that are acceptable instead of trying to cover every single character that’s not acceptable. The set of acceptable characters is almost always smaller, and it’s hard to anticipate every bad character that might be passed to your script. Listing 19 shows a simple way of cleaning the input using the tr command.

Listing 19

#!/bin/bash -
#File: scrubinput.sh

# Grab the user's input
read USR_INPUT

# Remove all characters that aren't alphanumeric or newline
USR_INPUT=$(echo "$USR_INPUT" | tr -cd '[:alnum:]\n')

This script takes the user input using the read command as before, but then pipes the value directly into the tr command. The tr command’s -c (--complement) and -d (--delete) options are used to cause tr to look for and delete the unmatched characters. So, anything that’s not an alphanumeric character (via the alnum character class) or a newline character will be deleted. It’s not hard to adapt the tr statement to your situation, maybe even replacing the characters instead of deleting them.

As with the other topics in this post I’m scratching the surface, but hopefully you can see how important it is to check user input before doing anything with it. The inability of a script or program to handle improper input is a common bug in the software world. Whether the user has malicious intent or not, bad user input is something that you must plan for.

SUID, SGID, and Scripts

Several of the above scenarios may not cause that much harm on their own, because the user running the script has restricted permissions. This can all change with a script that has its SUID and/or SGID bit set though. The SUID and SGID bits show up in the first of four digits in the octal representation of a file’s permissions. The SUID bit has a value of 4 and the SGID bit has a value of 2; if both bits are set you get a value of 6, which is similar to how normal permission bits are added together. The other place that you normally see the SUID and SGID bits is in the symbolic permission string. There they show up as the character “s” in either the user execute permission position or the group execute permission position, respectively. For example, if only the SUID bit was set on a script and the file had read/write/execute permissions of 755, the full permissions for the script would be 4755. The symbolic representation of this would be -rwsr-xr-x .
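As a quick illustration (the file name is just an example, and stat -c is a GNU coreutils option), setting those permissions on a script produces exactly that symbolic string:

$ chmod 4755 myscript.sh
$ stat -c %A myscript.sh
-rwsr-xr-x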

When the SUID bit is set on an executable, the file is run using the privileges of the file’s owner. In the same way, if the SGID bit is set on an executable, it will be run with the rights of the file owner’s group. Typically a command/script executes using the real user ID (and rights), but when the SUID or SGID bits are set the script executes with the effective user ID of the file owner instead. A common use is to have the SUID bit set on a file that is owned by root so that a user can access files and resources that they normally wouldn’t have access to. The passwd command is a good example of this. In order to change a user’s password, passwd has to access protected files such as /etc/passwd and /etc/shadow . If a normal user is running the passwd command, they would need elevated privileges to access those files since they are only readable and writable by root. This mechanism is very handy, and as you’ve seen sometimes required, but it’s something that you should avoid using with your own scripts whenever possible. The problem with an SUID root script is that if a cracker compromises that script, they have superuser privileges that could be used to run commands like rm -rf /* . As a programmer and/or system administrator, you need to guard against the tendency to take the easiest route to a solution rather than the most secure one. All too many admins will set a script to be SUID root when, with some thought, the script could have been designed to run without superuser privileges. With that said, you may run into situations where you have to use SUID and SGID. Just make sure that it’s a true “have to” situation. Always follow the Rule of Least Privilege, which says that you should never give a user or a program any more rights than you have to.

If you really need to use the SUID and SGID bits, you can set them with the chmod u+s FILENAME and chmod g+s FILENAME command lines respectively. Keep in mind that there are Linux distributions and Unix variants that do not honor the SUID bit when it is set on a script. You’ll need to check the documentation for your Linux distribution to be sure that setting the SUID bit will work.

You can use the find command to search for files on your system with the SUID and SGID bits set. You can use this as a security auditing tool to search for SUID/SGID scripts that look out of place. Listing 20 shows a quick and simple way to search for out of place SUID/SGID shell scripts that are on your system.

Listing 20

$ sudo find / -type f -user root -perm /4000 2> /dev/null | xargs file | grep "shell script"
/usr/bin/malscript.sh: setuid Bourne-Again shell script text executable

Let’s take the command line from Listing 20 one step at a time. The first section is the actual find command (find / -type f -user root -perm +4000). The find command searches for a file of type regular file (-type f) and not a directory, it checks to make sure that the file is owned by root (-user root), and that it has the SUID bit set (-perm /4000). The next short section of 2> /dev/null redirects any errors to the null device so that they are thrown away. This effectively suppresses errors resulting from find trying to access things like Gnome’s virtual file system. The file command deciphers which type of file is being looked at. This command is not perfect, but will work for a quick and dirty security audit. The file command needs to work on each of the file names individually, so I use the xargs command to run file separately with each line of output from the find command. I could have also used the -exec option of find in the following way: -exec file '{}' ; . The command line up to this point gives me output telling what each type of file is, but I really only care about shell scripts. That’s where the grep statement comes in. I use grep to filter out only the lines that mention a “shell script”.

As you can see in the output of the command line, there is a suspicious file called malscript.sh in /usr/bin . Searching in this way made a file that normally would be overlooked stand out by itself. In this case I created that script and put it in /usr/bin myself so that I would have something to find, but it simulates something that you might find in the field. You could just as easily have searched for SGID scripts (-perm /2000), scripts with either the SUID or SGID bit set (-perm /6000), SUID root binaries, and much more. Be aware that if the user running find doesn’t have execute (search) permission on a directory, find can’t descend into it. This would cause the find command to skip over the directory, possibly causing you to miss a suspicious file.
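For instance, the SGID and either-bit variants of that audit differ only in the -perm argument (a sketch; adjust the starting path and the rest of the pipeline to your own needs):

# Scripts owned by root with the SGID bit set
$ sudo find / -type f -user root -perm /2000 2> /dev/null | xargs file | grep "shell script"

# Scripts with either the SUID or the SGID bit set, regardless of owner
$ sudo find / -type f -perm /6000 2> /dev/null | xargs file | grep "shell script"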

The SUID root mechanism can be especially dangerous if a cracker manages to make a copy of a shell binary and sets it to be SUID root. Some shells such as BASH will automatically relinquish their privileges if they’re being run this way. Keep an eye out for extra copies of shell binaries that are set SUID, as they could be part of an attack by a cracker. The shell binary could have been copied and modified using several of the security flaws that we’ve talked about above. You could use the script in Listing 20 to help you search for SUID root copies of shell binaries.

When running scripts manually as a system administrator, you should run scripts with temporary elevated privileges through a mechanism like sudo whenever possible, rather than setting a script to be SUID root. Even with sudo though, you still need to make sure your script is as secure as possible, because sudo is still granting your script root privileges, and it doesn’t take much time to do a lot of damage. Item #16 in the Resources section touches on many of the security aspects that we’ve talked about here from the perspective of proper sudo usage.

In some cases a user may install or use your script improperly, running it as SUID root or with sudo. If you never want your script run as root, you could use the id command along with some text manipulation to warn the user and then exit. The script in Listing 21 shows one way of doing this.

Listing 21

#!/bin/bash -
#File: droproot.sh

# Check to see if we're running as root via sudo
if [ $(/usr/bin/id -ur) -eq 0 ];then
    echo "This script cannot be run with sudo"
    exit 1
fi

# Get the listing on this script
INFO=$(ls -l $0)

# Grab the permission at the SUID position
PERM=$(echo "$INFO" | cut -d " " -f 1 | cut -c 4)

# Grab the owner
OWNER=$(echo "$INFO" | cut -d " " -f 3)

# Check for the SUID bit and the owner of root
if [ "$PERM" == "s" -a "$OWNER" == "root" ];then
    echo "This script cannot be run as SUID root"
    exit 1
fi

The script uses the id command to check the real user ID of the user, and if it’s 0 (root) then the script warns the user that the script is not supposed to be run with sudo or as root and exits. To check for the SUID root condition, I’ve taken a slightly more complicated route. I run the command line ls -l $0 which gives me a long listing for the script name (represented by $0) showing the symbolic permission string and the owner. I then extract the character in the permission string that would represent the SUID bit as an “s” if present so that I can check it. This is done with the cut -c 4 command line which extracts the fourth character. Once I have the SUID bit and the user, I just use an if statement to check to see if both the SUID bit is set and that the script is owned by root. If both of those conditions are true, I warn the user that the script can’t be SUID root and exit.

One of the nice things about the BASH shell is that if it detects that it has been run under the SUID root condition, it will automatically drop its superuser privileges. This is nice because even if an attacker is able to make a copy of the bash binary and set it as SUID root, it will not allow them to gain additional access to the system. Unfortunately, most crackers are going to know this and will try to make a copy of another shell like sh that doesn’t have this feature.

The last thing that I’ll mention about SUID root scripts is that I have seen it suggested by several system administrators that you should use Perl or C whenever you must use SUID root. There have been arguments for and against using Perl or C in place of shell scripting, and ultimately you must decide which you feel safer with. I’m not going to argue the point, but I will say that if you use unsafe practices when writing your Perl scripts or C programs, you’re going to end up no better off anyway. Take your time and make sure the code you write is as secure as you can make it. This is a rule to live by no matter what language you’re using.

Storing Sensitive Data In Scripts

This is just a bad idea; do your best to avoid it. If you store passwords in a script they’re just waiting to be found. Even if you set the permissions to 0700, the passwords will still be compromised if a cracker compromises your account. There’s also the risk that you might accidentally send the script to another user, and forget to scrub the passwords from it.

You should also not echo passwords as a user types them. Shoulder surfers could see the password as the user enters it if you have the shell set to echo user input. To avoid this in your script, you can use stty -echo as I have in the very simple example in Listing 22.

Listing 22

#!/bin/bash -

# Turn echoing off
stty -echo

# Read the password from the user
echo "Please enter a password: "
read PASSWD

# Turn echoing back on
stty echo

Notice that only what the user types is suppressed and not the output from the echo command itself. This of course doesn’t protect the user from somebody watching what their fingers press on the keyboard, but there’s nothing that you as a programmer can do about that.

If you do end up storing passwords in your script or in files on your system, store them as hashes rather than clear text. You can hash passwords using the md5sum or sha*sum commands, and you can pipe the password string straight into the command, as with the line echo "secretpassword" | sha512sum . I would suggest writing a script that takes the password without echoing the input and converts it into a hash. A password hashed this way is never reversed; you just hash the password given by the user and compare the result to the stored hash. That way the password is not sitting in clear text for a cracker to find. Granted, it’s still possible to crack a password hash, but remember that no system is bulletproof and the goal is to make the cracker’s life as difficult as possible.
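A rough sketch of that hash-and-compare approach (the stored hash below is a placeholder that would be generated ahead of time, not a value to reuse):

#!/bin/bash -
# Stored at setup time with something like:
#   echo "thepassword" | sha512sum | cut -d " " -f 1
STORED_HASH="placeholder_hash_value"

# Read the password without echoing it
stty -echo
echo "Please enter a password: "
read PASSWD
stty echo

# Hash what the user typed and compare it to the stored hash
ENTERED_HASH=$(echo "$PASSWD" | sha512sum | cut -d " " -f 1)
if [ "$ENTERED_HASH" == "$STORED_HASH" ];then
    echo "Password Accepted."
else
    echo "Password Rejected."
    exit 1
fi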

One habit that you should encourage with your users (and any system admins under you) is picking long and complex passwords. To ease the strain of having to remember a convoluted password, have users build passwords based on first letters and punctuation from a random phrase. For instance, the phrase “This is 1 fairly strong password, don’t you think Jeremy?” would reduce to “Ti1fsp,dytJ?”. The specific phrase doesn’t matter, but it should include a mix of numbers, letters (upper and lowercase), and symbols to be the most secure. Make sure that all of the symbols being used are acceptable for the system you’re choosing the password for though.

The shc Utility

The shc utility compiles a script in order to make it harder for a cracker to read its contents. This is especially useful if you find that you have to store passwords or other sensitive information inside of a script. Take note that I said “harder” and not “impossible” for a cracker to read. It’s been shown that shc compiled scripts can be reverse engineered to gain access to the contents. Remember that you should strive to make sure that your protection mechanisms are multi-layered. If you use shc to compile a script with passwords in it, hash the passwords with a command like md5sum, and set the access permissions to be as restrictive as possible. That way you’re not just relying on shc to keep your data safe. Some of the options for the shc utility are shown in Listing 23.

Listing 23

-e date         The date after which the script will refuse to run (dd/mm/yyyy)
-f script_name  The file name of the script to compile
-m message      The message that will be displayed after the expiration date
-T              Allow the binary form of the script to be traceable
-v              Verbose output

Using these options I compiled a sample script via the command line in Listing 24, looked at what files were created, and then tried to run the resulting binary. The version of shc that I used was 3.8.7 which I compiled from source. I then copied the shc binary to my ~/bin directory so that I could run it more conveniently.

Listing 24

$ shc -e 08/09/2010 -m "Please contact your administrator" -v -f test.sh
shc shll=bash
shc [-i]=-c
shc [-x]=exec '%s' "$@"
shc [-l]=
shc opts=- : No real one. Removing opts
shc opts=
shc: cc test.sh.x.c -o test.sh.x
shc: strip test.sh.x
shc: chmod go-r test.sh.x
$ ls
test.sh  test.sh.x  test.sh.x.c
$ ./test.sh.x
./test.sh.x: has expired!
Please contact your administrator

You can see in Listing 24 that I’ve set an expiration date of September 8th, 2010, which is earlier than the date that I’m writing this. I supply the expiration message of “Please contact your administrator”, I ask shc for verbose output, and then I give it the script that I want it to compile (test.sh). When I list the files in the directory I see test.sh, test.sh.x, and test.sh.x.c . test.sh.x is the compiled binary that shc creates from my original script. test.sh.x.c is the C source code that is generated for test.sh . Be careful to keep this file in a safe place as it gives critical information that will compromise your compiled script. In Listing 24 I get an error when I try to run the compiled script (test.sh.x), but this is expected as I used an expiration date in the past. I did this just to show you how the compiled script would react when the expiration period expires. You don’t have to specify the expiration date, but it can be handy if you only want to give a user access to a script’s capabilities for a few days or weeks.

Overall shc is a nice tool to have at your disposal, but as I mentioned above don’t count on it for foolproof protection. The Linux Journal article in the Resources section (#5) talks about how shc compiled scripts can be cracked. Additional features have been added to newer versions of shc, such as the removal of group and other read permissions by default, to make the compiled scripts harder to get at. Even so, make sure that you have multiple layers of security surrounding your scripts as we’ve talked about earlier.

How-To

At this point, let’s take what we’ve discussed so far and apply it to the script in Listing 1. I’ve already removed the current directory from the PATH variable, and made sure that we start off with a clean path by resetting the variable in Listing 3. The script in Listing 25 shows the script that we’ll be starting with.

Listing 25

#!/bin/bash
# A SUID root script that demonstrates various security problems

# Save the current path variable to restore it later
OLDPATH=${PATH}

# Set a minimal path for our script to use
PATH=/bin:/usr/bin

#Count the number of lines
ls | wc -l

# Get user input
read USR_INPUT

# Check to see if the user supplied the right password
if [ $USR_INPUT == "mypassword" ];then
    echo "User input was $USR_INPUT and should have matched the string 'mypassword'"
fi

# Create a temp file
touch /tmp/mytempfile

# Set the temp file so that only we can read/write the contents
chmod 0700 /tmp/mytempfile

# Save the password that the user supplied to the temp file
echo $USR_INPUT > /tmp/mytempfile

# Reset the PATH variable to its original value
PATH="$OLDPATH"

Now that we have a minimal and known PATH variable set, we can feel a little better about running the ls | wc -l command line. As stated before, we could use absolute paths for each command but that could lead to a portability issue on some systems where the binaries are stored in different locations.

The next step is to deal with the user input. I’m first going to put quotes around the variable to help ensure that it’s treated as a string, and not a part of the statement. Also, just after the read line I’m going to scrub the input to make sure there aren’t any inappropriate characters contained within it. Listing 26 shows the script with these changes.

Listing 26

#!/bin/bash
# A SUID root script that demonstrates various security problems

# Save the current path variable to restore it later
OLDPATH=${PATH}

# Set a minimal path for our script to use
PATH=/bin:/usr/bin

#Count the number of lines
ls | wc -l

# Get user input
read USR_INPUT

# Remove all characters that aren't alphanumeric or newline
USR_INPUT=$(echo "$USR_INPUT" | tr -cd '[:alnum:]\n')

# Check to see if the user supplied the right password
if [ "$USR_INPUT" == "mypassword" ];then
    echo "User input was $USR_INPUT and should have matched the string 'mypassword'"
fi

# Create a temp file
touch /tmp/mytempfile

# Set the temp file so that only we can read/write the contents
chmod 0700 /tmp/mytempfile

# Save the password that the user supplied to the temp file
echo $USR_INPUT > /tmp/mytempfile

# Reset the PATH variable to its original value
PATH="$OLDPATH"

The section of code that scrubs the user input is taken from Listing 19, and a full explanation of the process can be found in the paragraphs following that listing. In short, the user input is echoed into the tr command so that all characters except alpha-numeric and newline characters are deleted.

Of course, as I mentioned above, you wouldn’t want to store any password information in a script unless you have to. If it becomes necessary to store a password inside a script, it’s best to store a hash of the password created with a command like md5sum. Think about this decision carefully, because there is almost always a way to avoid storing a password inside of a script. For the purpose of this example, I’ve decided to leave the password check in the script and compare against an md5sum hash. Listing 27 shows the results of adding password hashing, along with stty -echo so that the password isn’t echoed as it’s typed.

Listing 27

#!/bin/bash
# A SUID root script that demonstrates various security problems

# Save the current path variable to restore it later
OLDPATH=${PATH}

# Set a minimal path for our script to use
PATH=/bin:/usr/bin

#Count the number of lines
ls | wc -l

# Make sure that nobody can see the password as it's entered
stty -echo

# Get user input
read USR_INPUT

# Re-enable echoing of typed input
stty echo

# Remove all characters that aren't alphanumeric or newline
USR_INPUT=$(echo "$USR_INPUT" | tr -cd '[:alnum:]\n')

# Check to see if the user supplied the right password, but use a hash
if [ $(echo "$USR_INPUT" | md5sum | cut -d " " -f 1) == "d84c7934a7a786d26da3d34d5f7c6c86" ];then
    # Don't echo the user's password, just tell them it worked
    echo "Password Accepted."
fi

# Create a temp file
touch /tmp/mytempfile

# Set the temp file so that only we can read/write the contents
chmod 0700 /tmp/mytempfile

# Save the password that the user supplied to the temp file
echo $USR_INPUT > /tmp/mytempfile

# Reset the PATH variable to its original value
PATH="$OLDPATH"

Next, we start getting into the temporary file section of the script. I had created a function for this in the last blog post, but we’ll write the function from scratch here, applying what we’ve learned so far. Listing 28 shows the new function and its implementation within the script.

Listing 28

#!/bin/bash
# A SUID root script that demonstrates various security problems

# Create the array that will keep the list of temp files
TEMPFILES=( )

# Function to create "safe" temporary files.
function create_temp {
    # Give preference to user tmp directory for security
    if [ -e "$HOME/tmp" ]
    then
        TEMP_DIR="$HOME/tmp"
    else
        TEMP_DIR="/tmp"
    fi

    # Construct a "safe" temp file using mktemp
    TEMP_FILE=$(mktemp --tmpdir=$TEMP_DIR XXXXXXXXXX)

    # Keep the file in an array to remove it later
    TEMPFILES+=( "$TEMP_FILE" )
}

# Save the current path variable to restore it later
OLDPATH=${PATH}

# Set a minimal path for our script to use
PATH=/bin:/usr/bin

#Count the number of lines
ls | wc -l

# Make sure that nobody can see the password as it's entered
stty -echo

# Get user input
read USR_INPUT

# Re-enable echoing of typed input
stty echo

# Remove all characters that aren't alphanumeric or newline
USR_INPUT=$(echo "$USR_INPUT" | tr -cd '[:alnum:]\n')

# Check to see if the user supplied the right password, but use a hash
if [ $(echo "$USR_INPUT" | md5sum | cut -d " " -f 1) == "d84c7934a7a786d26da3d34d5f7c6c86" ];then
    # Don't echo the user's password, just tell them it worked
    echo "Password Accepted."
fi

# Call the function that will create a "safe" temp file for us
create_temp

# Make sure that the temp file/name was added to the array
echo ${TEMPFILES[0]}

# Reset the PATH variable to its original value
PATH="$OLDPATH"

Within the create_temp function, I use the TEMPFILES array to hold the file names and paths of the temporary files that I create. That way I can remove them later when the script is finished. Normally I would add a trap to handle this, which I talked about in the last blog post on error handling. I left the trap out of Listing 28 just to keep the example a little bit shorter (a sketch of what it might look like follows below). When the create_temp function is called, the script first checks to see if the user has their own tmp directory. If they do, it is used in preference to the main /tmp directory, since /tmp is world writable. Once the tmp folder has been selected it is passed to the mktemp command using the --tmpdir option. mktemp creates the temp file, and the pathname of the file that was created is stored in a variable. According to our error handling knowledge, I should be checking to make sure that the temp file was created and that there were no errors, but I’ve left this check out to keep the script more streamlined. In your own use of this script code you’ll want to apply the error handling techniques that we talked about in the last post. The path and file name that’s stored in the variable is then added to the TEMPFILES array to be dealt with later. Once that’s done, the temp file is ready for use. Normally you would redirect data into the temp file, but I just echoed the path and name of the temp file instead.
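For completeness, here is one way the left-out cleanup trap could look; this is a sketch built on the TEMPFILES array and is not part of the original listing:

# Remove every temp file we created, no matter how the script exits
function clean_up {
    for TMP in "${TEMPFILES[@]}"
    do
        rm -f "$TMP"
    done
}
trap clean_up EXIT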

The last thing that I do is to restore the PATH variable using the saved value in the OLDPATH variable. This undoes the change that we made at the beginning of the script which helped us run system commands more safely.

There are still improvements that can be made to this script based on what has been discussed in previous posts. Please add your ideas about the script in the comments on this post.

Tips and Tricks

  • Never copy commands or code from forums, blogs, and the like without checking them to make sure they’re safe. Resource #2 has a list of malicious commands that have been given out by problem users in the Ubuntu Forums. Your best defense is to review the commands/code thoroughly yourself, or find someone who can review it for you before you execute it. You can also post the code to other forums and ask the users there if it’s safe.
  • Always be suspicious of external inputs to your script whether they be variables, user input, or anything else. We talked about validating user input in the last post on error handling as well as this one. It’s important to remember that incorrect input is not always the doing of a cracker. Many times users make honest mistakes, and your script needs to be able to handle that eventuality.
  • Make sure that your script is writable only by the owner. That makes direct code injection attacks harder for a cracker to accomplish.
  • Use the cd command to change into a trusted directory when your script starts. This way you have a known starting point.
  • When you’re writing a script, always assume that it will be installed and run incorrectly. If it’s designed to be in a directory that’s only readable/writable by the owner, and it holds sensitive information, assume that it’s going to be placed in a world writable directory with full permissions for everyone. Don’t hard code an installation directory into your script unless you have to.
  • Don’t assume that your script is always going to be run as a regular user, or just as the super user. You need to understand what your script will do when run by unprivileged and privileged users.
  • Attempt to keep your scripts and files out of world writable directories like /tmp as much as possible.
  • Don’t give users access to programs with shell escapes (like vi and vim) from your scripts, especially when elevated privileges are involved.
  • Do not rely only on one security technique to protect your script and your users. Putting all your faith in a method like “security through obscurity” (such as password encryption) while ignoring all of the other security tools in your box is asking for trouble. Some security methods can give you a false sense of security, and you need to be vigilant. Remember, try to make the cracker’s life as difficult as you possibly can. This involves a multi-tiered script security strategy.
  • Use secure versions of commands in your scripts whenever possible. For instance, use ssh and scp instead of telnet and rcp, or the slocate command rather than the locate command. The man page for the base command will sometimes point you toward the more secure versions.
  • Have other coders look over your script to check it for problems and security holes. You can even post your script to various forums and ask them to try to break it for you.
  • Make sure that any startup and configuration scripts that you add to your system are as secure and bug free as possible. Don’t add a script to the system’s init or Upstart mechanism without testing it thoroughly.
  • When using information like passwords within your script, try not to store the information in environment variables. Instead, use pipes and redirection; that makes the data harder for a cracker to access.
  • When creating and running scripts you should follow the Rule of Least Privilege by giving the script only the minimal set of privileges that it needs to do its job. Also, make sure that you’ve designed the script well so that it doesn’t need elevated privileges unnecessarily. For instance, if a script works well with ownership of nobody and a permission string of 0700, don’t set the script to be owned by root and have permissions of 4777.
  • In the appropriate context, use options for commands that tend to enhance security and resistance to bad input. For instance, the find command has an option -print0 that causes the output to be null terminated instead of newline terminated. The xargs command has a similar option (-0). These options can help ensure that input containing things like newlines won’t break your script (see the short sketch after this list). This requires extra study of what can go wrong with your script, and how to use the available commands to avoid anything going wrong.
  • If you have scripts shared via something like a download repository, consider giving your users md5 and/or sha1 sum values so that they can check the integrity of a script they download. If you’re emailing a script, you might want to use GPG so that you can do things like ensuring that the contents of the script have not been tampered with, and that a third party cannot read the contents of the script in transit.
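
As an illustration of the null-termination tip above, here is a minimal sketch of -print0 and -0 working together. The directory and file pattern are just placeholders for this example:

# Delete *.tmp files whose names may contain spaces or newlines.
# find emits null-terminated names; xargs -0 splits on the nulls.
find /tmp/myscratch -name '*.tmp' -print0 | xargs -0 rm -f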

Scripting

These scripts are somewhat simplified and in most cases could be done other ways too, but they will work to illustrate the concepts. If you use these scripts, make sure you adapt them to your situation. Never run a script or command without understanding what it will do to your system.

This first script (Listing 29) is a compilation of the shell script code that I’ve demonstrated throughout this post. The code has been organized into functions and placed in a separate script that can be sourced to add security specific code to your own scripts. Keep in mind though that the functions in this script don’t give you comprehensive coverage. Once again, we’re barely scratching the surface.

Listing 29

#!/bin/bash -
# File: security_src.sh
# Script that you can source to add a few security features to
# your own scripts.

#Variables to store the old values of the IFS and PATH variables
OLD_IFS=""
OLD_PATH=""

# Function to create "safe" temporary files.
function create_temp
{
    # Give preference to user tmp directory for security
    if [ -e "$HOME/tmp" ]
    then
        TEMP_DIR="$HOME/tmp"
    else
        TEMP_DIR="/tmp"
    fi

    # Construct a "safe" temp file using mktemp
    TEMP_FILE=$(mktemp --tmpdir=$TEMP_DIR XXXXXXXXXX)

    # Keep the file in an array to remove it later
    TEMPFILES+=( "$TEMP_FILE" )
}

# Function that will keep this script from being run with any kind
# of root privileges.
function drop_root
{
    # Check to see if we're running as root via sudo
    if [ $(/usr/bin/id -ur) -eq 0 ];then
        echo "This script cannot be run with sudo"
        exit 1
    fi

    # Get the listing on this script
    INFO=$(ls -l $0)

    # Grab the permission at the SUID position
    PERM=$(echo "$INFO" | cut -d " " -f 1 | cut -c 4)

    # Grab the owner
    OWNER=$(echo "$INFO" | cut -d " " -f 3)

    # Check for the SUID bit and the owner of root
    if [ "$PERM" == "s" -a "$OWNER" == "root" ];then
        echo "This script cannot be run as SUID root"
        exit 1
    fi
}

# Function that will get and scrub user input to make it safer to use.
function scrub_input
{
    # Grab the user's input
    read USR_INPUT

    # Remove all characters that aren't alphanumeric or newline
    USR_INPUT=$(echo "$USR_INPUT" | tr -cd '[:alnum:]\n')
}

# Function that sets certain environment variables to known values
function clear_vars
{
    # Save the old variables so that they can be restored
    OLD_IFS="$IFS"
    OLD_PATH="$PATH"

    # Set the variables to known safer values
    IFS=$' \t\n'          #Set IFS to include whitespace characters
    PATH='/bin:/usr/bin'  #Assumed safe paths
}

# Function that restores environment variables to what they were at the
# start of the script.
function restore_vars
{
    IFS="$OLD_IFS"
    PATH="$OLD_PATH"
}

# Function that attempts to run a command safely via the whereis command.
function run_cmd
{
    # Attempt to find the command with the whereis command
    CMD=$(whereis $1 | cut -d " " -f 2)

    # Check to make sure that the command was found
    if [ -f "$CMD" ];then
        eval "$CMD"
    else
        echo "The command $CMD was not found"
        exit 127
    fi
}

This script starts out with our new and improved function which creates relatively safe temp files for us (create_temp). This was taken directly from Listing 28, which we’ve already discussed. After that, there’s the drop_root function that encapsulates the functionality from Listing 21. We can just call this function at the beginning of the script to make sure that we’re not being run with sudo and that the script is not SUID root. This function merely warns the user and exits; it does not drop its root privileges the way BASH does. The next function reads input from the user and then removes everything but alphanumeric characters and the newline character. This is taken from Listing 19. The next two functions deal with environment variables. The first (clear_vars) saves the old variable values for both IFS and PATH, and then sets new values for each. The restore_vars function uses the saved variable values to reset the variables back to their original condition. This is the same concept as what we talked about in Listing 3, enclosed in functions. The last function (run_cmd) is similar to Listing 4, but I’ve expanded it a little bit to check whether a file with the name of the command exists before trying to run it. If the command exists, it is run via the eval command. If the command does not exist, we warn the user and exit.

Listing 30 shows a simple script where I implement the collection of security specific functions in Listing 29.

Listing 30

#!/bin/bash -
# File: security_src_test.sh
# Script to test the sourcable script security_src.sh

# Function to clean up after ourselves
function clean_up
{
    # Step through and delete all of the temp files
    for TMP_FILE in "${TEMPFILES[@]}"
    do
        # Make sure that the tempfile exists
        if [ -e "$TMP_FILE" ]; then
            echo "Temp file: $TMP_FILE"
            rm $TMP_FILE
        fi
    done

    # Reset the variables to their original values
    restore_vars
}

# Source the script that holds the security functions
. security_src.sh

# Make sure that we delete the temp files when we exit
trap 'clean_up' EXIT

# Array to hold the temporary files
TEMPFILES=( )

# Variable to hold the user's input
USR_INPUT=""

# Make sure that we're not running with root privileges
drop_root

# Make sure that we have safe variables to work with
clear_vars

# Call the function that will create a temp file for us
create_temp

# Check to make sure that the temp file was created
echo "${TEMPFILES[0]}"

# Let the user know that input is expected
printf "Please enter your input: "

# Get and scrub the user input
scrub_input

# Test the user input
echo $USR_INPUT

# Try to safely run a command that exists
run_cmd ls > /dev/null

# Try to safely run a command that does not exist
run_cmd foo

At the very top of the script I create a clean_up function that handles the removal of any temporary files and calls the sourced function that restores the IFS and PATH variables to their original values. This function is used in the trap statement so that it will be called whenever the script exits. Just above the trap statement, the script that gives us access to the security related functions (security_src.sh) is sourced. Continuing on down the script you see that I’ve created a couple of variables to hold the temporary file names and the user input. The names of these variables come from the sourced script. The sourced function drop_root ensures that the script is not being run with root privileges, and then clear_vars is called to make sure that IFS and PATH are safer to use. After that I call the create_temp function to set up a temporary file for me, and then immediately echo the name/path of the file by accessing the first element of the TEMPFILES array (echo "${TEMPFILES[0]}").

I prompt the user for input with a printf statement next, but instead of putting the read command directly in my script I call the scrub_input function and let it handle the task of getting the input from the user. When I ran the script I tried inputting several symbols that should not be allowed in the user input, and upon hitting enter I saw via the echo $USR_INPUT statement that the symbols were properly scrubbed from the input. The last two things that I do are to try to run two commands via the run_cmd function. The first time that I use the function I run the ls command, which I would expect to succeed. I use the > /dev/null section of the line to suppress the output from the ls command so that the output of the script doesn’t get too cluttered. The second command that I try to run with the run_cmd function is foo. I would not expect this command to be found, and have added it to show what the function does. Listing 31 shows the output that I get when I run the script in Listing 30.

Listing 31

$ ./security_src_test.sh
/home/jwright/tmp/mEAPJhqgyb
Please enter your input: ?blog
blog
The command foo: was not found
Temp file: /home/jwright/tmp/mEAPJhqgyb

When I check the /home/jwright/tmp folder for the temporary file, I see that it was properly deleted by the script. I also see that the ls command was found since there is no error, but the foo command was not. This is exactly what was expected. The example script in Listing 30 is not a real world script by any means, but works to show you how you would use the sourced script, and what order you might want to call the sourced functions in. As always, I welcome any input on corrections, additions, and tweaks that you think should be added to these scripts or any scripts in this post. Tell me what you think in the comments section.

Troubleshooting

If you get any capital letters in the symbolic permission string for a file, it means that something is wrong. A capital “S” means that the SUID (or SGID) bit is set, but the corresponding execute permission for the owner (or group) is not. A capital “T” means that you set the sticky bit without setting the execute permission for other/world on the file or directory.
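
You can see this for yourself by setting the SUID bit on a file that has no owner execute permission. The file name is just an example, and the owner, size, and date in the ls output will of course differ on your system:

$ touch testfile
$ chmod 4644 testfile     # SUID set, but no owner execute bit
$ ls -l testfile
-rwSr--r-- 1 jwright jwright 0 2010-06-18 10:00 testfile
$ chmod u+x testfile      # add owner execute and the "S" becomes "s"
$ ls -l testfile
-rwsr--r-- 1 jwright jwright 0 2010-06-18 10:00 testfile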

Conclusion

As I stated when we started this post, I haven’t been able to cover every aspect of shell script security, and for the most part I avoided the issue of system security as that’s an even larger (but related) subject. It’s simply been my hope that I’ve given you a good starting point to plug some of the common security holes in your own scripts. Using this as a starting point, have a look at the Resources section for more information, and make sure to take opportunities to continue your learning on script, program, and system security whenever they arise.

Resources

Books

Links

  1. Purdue University’s Center for Education and Research in Information Assurance and Security
  2. Ubuntu Forums Announcement A Few Malicious Commands To Avoid In Forums/Posts/Lists
  3. LinuxSecurity.com Article On The shc Utility That Encrypts Shell Scripts
  4. shc Utility Homepage
  5. Linux Journal Article On Security Concerns When Using shc
  6. 7 Tips On Script Security By James Turnbull (requires registration)
  7. TLDP Advanced Bash-Scripting Guide: Chapter 35 – Miscellany
  8. Mac OS X Article On Shell Script Security That Gives Examples Of Attacks
  9. Article From faq.org On SUID Shell Scripts
  10. Practical Unix & Internet Security – Chapter 5 – Section 5.5 (SUID)
  11. Practical Unix & Internet Security – Chapter 23 – Writing Secure SUID and Network Programs
  12. Help Net Security Article On Unix Shell Scripting Malware
  13. More SUID Vulnerability Information
  14. Short Article On Linuxtopia About The Dangers Of Running Untrusted Shell Scripts
  15. etutorials.org Article On Useful Shell Utilities For Scripts
  16. Examples Of Risky Scripts To Use With sudo
  17. Very Good LinuxSecurity.com Article On Creating Safe Temporary Files
  18. IBM developerWorks Article: Secure programmer: Developing secure programs
  19. IBM developerWorks Article: Secure programmer: Validating input
  20. IBM developerWorks Article: Secure programmer: Keep an eye on inputs
  21. Article on SUID, SGID, And Sticky Bits In Linux And Unix

Writing Better Shell Scripts – Part 2

Quick Start

As with Part 1 of this series, this information does not lend itself to having a “Quick Start” section. With that said, you can read the How-To section of this post for a quick general overview. I would highly recommend reading everything though, as a good understanding of the concepts and commands outlined here will serve you well in the future. Video and Audio are also included with this post which may work as a quick reference for you. Don’t forget that the man and info pages of your Linux/Unix system can be an invaluable resource as well when you’re learning commands and solving problems.

Video

Audio

Download

Preface

To make things easier on you, all of the black command line and script areas are set up so that you can copy the text from them. This does make using the commands easier, but if you’re not already familiar with the concepts presented here, typing the commands yourself and working through why you’re typing them will help you learn more. If you hit problems along the way, take a look at the Troubleshooting section near the end of this post for help.

There are formatting conventions that are used throughout this post that you should be aware of. The following is a list outlining the color and font formats used.

Command Name or Directory Path
Warning or Error
Command Line Snippet With Commands/Options/Arguments
Command Options and Their Arguments Only
Hyperlink

Overview

This post is the second in a series on shell script debugging, error handling, and security. The content of this post will be geared mainly toward BASH users, but there will be information that’s suitable for users of other shells as well. Information such as techniques and methodologies may transfer very well, but BASH specific constructs and commands will not. The users of other shells (CSH, KSH, etc) will have to do some homework to see what transfers and what does not.

There are a lot of opinions about how error handling should be done, which range from doing nothing to implementing comprehensive solutions. In this post, as well as my professional work, I try to err on the side of in-depth solutions. Some people will argue that you don’t need to go through the trouble of providing error handling on small single-user scripts, but useful scripts have a way of growing past their original intent and user group. If you’re a system administrator, you need to be especially careful with error handling in your scripts. If you or an admin under you gets careless, someday you may end up getting a call from one of your users complaining that they just deleted the contents of their home directory – with one of your scripts. It’s easier to do than you might think when precautions are not taken. All you need are a couple of lines in your script like those in Listing 1.

Listing 1

#!/bin/bash
cd $1
rm -rf *

So what happens if a user forgets to supply a command line argument to Listing 1? The cd command changes into the user’s home directory, and the rm command deletes all of their files and directories without prompting. That has the makings of a bad day for both you and your user. In this post I’ll cover some ways to avoid this kind of headache.

To help ease the extra burden of making your scripts safer with error handling, we’ll talk about separating error handling code out into reusable modules which can be sourced. Once you do this and become familiar with a few error handling techniques, you’ll be able to implement robust error handling in your scripts with less effort.

The intent of this post is to give you the information you need to make good judgments about error handling within your own scripts. Both proactive and reactive error handling techniques will be covered so that you can make the decision on when to try to head off errors before they happen, and when to try to catch them after they happen. With those things in mind, let’s start off with some of the core elements of error handling.

BASH Options

There are several BASH command line options that can help you avoid some errors in your scripts. The first two are ones that we already covered in Part 1 of this series. The -e option, which is the same as set -o errexit, causes BASH to exit as soon as it detects an error. While there are a significant number of people who promote setting the -e option for all of your scripts, that can prevent you from using some of the other error handling techniques that we’ll be talking about shortly. The next option, -u, which is the same as set -o nounset, causes the shell to throw an error whenever a variable is used before its value has been set. This is a simple way to prevent the risky behavior of Listing 1. If the user does not provide an argument to the script, the shell will see it as the $1 variable not being set and complain. This is usually a good option to use in your scripts.

set -o pipefail is something that we’ll touch on in the Command Sequences section and causes a whole command pipeline to error out if just one of the sections has an error. The last shell option that I want to touch on is set -o noclobber (or the -C option) which helps you because it prevents the overwriting of files with redirection. You will just get an error similar to cannot overwrite existing file. This can save you when you’re working with system configuration files, as overwriting one of them could result in any number of big problems. Listing 2 holds a quick reference list of these options.

Listing 2

errexit (-e)    Causes the script to exit whenever there is an error.
noclobber (-C)  Prevents the overwriting of files when using redirection.
nounset (-u)    Causes the shell to throw an error whenever an unset variable is used.
pipefail        Causes a pipeline to error out if any section has an error.
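
As a quick illustration of two of these options at an interactive shell, here is a minimal sketch. The variable name undefined_var and the file existing_file.txt are just examples, and the file is assumed to already exist:

$set -o nounset
$echo $undefined_var
bash: undefined_var: unbound variable
$set -o noclobber
$echo "data" > existing_file.txt
bash: existing_file.txt: cannot overwrite existing file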

Exit Status

Exit status is the 8-bit integer that is returned to a parent process when a subprocess exits (either normally or is forced to exit). Typically, an exit status of 0 means that the process completed successfully, and a greater than 0 exit status means that there was a problem. This may seem counterintuitive to C/C++ programmers who are used to true being 1 (non-zero) and false being 0. There are exceptions to the shell’s exit status standard, so it’s always best to understand how the distribution/shell/command combo you’re using will handle the exit status. An example of a command that acts differently is diff. When you run diff on two files, it will return 0 if the files are the same, 1 if the files are different, and some number greater than 1 if there was an error. So if you checked the exit status of diff expecting it to behave “normally”, you would think that the command failed when it was really telling you that the files are different.
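
A short sketch of how you might account for diff’s exit statuses explicitly (the file names here are placeholders):

#!/bin/bash -
# Compare two files and interpret diff's exit status explicitly
diff file_a.txt file_b.txt > /dev/null
case $? in
    0) echo "The files are the same" ;;
    1) echo "The files differ" ;;
    *) echo "diff hit an error (missing file, bad option, etc.)" ;;
esac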

Probably the easiest way to begin experimenting with exit status is to use the BASH shell’s built-in ? variable. The ? variable holds the exit status of the last command that was run. Listing 3 shows an example where I check the exit status of the true command which always gives an exit status of 0 (success), and of the false command which always gives an exit status of 1 (failure). Credit goes to William Shotts, Jr., whose straightforward use of true and false in his examples on this topic inspired some of the examples in this post.

Listing 3

$true
$echo $?
0
$false
$echo $?
1

In this case the true and false commands follow the 0 = success, non-zero = failure standard, so we can be certain whether or not the command succeeded. As stated above though, the meaning of the exit status is not always so clear. I check the man page for any unfamiliar commands to see what their exit statuses mean, and I suggest you do the same with the commands you use. Listing 4 lists some of the standard exit statuses and their usual meanings.

Listing 4

0        Command completed successfully.
1-125    Command did not complete successfully. Check the command's man page for the meaning of the status.
126      Command was found, but couldn't be executed.
127      Command was not found.
128-254  Command died due to receiving a signal. The signal code is added to 128 (128 + SIGNAL) to get the status.
130      Command exited due to Ctrl-C being pressed.
255      Exit status is out of range.

For statuses 128 through 254, you see that the signal that caused the command to exit is added to the base status of 128. This allows you to subtract 128 from the given exit status later to see which signal was the culprit. Some of the signals that can be added to the base of 128 are shown in Listing 5 and were obtained from the signal man page via man 7 signal . Note that SIGKILL and SIGSTOP cannot be caught, blocked, or ignored because those signals are handled at the kernel level. You may see all of these signals at one time or another, but the most common are SIGHUP, SIGINT, SIGQUIT, SIGKILL, SIGTERM, and SIGSTOP.

Listing 5

Signal     Value     Action   Comment
──────────────────────────────────────────────────────────────────────
SIGHUP        1       Term    Hangup detected on controlling terminal
                              or death of controlling process
SIGINT        2       Term    Interrupt from keyboard
SIGQUIT       3       Core    Quit from keyboard
SIGILL        4       Core    Illegal Instruction
SIGABRT       6       Core    Abort signal from abort(3)
SIGFPE        8       Core    Floating point exception
SIGKILL       9       Term    Kill signal
SIGSEGV      11       Core    Invalid memory reference
SIGPIPE      13       Term    Broken pipe: write to pipe with no readers
SIGALRM      14       Term    Timer signal from alarm(2)
SIGTERM      15       Term    Termination signal
SIGUSR1   30,10,16    Term    User-defined signal 1
SIGUSR2   31,12,17    Term    User-defined signal 2
SIGCHLD   20,17,18    Ign     Child stopped or terminated
SIGCONT   19,18,25    Cont    Continue if stopped
SIGSTOP   17,19,23    Stop    Stop process
SIGTSTP   18,20,24    Stop    Stop typed at tty
SIGTTIN   21,21,26    Stop    tty input for background process
SIGTTOU   22,22,27    Stop    tty output for background process

A listing of signals which only shows the symbolic and numeric representations without the descriptions can be obtained with either kill -l or trap -l .
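
As mentioned above, you can subtract 128 from an exit status in the 129-254 range to recover the signal number. Here’s a minimal sketch at an interactive shell, assuming you run sleep 60 in the foreground in one terminal and send its process kill -TERM (signal 15) from another:

$sleep 60
Terminated
$echo $?
143
$echo $((143 - 128))
15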

You can explicitly pass the exit status of the last command executed back to the parent process (most likely the shell) with a line like exit $? . You can do the same thing implicitly by calling the exit command without an argument. This works fine if you want to exit immediately, but if you want to do some other things with the exit status first you’ll need to store it in a variable. This is because after you read the ? variable once, it resets. Listing 6 shows one way of using an if statement to pass the exit status back to the parent after implementing your own error handling functionality.

Listing 6

#!/bin/bash -

# Run the command(s)
false

# Save the exit status (it's reset once we read it)
EXITSTAT=$?

# If the command has a non-zero exit status
if [ $EXITSTAT -gt 0 ]
then
    echo "There was an error."
    exit $EXITSTAT #Pass the exit status back to parent
fi

You can also use an if statement to directly test the exit status of a command as in Listing 7. Notice that using the command this way resets the ? variable so that you can’t use it later.

Listing 7

#!/bin/bash -

# If the command has a non-zero exit status
if ! false
then
    echo "There was an error."
    exit 1
fi

The if ! false statement is the key here. What’s inside of the if statement will be executed if the command (in this case false) returns a non-zero exit status. Using this type of statement can give you a chance to warn the user of what’s going on and take any actions that are needed before the script exits.

You can also use the if and test combination in more complex ways. For instance, according to its man page, the ls command uses an exit status of 0 for no errors, 1 for minor errors like not being able to access a sub directory, and 2 for major errors like not being able to access a file/directory specified on the command line. With this in mind, take a look at Listing 8 to see how you could differentiate between the “no error”, “minor error”, and “major error” conditions.

Listing 8

#!/bin/bash -

function testex
{
    # We can only read $? once before it resets, so save it
    exitstat=$1

    # See which condition we have
    if test $exitstat -eq 0; then
        echo "No error detected"
    elif test $exitstat -eq 1; then
        echo "Minor error detected"
    elif test $exitstat -eq 2; then
        echo "Major error detected"
    fi
}

# Try a listing that should succeed
echo "- 'ls ~/*' Executing"
ls ~/* &> /dev/null

# Check the success/failure of the ls command
testex $?

# Try a listing that should not succeed
echo "- 'ls doesnotexist' Executing"
ls doesnotexist &> /dev/null
testex $?

Inside the testex function I have placed code that looks for specific exit statuses and then tells the user what was found. Normally you wouldn’t worry about handling the situation where there’s no error (exit status 0), but doing so helps clarify the concept in our example. The output that you would get from running this script is shown in Listing 9.

Listing 9

$ ./testex.sh
- 'ls ~/*' Executing
No error detected
- 'ls doesnotexist' Executing
Major error detected

There are a couple of final things to be aware of when you’re using the ? variable. First, remember that whenever you use ? from the command line or in a script, the shell resets its value. If you need to use the ? variable more than once in your script, you’ll want to store its value in another variable and use that. The second is that ? becomes ineffective when you are using the -e option or the line set -o errexit. The reason for this is that the script will exit as soon as an error is detected, and so you never get a chance to check the ? variable.

The command_not_found_handle Function

As of BASH 4.0, the provision for a command_not_found_handle function has been added. This function makes it possible to display user friendly messages when a command the user types is not found. BASH searches for the command and if it’s not found anywhere, BASH looks to see if you have the command_not_found_handle function defined. If you do, that function is invoked passing it the attempted command and its arguments so that a useful message can be displayed. If you use a Debian or Ubuntu system you’ve probably seen this in action as they’ve had this feature for a while. Listing 10 shows an example of the command_not_found_handle function output on an Ubuntu 9.10 system.

Listing 10

$cat2
No command 'cat2' found, did you mean:
 Command 'cat' from package 'coreutils' (main)
cat2: command not found

You can implement/override the behavior of the command_not_found_handle function to provide your own functionality. Listing 11 shows an implementation of the command_not_found_handle function inside of a stand-alone script. In most cases you would want to add it to your BASH configuration file(s) so that you can make use of the function anytime that you’re at the shell prompt.

Listing 11

#!/bin/bash -
# File: cmdnf.sh

function command_not_found_handle
{
    echo "The command ($1) is not valid."
    exit 127 #The command not found status
}

cat2

You would access the arguments to the original (not found) command via $2, $3 and so on. Notice that I used the exit command and passed it the code of 127, which is the command not found exit status. The exit status of the whole script is the exit status of the command_not_found_handle function. If you don’t set the exit status explicitly the script will end up returning 0 (success), thus preventing a user or script from using the exit status to determine what type of error occurred. Propagation of the exit status and terminating signal (which we’ll talk about later) is a good thing to do to prevent your users from missing important information and/or having problems. When run, the script in Listing 11 gives you the following output in Listing 12.

Listing 12

$./cmdnf.sh
The command (cat2) is not valid.
$echo $?
127

Command Sequences

Command sequences are multiple commands that are linked by pipes or logical short-circuit operators. Two logical short-circuits are the double ampersand (&&) and double pipe (||) operators. The && only allows the command that comes after it in the series to be executed if the previous command exited with a status of 0. The || operator does the opposite by only allowing the next command to be executed if the previous one returned a non-zero exit status. Listing 13 shows examples of how each of these work.

Listing 13

$true && echo 'Hello World!'
Hello World!
$false && echo 'Hello World!'
$true || echo 'Hello World!'
$false || echo 'Hello World!'
Hello World!

So, one of the many ways to solve the unset variable problem we see in Listing 1 is the example shown in Listing 14.

Listing 14

#!/bin/bash

#Make sure the user provided a command line argument
[ -n "$1" ] || { echo "Please provide a command line argument."; exit 1; }

#Change to the directory and delete the files and dirs
cd $1 && rm -rf *

In the first line of interest, we check to make sure that the value of $1 is not null. If that test command fails, it means that $1 is unset and that the user did not provide a command line argument. Since the || operator only allows the next command to run if the previous one fails, our code block warns the user of their mistake and exits with a non-zero status. If a command line argument was supplied, the script continues on. In the second interesting line we use the && operator to run the rm command if, and only if, the cd command succeeds. This keeps us from accidentally deleting all of the files and directories in the user’s/script’s current working directory if the cd command fails for some reason.

The next type of command sequence that we’re going to cover is a pipeline. When commands are piped together, only the last return code will be looked at by the shell. If you have a series of pipes like the one in Listing 15, you would expect it to show a non-zero exit status, but instead it’s 0.

Listing 15

$true | false | true
$echo $?
0

To change the shell’s behavior so that it will return a non-zero value for a pipeline if any of its elements have a non-zero exit status, use the set -o pipefail line in your script. The result of using pipefail is shown in Listing 16.

Listing 16

$set -o pipefail
$true | false | true
$echo $?
1

This method doesn’t give you any insight into where in the pipeline your error occurred though. In many cases I prefer to use the BASH array variable PIPESTATUS to check pipelines. It gives you the ability to tell where in the pipeline the error occurred, so that your script can more intelligently adapt to or warn about the error. Listing 17 gives an example.

Listing 17

$true | false | true
$echo ${PIPESTATUS[0]} ${PIPESTATUS[1]} ${PIPESTATUS[2]}
0 1 0

To keep things clean inside your script, you might put the code to check the PIPESTATUS array into a function and use a loop to process the array elements. This way you have reusable code that will automatically adjust to the number of commands that are in your pipe. One of the scripts in the Scripting section shows this technique.
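
One possible shape for such a function is sketched below. The name check_pipeline and its output messages are just illustrative; the key point is that the PIPESTATUS values are passed in immediately after the pipeline runs, since the array resets as soon as another command executes:

#!/bin/bash -

# Report the position (1-based) of any pipeline section that failed.
# Call it immediately after the pipeline, passing "${PIPESTATUS[@]}".
function check_pipeline
{
    local POSITION=1
    local STATUS

    for STATUS in "$@"
    do
        if [ "$STATUS" -ne 0 ]; then
            echo "Pipeline section $POSITION failed with status $STATUS"
        fi
        POSITION=$((POSITION + 1))
    done
}

true | false | true
check_pipeline "${PIPESTATUS[@]}"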

If you’re running a version of BASH prior to 3.1, a potential problem with using pipes is the Broken pipe warning. If a reader in a pipeline finishes before its writer completes, the writer command will get a SIGPIPE signal which causes the Broken pipe warning to be thrown. It may be a non-issue for you, but it doesn’t hurt to be aware of it. If you’re running a version of BASH that’s 3.1 or higher, you can use the PIPESTATUS variable to see if there’s been a pipe error. I’ve done this in Listing 18 where I’ve written two scripts that will cause the pipeline to break. The code inside the scripts doesn’t really matter in this case, just the end result.

Listing 18

$./pipeerr2.sh | ./pipeerr.sh
test
test
test
$echo ${PIPESTATUS[0]} ${PIPESTATUS[1]}
141 0

You can see that the pipe exit status for the first script (or pipeline section) is 141. This number actually results from the addition of a base exit status and the signal code, which I’ve mentioned before. The base status is 128, which the shell uses to signify that a command stopped due to receiving a signal rather than exiting normally. Added to that is the code of the signal that caused the termination, which in this case is 13 (SIGPIPE) on my system. This technique embeds the signal code in the exit status in a way that makes it easy to retrieve. Since the status is built by adding 128 and 13, all I have to do is use arithmetic expansion to extract the signal code from Listing 18: echo $((${PIPESTATUS[0]}-128)) . This gives me output showing the value of 13, which is what we expect. Keep in mind that the PIPESTATUS array variable is like the ? variable in that it resets once you access it or a new pipeline is executed.

As stated in Part 1 of this series, you can replace pipes with temporary files. This will eliminate the SIGPIPE and exit status pitfalls of pipes, but as stated before temp files are much slower than pipes and require you to clean them up after you’re done with them. In general, I would suggest staying away from temp files unless you have a compelling reason to use them. A compromise between temp files and pipes might be named pipes. On modern Linux systems you use the mkfifo command to create a named pipe, which you can then use with redirection. On older systems you may have to use mknod instead to create the pipe. In Listing 19 you can see that I’ve used named pipes instead of regular pipes, and that this technique allows me to check each of the sections of the pipeline as they’re used. Keep in mind that I’m reading from the named pipe in another terminal with cat < pipe1 since a line like true > pipe1 will block until the pipe has been read from. Also notice that I use the rm command to delete the named pipe after I’m done with it. I do this as a housekeeping measure, since I don’t want to leave named pipes laying around that I don’t need.

Listing 19

$mkfifo pipe1
$true > pipe1
$echo $?
0
$false > pipe1
$echo $?
1
$rm pipe1

Wrapper Functions

If there’s a command that you’re using multiple times in your script and that command requires some error handling, you might want to think about creating a wrapper function. For instance, in Listing 1 the cd command has the unwanted side effect of switching to the user’s home directory if the user hasn’t supplied a command line argument. If you’re using cd multiple times throughout the script, you could write a function that extends cd‘s functionality. Listing 20 shows an example of this.

Listing 20

#!/bin/bash -

function cdext
{
    # We want to make sure that the user gave an argument
    if [ $# -eq 1 ]
    then
        cd $1
    else
        echo "You must supply a directory to change to."
        exit 1
    fi
}

# This should succeed
cdext /tmp

# Make sure that it did succeed
pwd

# This should fail with our warning
cdext

I first use the shell’s built-in # variable to make sure that the user has specified a single argument. It would probably also be a good idea to add a separate else statement to warn the user that they supplied too many arguments. If the user supplied the single argument, the function uses cd to change to that directory and we make sure it worked correctly with the pwd command. If the user didn’t supply a command line argument, we warn them of their error and exit the script. This simple function adds an extra restriction to the cd command’s usage to help make your script safer.

To make the most of this technique you need to understand what types of things can go wrong with a command. Make sure that you’ve learned enough about the command, through resources like the man page, to handle the potential errors properly.

“Scrubbing” Error Output

What I mean by scrubbing in this instance is searching through the error output from a command looking for patterns. That pattern could be something like “file not found” or “file or directory does not exist”. Essentially what you’re doing is looking through the command’s output trying to find a string that will give you specific information about what error occurred. This method tends to be very brittle, meaning that the slightest change in the output can break your script. For this reason I don’t recommend this method, but in some cases it may be your only choice to gather more specific information about a command’s error condition. One method to make this technique slightly more robust would be to use regular expressions and case insensitivity. In Listing 21 I’ve provided a very simple example of output scrubbing.

Listing 21

$ls doesnotexist 2>&1 | grep -i "file not found"
$ls doesnotexist 2>&1 | grep -i "no such"
ls: cannot access doesnotexist: No such file or directory

Notice that I’m using the -i option of grep to make it case insensitive. I’m also redirecting both stdout and stderr into the pipe with the 2>&1 statement. That way I can search all of the command’s messages, errors, and warnings looking for the pattern of interest. In the first search statement I look for the pattern “file not found”, which is not a statement found in the ls command’s output. When I search for the statement “no such”, I get the line of output that contains the error. You could push this example a lot further with the use of regular expressions, but even if you’re very careful a simple change to the command’s output by the developer could leave your script broken. I would suggest filing this technique away in your memory and using it only when you’re sure there’s not a better way to solve the problem.
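
For example, an extended, case-insensitive pattern can match several likely wordings at once. This is just a sketch of the idea, not a robust parser, and the phrases in the pattern are only guesses at what a command might print:

# Treat "No such file or directory", "not found", and "permission denied"
# as interesting error lines, regardless of case.
ls doesnotexist 2>&1 | grep -iE 'no such|not found|permission denied' \
    && echo "Detected a file access problem"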

Being A Good Linux/UNIX Citizen

There are some signals that we need to take extra care in dealing with, such as SIGINT. With SIGINT all processes in the foreground see the signal, but the innermost (foremost) child process decides what will be done with the signal. The problem with this is that if the innermost process just absorbs the SIGINT signal and doesn’t act on it and/or send it on up to its parent, the user will be unable to exit the program/script with the Ctrl-C key combination. There are a few applications that trap this signal intentionally which is fine, but doing this on your own can lead to unpredictable behavior and is what I would consider to be an undesirable practice. Try to avoid this in your own scripts unless you have a compelling reason to do otherwise and understand the consequences. To get around this issue we’ll propagate signals like SIGINT up the process stack to give the parent(s) a chance to react to them.

One way of handling error propagation is shown in Listing 22 where I’ve assumed that the shell is the direct parent of the script.

Listing 22

#!/bin/bash -

function int_handler
{
    echo "SIGINT Caught"

    #Propagate the signal up to the shell
    kill -s SIGINT $$

    # 130 is the exit status from Ctrl-C/SIGINT
    exit 130
}

# Our trap to handle SIGINT/Ctrl-C
trap 'int_handler' INT

while true
do
    :
done

First of all, don’t get caught up in the trap statement if you don’t already know what it is. We’ll talk about traps shortly. This script busy waits in a while loop until the user presses Ctrl-C or the system sends the SIGINT signal. When this happens the script uses the kill command to send SIGINT on up to the shell (whose process ID is represented by $$ in the line kill -s SIGINT $$), and then exits with an exit status corresponding to a forced exit due to SIGINT. This way the shell gets to decide what it wants to do with the SIGINT, and the exit status of our script can be examined to see what happened. Our script handles the signal properly and then allows everyone else above it to do the same.

Error Handling Functions

Since you’re most likely going to be using error handling code in multiple places in your script, it can be helpful to separate it out into a function. This keeps your script clean and free of duplicate code. Listing 23 shows one of the many ways of using a function to encapsulate some simple error handling functionality.

Listing 23

#!/bin/bash -

function err_handler
{
    # Check to see which error code we were given
    if [ $1 -eq 1001 ]; then
        echo "Non-Fatal Error #1 Has Occurred"
        # We don't need to exit here
    elif [ $1 -eq 1002 ]; then
        echo "Fatal Error #2 Has Occurred"
        exit 1 # Error was fatal so exit with non-zero status
    fi
}

# Notice that I'm using my own made up error codes (1001, 1002)
err_handler 1001
err_handler 1002

Notice that I made up my own error codes (1001 and 1002). These have no correlation to any exit status of any of the commands that my script would use, they’re just for my own use. Using codes in this way keeps me from having to pass long error description strings to my function, and thus saves typing, space, and clutter in my code. The drawback is that someone modifying the script later (maybe years later) can’t just glance at a line of code (err_handler 1001) and know what error it is referring to. You could help lessen this problem by placing error code descriptions in the comments at the top of your script. When I run the script in Listing 23 I get the output in Listing 24.

Listing 24

$./err_handler.sh
Non-Fatal Error #1 Has Occurred
Fatal Error #2 Has Occurred
$

Introducing The trap Command

The trap command allows you to associate a section of code with a particular signal (see Listing 5), so that when the signal is seen by the shell the code is run. The shell essentially sets up a signal handler for the signal associated with the trap. This can be very handy to allow you to correct for errors, log what happened, or remove things like temporary files before your script exits. These things highlight one of the downsides to using kill -9 because SIGKILL is one of the two signals that can’t be trapped. If you use SIGKILL, the process that you’re killing won’t get a chance to clean up after itself before exiting. That could leave things like temporary files and stale file locks around to cause problems later. It’s better to use SIGTERM to end a process because it gives the process a chance to clean up.

Listing 25 shows a couple of ways to use the trap command in a script.

Listing 25

#!/bin/bash -

function exit_handler
{
    echo "Script Exiting"
}

trap "echo Ctrl-C Caught; exit 0" int
trap 'exit_handler' EXIT

while true
do
    :
done

Notice that I first use a semi-colon separated list of commands with trap to catch the SIGINT (Ctrl-C) signal. While this particular implementation is bad design because it doesn’t propagate SIGINT, it allows me to keep the example simple. The exit 0 statement is what causes the second trap that’s watching for the EXIT condition to be triggered. This second trap uses a function instead of a semi-colon separated list of commands. This is a cleaner way to handle traps that promotes code reuse, and except in simple cases should probably be your preferred method. Notice the form of the SIGINT specifier that I use at the end of the first trap statement. I use int because the prefix SIG is not required, and the signal declaration is not case sensitive. The same applies when using signals with commands like kill as well. You’re also not limited to specifying one signal per trap. You can append a list of signal specifiers onto the end of the trap statement and each one will use the error handling code specified within the trap.
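
So a single trap line can cover several related signals at once. For example, the following sketch assumes a clean_up function is defined elsewhere in the script:

# One handler for interrupt, hangup, and termination signals
trap 'clean_up' INT HUP TERM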

One tip to be aware of is that you can specify the signals by their numeric representation, but I would advise against it. Using their symbolic representation tells anyone looking at your script (which could even be you years from now) at a glance which signal you’re using. There’s no chance for misinterpretation, and symbolic signals are more portable than just specifying a signal number since numbers tend to vary more by platform.

The output from running the script in Listing 25 and hitting Ctrl-C is shown in Listing 26. Notice that the SIGINT trap is processed before the EXIT trap. This is the expected behavior because the traps for all other signals should be processed before the EXIT trap.

Listing 26

$./trapuse.sh
^CCtrl-C Caught
Script Exiting
$

There are four signal specifiers that you’re probably going to be most interested in when using traps and they are INT, TERM, EXIT, and ERR. All of these have been touched on so far except for ERR. If you remember from above, you could use set -o errexit to cause the shell to exit on an error. This was great from the standpoint that it kept your script from running after a potentially dangerous error had occurred, but kept you from handling the error yourself. Setting a trap using the ERR signal specifier takes care of this shortcoming. The shell receives an ERR signal on the same conditions that cause an exit with errexit, so you can use a trap statement to do any clean up or error correction before exiting. ERR does have the limitation that an error is not detected if it is enclosed in a command sequence, if statement test, a while or until statement, or if the command’s exit status is being inverted by an ! . On older versions of BASH command substitutions $(...) that fail may not be caught by a trap statement either.

You can reset traps back to their original conditions before they were associated with commands using the - command specifier. For example, in the script in Listing 25 you could add the line trap - SIGINT after which the code for the SIGINT trap would no longer be called when the user hits Ctrl-C. You can also cause the shell to ignore signals by passing a null string as a signal specification as in trap "" SIGINT . This would cause the shell to ignore the user whenever they press the Ctrl-C key combination. This is not recommended though as it makes it harder for the user to terminate the process. It’s a better practice to do our clean up and then propagate the signal in the way that we talked about earlier. A handy trick is that you can simulate the functionality of the nohup command with a line like trap "" SIGHUP . What this does is cause your script to ignore the HUP (Hangup) signal so that it will keep running even after you’ve logged out.

If you run trap by itself without any arguments, it outputs the traps that are currently set. Using the -p option with trap causes the same behavior. You can also supply signal specifications (trap -p INT EXIT) and trap will output only the commands associated with those signals. This output can be redirected and stored, and with a little bit of work read back into a script to reinstate the traps later. Listing 27 shows two lines of output from the addition of the line trap -p to the script in Listing 25 just before the while loop.

Listing 27

trap -- 'exit_handler' EXIT
trap -- 'echo Ctrl-C Caught; exit 0' SIGINT
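
As a quick sketch of the save-and-restore idea mentioned above, that output can be written to a file and sourced later to reinstate the same traps. The file name is just an example, and this assumes any functions the saved traps reference (like exit_handler) are already defined at the point where you source the file:

# Save the current INT and EXIT traps to a file
trap -p INT EXIT > saved_traps.sh

# ...later in the script (or in a new run), read them back in
. saved_traps.sh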

Even with all the information that I’ve given you on the trap command, there’s still more information to be had. I’ve tried to hit the highlights that I think will be most useful to you. You can open the BASH man page and search for “trap” if you want to dig deeper.

How-To

In this section I’m going to use a few of the different methods that we’ve discussed to fix the script in Listing 1. The goal is to protect the user from unexpected behavior such as having everything in their home directory deleted. I won’t cover every single way of solving the problem, instead I’ll be integrating a few of the topics we’ve covered into one script to show some practical applications. It’s my hope that by this point in the post you’re starting to see your own solutions and will be able to build on (and/or simplify) what I do here.

If you look at Listing 28 I’ve added the -u option to the shebang line of the script, and also added a check to make sure that the directory exists before changing to it.

Listing 28

#!/bin/bash -u

if [ ! -d $1 ];then
    echo "Please provide a valid directory."
    exit 1
fi

cd $1
rm -rf *

Listing 29 shows what happens when I make a couple of attempts at running the script incorrectly.

Listing 29

$./l1cor_1.sh
./l1cor_1.sh: line 3: $1: unbound variable
$./l1cor_1.sh /doesnotexist
Please provide a valid directory.

The -u option causes the unbound variable error because $1 will not be set if the user doesn’t supply at least one command line argument. The if/test statement declares that if the directory does not exist we will give the user an error message and then exit. There are also other checks that you could add to Listing 28 including one to make sure that the directory is writable by the current user. Ultimately you decide which checks are necessary, but the end goal with this particular example is to make sure that any dangerous behavior is avoided.
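
For instance, a writability check could be dropped in right after the existence check. A minimal sketch, in the same style as Listing 28:

# Make sure the current user can actually write to the directory
if [ ! -w $1 ];then
    echo "You do not have write permission on that directory."
    exit 1
fi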

Listing 28 still has a problem because the rm command will run even if the cd command has thrown an error (like Permission denied). To fix this I’m going to rearrange the cd and rm commands into a command sequence using the && operator, and then check the exit status of the sequence. You can see these changes in Listing 30.

Listing 30

#!/bin/bash -u

if [ ! -d $1 ];then
    echo "Please provide a valid directory."
    exit 1
fi

cd $1 && rm -rf *

if [ $? -gt 0 ];then
    echo "An error occurred during the cd/rm process."
    exit 1
fi

The double ampersand (&&) will cause the command sequence to exit if the cd command fails, thus ignoring the rm command. I do this to catch any of the other errors that can occur with the cd command. If there’s an unknown error with the cd command, we don’t want rm to delete all of the files/directories in the current directory. Remember that I can only check the exit status of the last command in the sequence, which doesn’t tell me whether it was cd or rm that failed. As a work around to this I’ll check to see if the rm command succeeded in the next step where I set a trap on the EXIT signal. I’ve added the trap statement and a function to use with the trap in Listing 31.

Listing 31

#!/bin/bash -u

# A final check to let the user know if this script failed
# to perform its primary function - deleting files
function exit_handler
{
    # Count the number of lines (files/dirs) in the directory
    DIR_ENTRIES=$(ls $1 | wc -l)

    # If there are still files in there throw an error message
    if [ $DIR_ENTRIES -gt 0 ];then
        echo "Some files/directories were not deleted"
        exit 1
    fi
}

# We want to check one last thing before exiting
trap 'exit_handler $1' EXIT

# If the directory doesn't exist, warn the user
if [ ! -d $1 ];then
    echo "Please provide a valid directory."
    exit 1
fi

# Don't execute rm unless cd succeeds and suppress messages
cd $1 &> /dev/null && rm -rf * &> /dev/null

# If there was an error with cd or rm, warn the user
if [ $? -gt 0 ];then
    echo "An error occurred during the cd/rm process."
    exit 1
fi

I’m not saying that this is the most efficient way to solve this problem, but it does show you some interesting uses of the techniques we’ve talked about. I went ahead and suppressed the messages from cd and rm so that I could substitute my own. This is done with the &> /dev/null additions to the command sequence. I also added the trap 'exit_handler $1' EXIT line to the script, which sets a trap for the EXIT signal and uses the exit_handler function to handle the event. Notice the use of single quotes around the 'exit_handler $1' argument to trap. This keeps the $1 variable reference from being expanded until the trap is called. We need that variable so that our exit handler can check the directory to make sure that all the files and directories were deleted. For our purposes the example script is now complete and does a reasonable job of protecting the user, but there is plenty of room for improvement. Tell us how you would change Listing 31 to make it better and/or simpler in the comments section of this post.

Tips and Tricks

  • You can sometimes use options with your commands to make them more fault tolerant. For instance the -p option of mkdir automatically creates the parents of the directory you specify if they don’t already exist. This keeps you from getting a No such file or directory error. Just make sure the options you use don’t introduce their own new problems.
  • It’s usually a good idea to enclose variables in quotation marks, especially the @ variable. Doing this ensures that your script can better handle spaces in filenames, paths, and arguments. So, doing something like echo "$@" instead of echo $@ can save you some trouble.
  • You can lessen your chances of leaving a file (like a system configuration file) in an inconsistent state if you make changes to a copy of the file and then use the mv command to put the altered file in place. Since mv typically only changes the information for the file and doesn’t move any bits, the changeover is much faster so it’s less likely that another program will try to access the file in the time the change is being made. There are a few subtle issues to be aware of when using this method though. Have a look at David Pashley’s article (link #2) in the Resources section for more details.
  • You can use parameter expansion (${...}) to avoid the null/unset variable problem that you see in Listing 1. Using a line like cd ${1:?"A directory to change to is required"} would display the phrase “A directory to change to is required” and exit the script if the user didn’t provide the command line argument represented by $1 . When used inside a script, the line gives you error output similar to ./expansion.sh: line 3: 1: A directory to change to is required
  • When you’re accepting input from a user, you can make your script more forgiving by using regular expressions and the case insensitive options of your commands. For instance, use the -i option of grep so that your script will not care whether it matches “Yes” or “yes”. With a regular expression, you could be as vague as ^[yY].* to match “y”, “Y”, “ya”, “Ya”, “Yeah”, “yeah”, “yes”, “Yes” and many other entries that begin with an upper/lower case “y” and have 0 or more letters that come after it (see the short sketch after this list).
  • Always check to make sure that you got the expected number of command line arguments before going any further in your script. If possible, also check the arguments to make sure that they’re what you expect (i.e. that a phone number wasn’t given for a directory name).
  • To avoid introducing portability errors when writing scripts for the Bourne Shell (sh), you can use the checkbashisms program from the devscripts package. This program will check to make sure that you don’t have any BASH specific statements in your Bourne Shell script.
  • Don’t catch an error on a low level inside your script and not pass it back up the stack to the parent. This can cause your program to behave in a non-standard (non-Unix) way.
  • If you have a script that runs in the background, it can create a predefined file and redirect output to it so that you can see what/when/how/why your script exited.
  • If you use file locks in your scripts, you’ll want to check for dead/stale file locks each time your script starts. This is because a user may have issued a kill -9 (SIGKILL) command on your script, which doesn’t give your script a chance to clean up its lock files. If you don’t check for stale/dead locks, your user could end up having to remove the locks themselves manually, which is definitely not ideal.
  • When you have a script that is processing a large amount of data/files, you can use trap to keep track of where your script was in the event of an unexpected exit. One way to do this would be to echo a filename into a predefined file when the trap is triggered. You can then read the start location back into the script when it starts up again and resume where you left off. If there’s a really large amount of data and you need to make sure your script keeps its place, you should probably already be continuously tracking the progress as part of the processing loop and using the trap(s) as a fallback.
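
Here’s a brief sketch of the forgiving input matching mentioned in the tips above, using a case statement and a simple pattern. The prompt text and the ANSWER variable are arbitrary:

#!/bin/bash -

printf "Continue? (yes/no): "
read ANSWER

# Accept y, Y, yes, Yes, yeah, etc. - anything starting with y or Y
case $ANSWER in
    [yY]*) echo "Continuing..." ;;
    *)     echo "Stopping here." ;;
esac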

Scripting

In this scripting section I’m going to create a script that we can source to add ready made error handling functions to other scripts. You will also see a couple of conceptual additions such as the use of code blocks in an attempt to streamline sections of code. Listing 32 shows the modular script that you can source, and Listing 33 shows it in use.

Listing 32

#!/bin/bash -u
# File: error_source.sh
# Holds functions that can be used to more easily add error handling
# to your scripts.
# The -u option in the shebang line above causes the shell to throw
# an error whenever a variable is unset.

# Define our handlers for errors and/or forced exits
trap 'fatal_err $LINENO 1001' ERR       #Handle uncaught errors
trap 'clean_up; exit' HUP TERM          #Clean up and exit on SIGHUP or SIGTERM
trap 'clean_up; propagate' INT          #Clean up after and propagate SIGINT
trap 'clean_up' EXIT                    #Clean up last thing before we exit

PROGNAME=$(basename $0)  #Error source program name
TEMPFILES=( )            #Array holding temp files to remove on script exit

# This function steps through each pipe section's exit status to see if
# there was an error anywhere. Takes as an argument the line number
# that's being checked.
function check_pipe {
    # We want to see if there was an error somewhere in the pipeline
    for PIPEPART in $2
    do
        # There was an error at the current part of the pipeline
        if [ "$PIPEPART" != "0" ]
        then
            nonfatal_err $1 1002
            return 0; #We don't need to step through the rest
        fi
    done
}

# Function that gets rid of things like temp files before an exit.
function clean_up {
    # We want to remove all of the temp files we created
    for TFILE in ${TEMPFILES[@]}
    do
        # If the file doesn't exist, skip it
        [ -e $TFILE ] || continue

        # Notice the use of a code block to streamline this check
        { # If you use -f, errors are ignored
            rm --interactive=never $TFILE &> /dev/null
        } || nonfatal_err $LINENO 1001
    done
}

# Function to create "safe" temporary files which we'll get into more in the
# next blog post on security.
function create_temp {
    # Give preference to user tmp directory for security
    if [ -e "$HOME/tmp" ]
    then
        TEMP_DIR="$HOME/tmp"
    else
        TEMP_DIR="/tmp"
    fi

    # Construct a "safe" temp file name
    TEMP_FILE="$TEMP_DIR"/"$PROGNAME".$$.$RANDOM

    # Keep the file in an array to remove it later
    TEMPFILES+=( "$TEMP_FILE" )

    {
        touch $TEMP_FILE &> /dev/null
    } || fatal_err $LINENO "Could not create temp file $TEMP_FILE"
}

# Function that handles telling the user about critical errors that
# force an exit. It takes 2 arguments, a line number near where the
# error occurred, and an error code / message telling what happened.
function fatal_err {
    # Call function that will clean up temp files
    clean_up

    printf "Near line $1 in $PROGNAME: "

    # Check to see if the supplied error matches any predefined codes
    if [ "$2" == "1001" ];then
        printf "There has been an unknown fatal error.\n"
    # A custom error message has been specified by the caller
    else
        printf "$2\n"
    fi

    # We don't want to continue running with a fatal error
    exit 1
}

# Function that handles telling the user about non-critical errors
# that don't force an exit. It takes 2 arguments, a line number near
# where the error occurred, and an error code / message telling what
# happened.
function nonfatal_err {
    printf "Near line $1 in $PROGNAME: "

    # Check to see if the supplied error matches any predefined codes
    if [ "$2" == "1001" ];then
        printf "Could not remove temp file.\n"
    elif [ "$2" == "1002" ];then
        printf "There was an error in a pipe.\n"
    elif [ "$2" == "1003" ];then
        printf "A file you tried to access doesn't exist.\n"
    # A custom error message has been specified by the caller
    else
        printf "$2\n"
    fi
}

# Function that handles propagating the SIGINT signal up to the parent
# process, which in this case is assumed to be the shell.
function propagate {
    echo "Caught SIGINT"

    #Propagate the signal up to the shell
    kill -s SIGINT $$

    # 130 is the exit status from Ctrl-C/SIGINT
    exit 130
}

Listing 32 has six functions that are designed to handle various error related conditions: check_pipe, create_temp, clean_up, propagate, fatal_err, and nonfatal_err. The check_pipe function takes a list representing all the elements of the PIPESTATUS array variable and steps through each item to see if there was an error. If it finds one, it throws a non-fatal error message, which could just as easily be a fatal error message that causes an exit. This makes it a little easier to check our pipes for errors without using set -o pipefail. The function could easily be modified to tell you which part of the pipe failed as well.
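As a hedged sketch of that modification (this is my own variant, not part of Listing 32), check_pipe could keep a position counter and report the failing segment through nonfatal_err’s custom-message path:

# Sketch: a check_pipe variant that also reports which pipeline segment failed
function check_pipe {
    local POSITION=0

    for PIPEPART in $2
    do
        POSITION=$((POSITION+1))

        # There was an error at the current part of the pipeline
        if [ "$PIPEPART" != "0" ]
        then
            nonfatal_err $1 "Command $POSITION in the pipeline exited with status $PIPEPART."
            return 0 #We don't need to step through the rest
        fi
    done
}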

The create_temp function automates the process of creating “safe” temporary files for us. It gives preference to the user’s tmp directory, and falls back to the system /tmp directory if the user’s is not available. We’ll talk more about temporary file safety in the next blog post on security. The path/name of the temp file created is added to a global array so that it’s easier to remove the file later on exit. Notice the use of the code block around the touch command that creates the temp file. It might have been easier to leave the braces out and just put the || right after the touch statement, but I felt that the code block helped streamline the code a little bit. The || at the end of the code block causes our error handling code to be executed if the last command in the block fails.
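The same pattern works for any group of commands where you want a single error handler attached to the block’s last command. Here is a minimal sketch; the directory and message are made up, and it assumes the nonfatal_err function from Listing 32 has been sourced:

# Sketch: run a command inside a block and attach one error handler to it.
# The || only fires if the last command in the block fails.
{
    mkdir -p "$HOME/tmp" &> /dev/null
} || nonfatal_err $LINENO "Could not create the $HOME/tmp directory."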

The clean_up function steps through the file names in our array of temporary files and deletes them. This is meant to be called just before we exit the script so that we don’t leave any stray temp files lying around. The function checks to make sure that it doesn’t try to delete files that have already been removed. This prevents a warning from being displayed when an error triggers clean_up and the subsequent exit fires the EXIT trap, which calls clean_up again. There are other ways to handle this type of problem, but for our purposes the “skip if already deleted” method works fine. The propagate function uses the kill command to resend the INT signal up to the shell, and then uses the exit command to set the exit status of the script to 130. This tells anyone checking the ? built-in variable that the script exited because of SIGINT.

The fatal_err and nonfatal_err functions are very similar, the only difference being that fatal_err calls the clean_up function and the exit command when it runs. Both functions take two arguments: a line number and an error code or string. The line number is presumably the line near where the error occurred, but it won’t be exact. It’s designed to get a shell script developer close enough to the error that they should be able to find it. The error code is a four-digit number that’s used in an if statement (a case statement would be a little cleaner here) to see what error message should be given to the user. The else part of the statement allows the caller to provide their own custom error string, so the caller isn’t stuck if they can’t find a code that fits their situation. If the script were going to see widespread general use, it might be best to move all of the error codes into a separate function that fatal_err and nonfatal_err could both call. That way you would have consistent and reusable error codes across all of the functions.
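As a hedged sketch of that refactoring (the function name is my own, not from Listing 32), the lookup could live in one place that both handlers call. Note that code 1001 currently means different things to the two functions, so the codes would also need to be made unique:

# Sketch: one place to translate error codes into messages
function err_msg {
    case "$1" in
        1001) printf "There has been an unknown fatal error.\n" ;;
        1002) printf "There was an error in a pipe.\n" ;;
        1003) printf "A file you tried to access doesn't exist.\n" ;;
        *)    printf "%s\n" "$1" ;; # Fall back to a custom message from the caller
    esac
}

# fatal_err and nonfatal_err could then both reduce to something like:
#   printf "Near line $1 in $PROGNAME: "
#   err_msg "$2"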

To make sure that the functions are called properly, the script defines several traps at the top. The ERR trap catches any errors that we haven’t handled ourselves. These are treated as “unknown” fatal errors since we obviously didn’t see them coming. The HUP and TERM signals are trapped so that we have a chance to run our clean_up function before exiting. Keep in mind that the KILL signal cannot be trapped, so if somebody runs kill -9 on our script, we’re still going to leave temp files behind. The INT signal is trapped to give us a chance to clean up as well, but we also take the opportunity to propagate the signal up to the shell. That way we’re not just absorbing SIGINT and preventing everything around us from reacting to it. The final trap is set on the EXIT condition and is our last chance to make sure that the temp files have been removed.

Listing 33

#!/bin/bash -u
# File: err_src_test.sh
# Tests the modular error_source.sh script which holds error handling functions.

# Include the modular error handling script so that we can use its functions.
. error_source.sh

# Use our function to create a random "safe" temp file
create_temp

# Be proactive in checking for problems like a file that doesn't exist
if [ -e doesnotexist ]
then
    ls doesnotexist
else
    nonfatal_err $LINENO 1003
fi

# Check a bad pipeline with a function we've created
true|false|true # Error not caught because of last true
PIPEST="${PIPESTATUS[@]}"
check_pipe $LINENO "$PIPEST"

# Check a good pipeline with the same function
true|true|true|true
PIPEST="${PIPESTATUS[@]}"
check_pipe $LINENO "$PIPEST"

# Generate a custom non-fatal error
nonfatal_err $LINENO "This is a custom error message."

# Generate an unhandled error
false

echo "The script shouldn't still be running here."

The Listing 33 implementation shows just a few ways to use the modular error handling script in one of your own scripts. The first thing the script does is source error_source.sh so that it’s treated like part of our own script. Once that’s done, the error handling functions can be called as if we had typed them directly into our script, which is why we can call the create_temp function. Normally we would do something with the temporary file path/name that is created, but in this case I only want to create a temp file that can be removed later by the clean_up function. Next, I proactively check whether a file/directory exists before I try to use it. If it doesn’t exist, I throw a non-fatal error to warn the user. Normally you would want to throw a fatal error that causes an exit here, but I want the script to fall all the way through to the last error so that the output in Listing 34 will be a little cleaner. Ultimately, with this error handling method it’s your call whether or not the script should exit on an error, but I would suggest erring on the side of exiting rather than letting the script continue with a potentially dangerous error in place.

The next section of Listing 33 has code that checks a pipeline with an error (the false in the middle), and after that there’s a check of a pipeline with no errors. This is done using the check_pipe function that we wrote earlier. You can see that I’ve basically converted the PIPESTATUS array elements into a string list before passing that to check_pipe. The list works a little more cleanly in the for loop that’s used to check each part of the pipeline.

Next, I’ve shown how to generate your own custom error by passing the nonfatal_err function a string instead of an error code. A custom string should fail all of the tests in the nonfatal_err if construct, causing the else to be triggered. This gives us the ability to create compact error handling code in our own scripts using error codes, but still gives us the flexibility to throw errors that haven’t been defined yet.

The last interesting thing that the script does is use the false command to generate an unhandled error, which is caught by the ERR trap. You can see that even if we miss handling an error manually, it still gets caught overall. The drawback is that although the user gets a line number for the error, they are only told that an unknown error has occurred, which doesn’t tell them very much. This is still preferable to letting your script run with an unhandled error though. The very last line of the script is just there to alert us that something has gone very wrong if the script ever reaches that point.

Listing 34 shows what happens when I run the script in Listing 33.

Listing 34

$ ./err_src_test.sh
Near line 16 in err_src_test.sh: A file you tried to access doesn't exist.
Near line 22 in err_src_test.sh: There was an error in a pipe.
Near line 30 in err_src_test.sh: This is a custom error message.
Near line 33 in err_src_test.sh: There has been an unknown fatal error.

If you have any additions or changes to the scripts above, don’t hesitate to tell us about them in the comments section. I would especially like to see what changes you would make to the script in Listing 32 to make it more useful and/or to correct any flaws that it may have. Feel free to paste your updated code there as well.

Troubleshooting

This post was developed using BASH 4.0.x, so if you’re running an earlier version keep an eye out for subtle syntax differences and missing features. Post something in the comments section if you have any trouble so that we can try to help you out. Also, don’t forget to apply the debugging knowledge that you got from reading Post 1 in this series as you’re experimenting with these concepts.

Conclusion

As with shell script debugging, we can see that script error handling is a very in-depth subject. Unfortunately, error handling is often overlooked in shell scripts but is an important part of creating and maintaining production scripts. My goal with this post has been to give you a diverse set of tools to help you efficiently and effectively add error handling to your scripts. I know that opinions on this topic vary widely, so if you’ve got any suggestions or thoughts on the content of this post it would be great to hear from you. Leave a comment to let us know what you think. Thanks for reading.

Resources

Links

  1. Linux Journal, May 2008, Work The Shell, By Dave Taylor, “Handling Errors and Making Scripts Bulletproof”, pp 26-27
  2. Writing Robust Shell Scripts – DavidPashley.com
  3. Linux Planet Article On Making Friendlier Error Messages
  4. Linux Planet Article With A Good Example Of A Modularized Error Handling Script
  5. Errors and Signals and Traps (Oh My!) – Part 1 By William Shotts, Jr.
  6. Errors and Signals and Traps (Oh My!) – Part 2 By William Shotts, Jr.
  7. Turnkey Linux Article With Good Discussion In Comments Section
  8. Script Error Handling Overview
  9. Article On The “Proper handling of SIGINT/SIGQUIT”
  10. Script Error Handling Slide Presentation (Download Link)
  11. General UNIX Scripting Guide With Error Handling By Steve Parker
  12. Some General Thoughts On Making Scripts Better And Less Error Prone
  13. OpenGroup.org Article On Scripting Including A Section On “Exit Status and Errors”
  14. A checkbashisms man Page Entry
  15. Common Shell Mistakes and Error Handling Article
  16. CSIRO Advanced Scientific Computing Article
  17. Opinions On Error Handling On stackoverflow
  18. A Way To Handle Errors Using Their Error Messages
  19. Simple BASH Error Handling
  20. BASH FAQ Including Broken Pipe Warning Information
  21. Linux Journal Article On Named Pipes
  22. Example Use Of command_not_found_handle

Writing Better Shell Scripts – Part 1

Quick Start

The information presented in this post doesn’t really lend itself to having a “Quick Start” section, but if you’re in a hurry we have a How-To section along with Video and Audio included with this post that may be a good quick reference for you. There are some really great general references in the Resources section that may help you as well.

Video

General Debugging

BASHDB Overview

Audio

Download

Preface

To make things easier on you, all of the black command line and script areas are set up so that you can copy the text from them. This does make using the commands and scripts easier, but if you’re not already familiar with the concepts presented here, typing things yourself and working through why you’re typing them will help you learn more. If you hit problems along the way, take a look at the Troubleshooting section near the end of this post for help.

There are formatting conventions that are used throughout this post that you should be aware of. The following is a list outlining the color and font formats used.

Command Name or Directory Path
Warning or Error
Command Line Snippet With Commands/Options/Arguments
Command Options and Their Arguments Only
Hyperlink

Overview

This post is the first in a series on shell script debugging, error handling, and security. Although I’ll be presenting some methodologies and techniques that apply to all shell languages (and most programming languages), this series will focus very heavily on BASH. Users of other shells like CSH will need to do some homework to see what information transfers and what does not.

One of the difficulties with debugging a shell script is that BASH typically doesn’t give you very much information to go on. You might get error output showing a line number, but that’s just the line where the shell became aware of the error, not necessarily the line where the error actually occurred. Add in a vague error message such as the one in Listing 1, and it gets difficult to tell what’s going on inside your script.

Listing 1

$ ./buggy_script.sh
./buggy_script.sh: line 23: syntax error: unexpected end of file

This post is written with the intent of giving you knowledge that will help when you see an error like the one in Listing 1 while trying to run a script. This type of error is just one of many errors that the shell may give you, and is more easily dealt with when you have a good understanding of scripting syntax and the debugging tools at your disposal.

Along with talking about debugging tools/techniques, I’m going to introduce a handy script debugger called BASHDB. BASHDB allows you to step through a script in much the same way as a program debugger like GNU’s GDB does with C code.

By the end of this post you should be armed with enough knowledge to handle the majority of debugging needs that you have. There’s a lot of information here, but taking the time to learn it will help make you more effective in your work with Linux.

Command Line Script Debugging

BASH has several command line options for debugging your shell scripts, and some of these are shown in Listing 2. These options apply to your entire script though, so it’s an all-or-nothing trade-off. Later in this post I’ll talk about more selective methods of debugging.

Listing 2

-n    Checks for syntax errors without executing the script (noexec).
-u    Causes an error to be thrown whenever you try to access a variable that has not been set (nounset).
-v    Sends all lines to standard error (stderr) as they are read, even comments.
-x    Turns on execution tracing (xtrace) which displays each command as it is executed.

All of the options in Listing 2 can be used just like options with other programs (bash -x scriptname), or with the built-in set command as shown later. With the -x option, the number of + characters before each of the lines of output denotes the subshell level. The more + characters there are, the further down into nested subshells you are. If there are no + characters at the start of the line, then the line is the normal output from the execution of the script. You can use the -x and -v options together for verbose execution tracing, but the amount of output can become a little overwhelming. Using the -n and -v options together provides a verbose syntax check without executing the script.

If you decide to use the -x and -v options together, it can be helpful to use redirection in conjunction with a pager like less, or the tee command to help you handle the information. The shell sends debugging output to stderr and the normal output to stdout, so you’ll need to redirect both of them if you want the full picture of what’s going on. To do this and use the less pager to handle the information, you would use a command line like bash -xv scriptname 2>&1 | less . Instead of seeing the debugging output scroll by in the shell, you’ll be placed into the less pager where you’ll have access to functions like scrolling and search. While using the pager in this way, it’s possible that you may get an error like Broken pipe if you exit the pager before the script is done executing. This error has to do with the script trying to write output to something (less) that’s no longer there, and in this case can be ignored.

If you would prefer to redirect the debugging output to a file for later review and/or processing, you can use tee: bash -xv scriptname 2>&1 | tee scriptname.dbg . You will see the debugging output scroll by on the screen, but if you check the current working directory you will also find the scriptname.dbg file which holds the redirected output. This is what the tee command does for you: it sends the output to a file while still displaying it on the screen. If the script will take a while to run, you can alter the redirection operator slightly, put the script in the background, and then use tail -f scriptname.dbg to follow the updates to the file. You can see this in action in Listing 3, where I’ve created a script that runs in an infinite loop (the code is incorrect on purpose), generating output every 2 seconds. I start the script in the background, redirecting the output to the infinite_loop.dbg file only (not to the screen too). I then start the tail -f command to follow the file for a few iterations, and then hit Ctrl-C to interrupt the tail command. Once you understand how to redirect the debugging output in this way, it’s fairly easy to figure out how to split the debugging and regular output into separate files.

Listing 3

$ bash -xv infinite_loop.sh &> infinite_loop.dbg &
[1] 9777
$ tail -f infinite_loop.dbg
num=0
+ num=0
while [ $num -le 10 ]
do
    sleep 2
    echo "Testing"
done
+ '[' 0 -le 10 ']'
+ sleep 2
+ echo Testing
Testing
+ '[' 0 -le 10 ']'
+ sleep 2
^C
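Since the shell writes its trace to stderr and the script’s own output to stdout, splitting them into separate files (as mentioned above) only takes two redirections. The file names below are just examples, and note that any real error messages from the script will land in the stderr file alongside the trace.

# Execution trace (and any real error messages) go to one file,
# the script's normal output goes to another
bash -xv infinite_loop.sh 1> infinite_loop.out 2> infinite_loop.dbg &

# Follow either file while the script runs
tail -f infinite_loop.dbg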

Internal Script Debugging

This section is called “Internal Script Debugging” because it focuses on changes that you make to the script itself to add debugging functionality. The easiest change to make in order to enable debugging is to change the shebang line of the script (the first line) to include the shell’s normal command line switches. So, instead of a shebang line like #!/bin/bash - you would have #!/bin/bash -xv. There are also both external and built-in commands for the BASH shell that make it easier for you to debug your code, the first of which is set.

The set command allows you to set shell options while your script is running. The options of the most interest for our purposes are the ones from Listing 2. For example, you can enclose sections of your script between the set -x and set +x command lines. By doing this you enable debugging for only the section of code within those lines, giving you control over what specific section of the script is debugged. Listing 4 shows a very simple script using this technique, and Listing 5 shows the script in action.

Listing 4

#!/bin/bash -
# File: set_example.sh

echo "Output #1"

set -x #Debugging on
echo "Output #2"
set +x #Debugging off

echo "Output #3"

Listing 5

$ ./set_example.sh
Output #1
+ echo 'Output #2'
Output #2
+ set +x
Output #3

As you can see, the debugging output looks like you started the script with the bash -x command line. The difference is that you get to control what is traced and what is not, instead of having the execution of the whole script traced. Notice that the command to disable execution tracing (set +x) is included in the execution trace. This makes sense because execution tracing is not actually turned off until after the set +x line is done executing.

Output statements (echo/print/printf) are useful for getting information from your script at specific points. You can use output statements to track the progression of logic throughout your script by doing things like evaluating variable values and shell expansions, and finding infinite loops. Another advantage of using output statements is that you can control the format. When using command line debugging switches you have little or no control over the format, but with echo, print, and printf, you have the opportunity to customize the output to display in a way that makes sense to you.
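For example, a formatted printf can label a checkpoint and dump several values on one line; sending it to stderr keeps it separate from the script’s normal output. The count, user, and files variables here are only placeholders for whatever your own script tracks.

# Example of a custom-formatted debug line at a known checkpoint
printf "DEBUG %s:%d: count=%d user='%s' files=%s\n" \
    "$0" "$LINENO" "$count" "$user" "${files[*]}" >&2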

You can utilize a DEBUG function to provide a flexible and clean way to turn debugging output on and off in your script. Listing 6 shows the script in Listing 4 with the addition of the DEBUG function, and Listing 7 shows one way to switch the debugging on and off from the command line using a variable.

Listing 6

#!/bin/bash -
# File: func_example.sh

# This function can be used to selectively enable/disable debugging.
# Use with the set command to debug sections of the script.
function DEBUG()
{
    # Check to see if the enable debugging variable is set
    if [ -n "${DEBUG_ENABLE+x}" ]
    then
        # Run whatever command/option/argument combo that was
        # passed to our DEBUG function.
        $@
    fi
}

echo "Output #1"

DEBUG set -x #Debugging on
echo "Output #2"
DEBUG set +x #Debugging off

echo "Output #3"

Listing 7

$ ./func_example.sh #Without debugging
Output #1
Output #2
Output #3
$ DEBUG_ENABLE=true ./func_example.sh #With debugging
Output #1
+ echo 'Output #2'
Output #2
+ DEBUG set +x
+ '[' -n x ']'
+ set +x
Output #3

The DEBUG function treats the rest of the line after it as its arguments. If the DEBUG_ENABLE variable is set, the DEBUG function executes its arguments (the rest of the line) as a command via the $@ special parameter. So, any line that has DEBUG in front of it can be turned on or off simply by setting or unsetting one variable from the command line or inside your script. This method gives you a lot of flexibility in how you set up debugging in your script, and allows you to easily hide that functionality from your end users if needed.

Instead of requiring a user to set an environment variable on the command line to enable debugging, you can add command line options to your script. For instance, you could have the user run your script with a -d option (./scriptname -d) in order to enable debugging. The mechanism that you use could be as simple as having the -d option set the DEBUG_ENABLE variable inside of the script. An example of this, with the addition of multiple debugging levels, can be seen in the Scripting section.

Another technique that you can use to track down problems in your script is to write data to temporary files instead of using pipes. Temp files are many times slower than pipes though, so I would use them sparingly and in most cases only for temporary debugging. There is a Linux Journal article by Dave Taylor (April 2010) referenced in the Resources section that talks about using temporary files in the article’s script. In a nutshell, you replace the pipe operator (|) with a redirection to file (> $temp), where $temp is a variable holding the name of your temporary file. You read the temporary file back into the script with another redirection operator (< $temp). This allows you to examine the temporary file for errors in the script’s pipeline. Listing 8 shows a very simplified example of this.

Listing 8

#!/bin/bash -

# Set the path and filename for the temp file
temp="./example.tmp"

# Dump a list of numbers into the temp file
printf "1\n2\n3\n4\n5\n" > $temp

# Process the numbers in the temp file via a loop
while read input_val
do
    # We won't do any real work, just output the values
    echo $input_val
done < $temp # Feeds the temp file into the loop

# Clean up our temp file
rm $temp

The last debugging technique that I'm going to touch on here is writing to the system log. You can use the logger command to write debugging output to the system log (typically /var/log/messages or /var/log/syslog, depending on your distribution), and you can tag the entries with the -t option to make them easier to find later. I consider this technique to be primarily for production scripts that have already been released to your users, and you don't want to abuse this mechanism. Flooding your system log with script debugging messages would be counterproductive for you and/or your system administrator. It's best to only log mission critical messages like warnings or errors in this way.

To use the logger command to help track script debugging information, you would just add a line like logger "${BASH_SOURCE[0]} - My script failed somewhere before line $LINENO." to your script. The line that this adds in the system log looks like the output line in Listing 9. There are a couple of variables that I've thrown in here to make my entry in the system log more descriptive. One is BASH_SOURCE, which is an array that in this case holds the name and path of the script that logged the message. The other is LINENO, which holds the current line number that you are on in your script. There are several other useful environment variables built into the newer versions of BASH (>= 3.0). Some of these other variables (all arrays) include BASH_LINENO, BASH_ARGC, BASH_ARGV, BASH_COMMAND, BASH_EXECUTION_STRING, and BASH_SUBSHELL. See the BASH man page for details.

Listing 9

$ tail -1 /var/log/messages
May 28 14:35:35 testhost jwright: ./logger_test.sh - My script failed somewhere before line 11.
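If you log from more than one place in a script, a small wrapper keeps the entries consistent. This is only a sketch, and the function name, tag, and wording are my own rather than anything from the post:

# Sketch: one helper for all of a script's syslog messages
function log_err {
    # -t tags the entry so it's easy to grep for later
    logger -t "${BASH_SOURCE[0]}" "Near line $1: $2"
}

# Example call
log_err $LINENO "Could not read the input file."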

Introducing BASHDB

As I mentioned before, BASHDB is a debugger that does for BASH scripts what GNU's GDB does for C/C++ programs. BASHDB can do a lot, and it has four main features to help you eliminate errors from your scripts. First, it can start a script with options, arguments, and anything else that might affect its operation. Second, it allows you to set conditions on which a script will stop. Third, it gives you the ability to examine what's going on at the point where the script has stopped. Fourth, BASHDB allows you to manipulate things like variable values before telling the script to move on.

You can type bashdb scriptname to start BASHDB and set it to debug the script scriptname. Listing 10 shows a couple of useful options for the bashdb program.

Listing 10

-X    Traces the entire script from beginning to end without putting bashdb in interactive mode. Notice that it's capital X, not lowercase.
-c    Tests/traces a single string command. For example, "bashdb -c ls *" will allow you to step through the command string "ls *" inside the debugger.

In order to show where you're at, BASHDB displays the full path and current line number of the running script above the prompt. In interactive mode, the prompt BASHDB gives you looks something like bashdb<(1)> where 1 is the number of commands that have been executed. The parentheses around the command number denote the number of subshells you are nested within. The more parentheses there are, the deeper into subshells you are nested. Listing 11 gives a decent command reference that you can use when debugging scripts at the BASHDB interactive mode prompt.

Listing 11

-          Lists the current line and up to 10 lines that came before it.
backtrace  Abbreviated "T". Shows the trace of calls including things like functions and sourced files that have brought the script to where it is now. You can follow "backtrace" with a number, and only that number of calls will be shown.
break      Abbreviated "b". Sets a persistent breakpoint at the current line unless followed by a number, in which case a breakpoint is set at the line specified by the number. See the "continue" command for a shortcut to specifying the line number.
continue   Abbreviated "c". Resumes execution of the script and moves to the next stopping point or breakpoint. If followed by a number, "continue" works in a similar way as issuing the "break" command followed by the number and then the continue command. The difference is that "continue" sets a one time breakpoint whereas "break" sets a persistent one.
edit       Opens the text editor specified by the EDITOR environment variable to allow you to make and save changes to the current script. Typing "edit" by itself will start editing on the current line. If "edit" is followed by a number, editing will start on the line specified by that number. Once you're done editing you have to type "restart" or "R" to reload and restart the script with your changes.
help       Abbreviated "h". Lists all of the commands that are available when running in interactive mode. When you follow "help" or "h" with a command name, you are shown information on that command.
list       Abbreviated "l". Lists the current line and up to 10 lines that come after it. If followed by a number, "list" will start at the specified line and print the next 10 lines. If followed by a function name, "list" starts at the beginning of the function and prints up to 10 lines.
next       Abbreviated "n". Moves execution of the script to the next instruction, skipping over functions and sourced files. If followed by a number, "next" will move that number of instructions before stopping.
print      Abbreviated "p". When followed by a variable name, prints the value of a specified variable. Example: print $VARIABLE
quit       Exits from BASHDB.
set        Allows you to change the way BASH interacts with you while running BASHDB. You can follow "set" with an argument and then the words "on" or "off" to enable/disable a feature. Example: "set linetrace on".
step       Abbreviated "s". Moves execution of the script to the next instruction. "step" will move down into functions and sourced files. See the "next" command if you need behavior that skips these. If followed by a number, "step" will move that number of instructions before stopping.
x          Similar to the "print" command, but more powerful. Can print variable and function definitions, and can be used to explore the effects of a change to the current value of a variable. Example: "x n-1" subtracts 1 from the variable "n" and displays the result.

Normally when you hit the Enter/Return key without entering a command, BASHDB executes the next command. This behavior is overridden though when you have just run the step command. Once you've run step, pressing the Enter/Return key will re-execute step. The rest of the operation of BASHDB is fairly straightforward, and I'll run through an example session in the How-To section.

If you're a person who prefers to use a graphical interface, have a look at GNU DDD. DDD is a graphical front end for several debuggers including BASHDB, and includes some interesting features like the ability to display data structures as graphs.

How-To

If you've been reading this post straight through, you can see that there are a lot of script debugging tools at your disposal. In this section, I'm going to go through a simple example using a few of the different methods so that you can see some practical applications. Listing 12 shows a script that has several bugs intentionally added so that we can use it as our example.

Listing 12

#!/bin/bash -
# buggy_script.sh is designed to help us learn about
# shell script debugging
#

if [-z $1 ] # Space left out after first test bracket
then
    echo "TEST"
#fi #The closing fi is left out

# Use of uninitialized variable
echo "The value is: $VALUE1"

# Infinite loop caused by not incrementing num
num=0

while [ $num -le 10 ]
do
    sleep 2
    echo "Testing"
done

When I try to run the script for the first time I get the same error that we got in Listing 1. The first thing that I'm going to do is use the -x and -u options of BASH to run the script with extra debugging output (bash -xu ./buggy_script.sh). When I rerun the script this way, I see that I don't really gain anything because BASH detects the unexpected end of file bug before it even tries to execute the script. The line number isn't any help either since it just points me to the very last line of the script, and that's not very likely to be where the error occurred. I'll run into the same problems if I try to run the script with BASHDB as well.

I remember that the rule of thumb with unexpected end of file errors is that they usually mean I've forgotten to close something out. It could be an if statement without a fi at the end, a case statement that's missing an esac or ;;, or any number of other constructs that require closure. When I start looking through the script I notice that my if statement is missing a fi, so I add (uncomment) that. This particular bug teaches us an important lesson: there will always be some errors that require us to do some digging on our own. We may be able to use our debugging techniques to get close to the error, but in the end we have to know the language well enough to spot syntax errors. Once I add the fi statement, I'm ready to rerun the script. The second time the script runs, I get an unbound variable error.

Listing 13

$ bash -xu ./buggy_script.sh
./buggy_script.sh: line 6: $1: unbound variable

You can see in the error that a command line argument ($1) is unbound. This tells me that I forgot to add an argument after ./buggy_script.sh . I end up with the command line bash -xu ./buggy_script.sh testarg1 which gives me the next two errors shown in Listing 14.

Listing 14

$ bash -xu ./buggy_script.sh testarg1
+ '[-z' testarg1 ']'
./buggy_script.sh: line 6: [-z: command not found
./buggy_script.sh: line 12: VALUE1: unbound variable

Execution tracing shows me that the last command executed was '[-z' testarg1 ']' . The first error tells me that, for some reason, the start of the test statement ([-z) is being treated as a command. I think about it for a second and remember that there has to be a space between a test bracket and what it encloses. The statement [-z $1 ] should read [ -z $1 ] . Since I try to focus on one error at a time, I fix the test statement and rerun the script. The first error from Listing 14 goes away, but the second error remains. You can see that it's another unbound variable error, but this time it's referencing a variable that I created and not a command line argument. The problem is that I use the variable VALUE1 in an echo statement before I've even set a value for it. In this case that would just leave a blank at the end of the echo statement, but in other cases it can cause more serious problems. This is what using the -u option of BASH does for you: it warns you that a variable doesn't have a value before you try to use it. To correct this error, I add a statement right above the echo line that sets a value for the variable (VALUE1="1").

After fixing the above errors and rerunning the script, everything seems to work fine. The only problem is that even though I set the while loop up to quit after the variable num gets to 10, the loop doesn't exit. It seems that I have an infinite loop problem. This loop is simple enough that you can probably just glance at it and see the problem, but for the sake of the example we're going to take the long way around. I add an echo statement (echo "num Value: $num") to show me the value of the num variable right above the sleep 2 line. When I run the script again without the BASH -x option (to cut out some clutter), I get the output shown in Listing 15.

Listing 15

$ bash -u ./buggy_script.sh testarg1
The value is: 1
num Value: 0
Testing
num Value: 0
Testing
num Value: 0

You can see that the output from the echo statement I added is always the same (num Value: 0). This tells me that the value of num is never incremented and so it will never reach the limit of 10 that I set for the while loop. The fix is to use arithmetic expansion to increment the num variable by 1 each time around the while loop: num=$((num+1)) . When I run the script now, num increments like it should and the script exits when it's supposed to. With this bug fixed, it looks like we've eliminated all of the errors from our script. The finalized script with the num evaluation echo statement removed can be seen in Listing 16.

Listing 16

#!/bin/bash -
# buggy_script.sh is designed to help us learn about
# shell script debugging.

if [ -z $1 ] # Space added after first test bracket
then
    echo "TEST"
fi #The closing fi was added

# Set a value for our variable
VALUE1="1"

# Use of initialized variable
echo "The value is: $VALUE1"

# Finite loop caused by incrementing num
num=0

while [ $num -le 10 ]
do
    sleep 2
    echo "Testing"
    num=$((num+1))
done

Now I'll walk you through correcting the same buggy script using BASHDB. As I said above, the unexpected end of file error is best solved by applying your understanding of shell scripting syntax. Because of this, I'm going to start debugging the script right after we notice and fix the unclosed if statement. To start the debugging process, I use the line bashdb ./buggy_script.sh to launch BASHDB and have it start to step through the script. If you compiled BASHDB from source and haven't installed it, you'll need to adjust the paths in the command line accordingly.

BASHDB starts the script and then stops at line 7, the if statement. I then use the step command to move to the next instruction and get the output in Listing 17.

Listing 17

$ bashdb ./buggy_script.sh
bash Shell Debugger, release 4.0-0.4
Copyright 2002, 2003, 2004, 2006, 2007, 2008, 2009 Rocky Bernstein
This is free software, covered by the GNU General Public License, and you are welcome
to change it and/or distribute copies of it under certain conditions.
(/home/jwright/Documents/Scripts/Learning/buggy_script.sh:7):
7:      if [-z $1 ] # Space left out after first test bracket
bashdb<0> step
./buggy_script.sh: line 7: [-z: command not found
(/home/jwright/Documents/Scripts/Learning/buggy_script.sh:13):
13:     echo "The value is: $VALUE1"

Notice that until I run the step command, BASHDB doesn't give me an error for line 7. That's because it has stopped on the line 7 instruction, but hasn't executed it yet. When I step through that instruction and on to the next one, I get the same error as the BASH shell gives us ([-z: command not found). As before, we realize that we've left a space out between the test bracket and the statement. To fix this, I type the edit command to open the script in the text editor specified by the EDITOR environment variable. In my case this is vim. I have to type visual to go to normal mode, and then I'm able to edit and save my changes to the script like I would in any vi/vim session. With the space added, I save the file and exit vim which puts me back at the BASHDB prompt. I type the R character and hit the Enter/Return key to restart the script, which also loads my changes. I end up right back at line 7 again.

This time when I use the step command, BASHDB moves past the if statement and stops right before executing line 13 (the next instruction). Everything looks good, so I use the step command again by simply hitting the Enter/Return key. The output in Listing 18 is what I see.

Listing 18

bashdb<1> edit
bashdb<2> R
Restarting with: /usr/local/bin/bashdb ./buggy_script.sh
bash Shell Debugger, release 4.0-0.4
Copyright 2002, 2003, 2004, 2006, 2007, 2008, 2009 Rocky Bernstein
This is free software, covered by the GNU General Public License, and you are welcome
to change it and/or distribute copies of it under certain conditions.
(/home/jwright/Documents/Scripts/Learning/buggy_script.sh:7):
7:      if [ -z $1 ] # Space left out after first test bracket
bashdb<0> step
(/home/jwright/Documents/Scripts/Learning/buggy_script.sh:13):
13:     echo "The value is: $VALUE1"
bashdb<1>
The value is:
(/home/jwright/Documents/Scripts/Learning/buggy_script.sh:16):
16:     num=0

We see that the echo statement ends up not having any text after the colon, which is not what we want. What I'll do is issue an R (restart) command and then step back to line 13 so that I can check the value of the variable. Once I'm back at the echo statement on line 13, I use the command print $VALUE1 to inspect the value of that variable. A snippet of the output from the print command is in Listing 19.

Listing 19

7:      if [ -z $1 ] # Space left out after first test bracket
bashdb<0> step
(/home/jwright/Documents/Scripts/Learning/buggy_script.sh:13):
13:     echo "The value is: $VALUE1"
bashdb<1> print $VALUE1

bashdb<2>

There's a blank line between the bashdb<1> print $VALUE1 and bashdb<2> lines. This tells me that there is definitely not a value (or there's a blank string) set for the VALUE1 variable. To correct this I go back into edit mode, and add the variable declaration VALUE1="1" just above our echo statement. I follow the same edit, save, exit, restart (with the R character) routine as before, and then step down through the echo statement again.

This time the output from the echo statement is The value is: 1 which is what we would expect. With that error fixed, we continue to step down through the script until we realize that we're stuck in our infinite while loop. We can use the print statement here as well, and with the line print $num we see that the num variable is not being incremented. Once again, we enter edit mode to fix the problem. We add the statement num=$((num+1)) at the bottom of our while loop, save, exit, and restart. We now see that the num variable is incrementing properly and that the loop will exit. We can type the continue command to let the loop finish without any more intervention.

After the script has run successfully, you'll see the message Debugged program terminated normally. Use q to quit or R to restart. If you haven't been adding comments as you go, it would be a good idea at this point to re-enter edit mode and add those comments to any changes that you made. Make sure to run your script through one more time though to make sure that you didn't break anything during the process of commenting.

That's a pretty simple BASHDB session, but my hope is that it will give you a good start. BASHDB is a great tool to add to your shell script development toolbox.

Tips and Tricks

  • If you're like many of us, you may have trouble with quoting in your scripts from time to time. If you need a hint on how quoted sections are being interpreted by the shell, you can replace the command that's acting on the quoted section with the echo command. This will give you output showing how your quotes are being interpreted. This can also be a handy trick to use when you need insight into other issues like shell expansion too.
  • If you don't indent temporary (debugging) code, it will be easier to find in order to remove it before releasing your script to users. If you don't already make a habit of indenting your scripts in the first place, I would recommend that you start. It greatly increases the readability, and thus maintainability, of your scripts.
  • You can set the PS4 environment variable to include more information in the shell's debugging output. You can add things like line numbers, filenames, and more. For example, you would use the line export PS4='$LINENO ' to add line numbers to your script's debugging output. The creator of the bashdb script debugger sets the PS4 variable to (${BASH_SOURCE}:${LINENO}): ${FUNCNAME[0]} - [${SHLVL},${BASH_SUBSHELL}, $?] which gives you very detailed information about where you are in your script. You can make this change to the variable permanent by adding an export declaration to one of your bash configuration files. A short example follows this list.
  • Make sure to use unique names for your shell scripts. You can run into problems if you name your shell script the same as a system or built-in command (i.e. test). I like to make my shell script names distinctive, and for added protection I almost always add a .sh extension onto the end of the filename.
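Here is a quick example of the PS4 tip above, using the small script from Listing 4; the exact trace output will vary with your script and shell version.

# Add the file name and line number to each -x trace line for one run
export PS4='+(${BASH_SOURCE}:${LINENO}): '
bash -x ./set_example.sh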

Scripting

These scripts are somewhat simplified and in most cases could be done other ways too, but they will work to illustrate the concepts. If you use these scripts, make sure you adapt them to your situation. Never run a script or command without understanding what it will do to your system.

Our first script example has two separate parts. The first is a script that wraps up our debugging functionality from above. This is a case where it's helpful to create modular code so that other scripts can add debugging functionality simply by sourcing one file. That way you're not duplicating code needlessly for commonly used functionality. The second script implements the debugging script, and uses a command line option (-d) to enable debugging. The script also uses multiple debugging levels, allowing the user to control how verbose the output is by passing an argument to the -d option.

Listing 20

#!/bin/bash -
# File:debug_module.sh
# Holds common script debugging functionality

# Set the PS4 variable to add line #s to our debug output
PS4='Line $LINENO : '

# The function that enables the enabling/disabling of
# debugging in the script, and also takes the user
# specified debug level into account.
# 0 = No debugging
# 1 = Debug executed statements only
# 2 = Debug all lines and executed statements
function DEBUG()
{
    # We need to see what level (0-2) of debugging is set
    if [ "$1" = "0" ] #User disabled debugging
    then
        echo "Debugging Off"
        set +xv

        # Set the variable that tracks the debugging state
        _DEBUG=0
    elif [ "$1" = "1" ] #User wants minimal debugging
    then
        echo "Minimal Debugging"
        set -x

        # Set the variable that tracks the debugging state
        _DEBUG=0
    elif [ "$1" = "2" ] #User wants maximum debugging
    then
        echo "Maximum Debugging"
        set -xv

        # Set the variable that tracks the debugging state
        _DEBUG=0
    else #Run/suppress a command line depending on debug level
        # If debugging is turned on, output the line
        # that this function was passed as a parameter
        if [ $_DEBUG -gt 0 ]
        then
            $@
        fi
    fi
}

This script has two main purposes. One is to set the PS4 variable so that line numbers are added to the debugging output to make it easier to trace errors. The other is to provide a function that takes an argument of either a number (0-2), or a command line and then decides what to do with it. If the argument is a number from 0 to 2, the function sets a debugging level accordingly. Level 0 turns off all debugging (set +xv), level 1 turns on execution tracing only (set -x), and level 2 turns on execution tracing and line echoing (set -xv). Anything else that is passed to the function is treated as a command line that is either run or suppressed depending on what the debugging level is.

As always, there are many ways to improve this script. One would be to add more debugging levels to it. I created three (0-2), which accommodated only the -x and -v options. You could add another level for the -u option, or create your own custom levels. Listing 21 shows an implementation of our simple modular debugging script.

Listing 21

#!/bin/bash -
# File: debug_module_test.sh
# Used as a test of the debug_module.sh script

# Source the debug_module.sh script so that its
# function(s) will be used as this script's own
. ./debug_module.sh

# Parse the command line options and set this script up for use
while getopts "d:h" opt
do
    case $opt in
        d)  _DEBUG=$OPTARG # Enable debugging
            DEBUG $_DEBUG
            ;;
        h)  echo "Usage: $0 [-dh]" #Give the user usage info
            echo " -d Enables debugging mode"
            echo " -h Displays this help message"
            exit 0
            ;;
        '?') echo "$0: Invalid Option - $OPTARG"
            echo "Usage: $0 [-dh]"
            exit 1
            ;;
    esac
done

# Begin our test statements
DEBUG echo "Debugging 1"

DEBUG echo "Debugging 2"

echo "Regular Output Line"

# Turn debugging off
DEBUG 0

# Test to make sure debugging is off
DEBUG echo "Debugging 3"

# You can also create your own custom debugging output sections
_DEBUG=2 #Manually set debugging back to max for last section
[ $_DEBUG -gt 0 ] && echo "First debugging level"
[ $_DEBUG -gt 1 ] && echo "Second debugging level"

The first statement that you see in the Listing 21 script is a source statement reading the modular debugging script (debug_module.sh). This treats the debugging script as if it was part of the script we're currently running. The next major section that you see is the while loop that parses the command line options and arguments. The main option to be concerned with is "d", since it's the one that enables or disables debugging output. The getopts command requires the -d option to have an argument on the command line via the getopts "d:h" statement. The user passes a 0, 1, or 2 to the option and that in turn sets the debugging level via the _DEBUG variable and the DEBUG function. The DEBUG function is called 4 more times throughout the rest of the script. Three of those times it is used as a switch to run or suppress a line of the script, and once it is used to reset the debugging level to 0 (debugging off).

The last three lines of the script are a little different. I put them in there to show how you could implement your own custom debugging functionality. In the first of those lines, the _DEBUG variable is set to 2 (maximum debugging output). The next two lines are used to select how much debugging output you see. When you set _DEBUG to 1, the line "First debugging level" is output. If you set _DEBUG to 2 as in the script, the conditions for both the "First debugging level" (> 0) and the "Second debugging level" (> 1) statements are met, so both lines are output. Listing 22 shows the output that you get from running this script, and if you look at the bottom you'll see that the lines "First debugging level" and "Second debugging level" are output.

Listing 22

$ ./debug_module_test.sh -d 1
Minimal Debugging
Line 29 : _DEBUG=0
Line 11 : getopts d:h opt
Line 30 : DEBUG echo 'Debugging 1'
Line 18 : '[' echo = 0 ']'
Line 24 : '[' echo = 1 ']'
Line 30 : '[' echo = 2 ']'
Line 39 : '[' 0 -gt 0 ']'
Line 32 : DEBUG echo 'Debugging 2'
Line 18 : '[' echo = 0 ']'
Line 24 : '[' echo = 1 ']'
Line 30 : '[' echo = 2 ']'
Line 39 : '[' 0 -gt 0 ']'
Line 34 : echo 'Regular Output Line'
Regular Output Line
Line 37 : DEBUG 0
Line 18 : '[' 0 = 0 ']'
Line 20 : echo 'Debugging Off'
Debugging Off
Line 21 : set +xv
First debugging level
Second debugging level
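As suggested above, the DEBUG function from Listing 20 could be given more levels. Here is a hedged sketch of a variant with a level 3 that also enables the -u (nounset) checks; the _DEBUG bookkeeping and comments from Listing 20 are trimmed, so treat it as an outline rather than a drop-in replacement.

#!/bin/bash -
# Sketch: debug_module.sh variant with an extra debugging level
PS4='Line $LINENO : '

function DEBUG()
{
    # 0 = off, 1 = -x, 2 = -xv, 3 = -xv plus -u (unset variable checks)
    if [ "${1:-}" = "0" ]
    then
        echo "Debugging Off"
        set +xvu
    elif [ "${1:-}" = "1" ]
    then
        echo "Minimal Debugging"
        set -x
    elif [ "${1:-}" = "2" ]
    then
        echo "Maximum Debugging"
        set -xv
    elif [ "${1:-}" = "3" ]
    then
        echo "Maximum Debugging Plus Unset Checks"
        set -xvu
    else
        # Run or suppress the line depending on the current debug level
        if [ "${_DEBUG:-0}" -gt 0 ]
        then
            $@
        fi
    fi
}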

This next script is somewhat like an automated unit test. It's a wrapper script that automatically runs another script with varying combinations of options and arguments so that you can easily look for errors. It takes some time up front to create this script, but it allows you to quickly test whether any changes you make to a script might cause problems for the end user. It could take a lot of time to step through and test all of the option/argument combinations manually on a complex script, and with that extra work (if we're honest) this test might get left out altogether. That's where the automation of the script in Listing 23 comes in.

Listing 23

#!/bin/bash -
# File unit_test.sh
# A wrapper script that automatically runs another script with
# a varying combination of predefined options and arguments,
# to help find any errors.

# Variables to make the script a little more readable.
_TESTSCRIPT=$1 #The script that the user wants to test
_OPTSFILE=$2   #The file holding the predefined options
_ARGSFILE=$3   #The file holding the predefined arguments

# Read the options and arguments from their files into arrays.
_OPTSARRAY=($(cat $_OPTSFILE))
_ARGSARRAY=($(cat $_ARGSFILE))

# The string that holds the option/argument combos to try.
_TRIALSTRING=""

# Step through all of the arguments one at a time.
for _ARG in ${_ARGSARRAY[*]}
do
    # The string of multiple command line options that we'll
    # build as we step through the available options.
    _OPTSTRING=""

    # Step through all of the options one at a time.
    for _OPT in ${_OPTSARRAY[*]}
    do
        # Append the new option onto the multi-option string.
        _OPTSTRING="${_OPTSTRING}$_OPT "

        # Accumulate the command lines that will be tacked onto
        # the command as we're testing it.
        _TRIALSTRING="${_TRIALSTRING}${_OPT} $_ARG\n"        #Single option
        _TRIALSTRING="${_TRIALSTRING}${_OPTSTRING}$_ARG\n"   #Multi-option
    done
done

# Change the Internal Field Separator to avoid newline/space troubles
# with the command list array assignment.
IFS=":"

# Sort the lines and make sure we only have unique entries. This could
# be taken care of by more clever coding above, but I'm going to let
# the shell do some extra work for me instead. An array is used to hold
# the command lines.
_CLIST=($(echo -e $_TRIALSTRING | sort | uniq | sed '/^$/d' | tr "\n" ":"))

# Step through each of the command lines that were built.
for _CMD in ${_CLIST[*]}
do
    # We can pipe the full concatenated command string into bash to run it.
    echo $_TESTSCRIPT $_CMD | bash
done

There are two files that I created to go along with this test script. The first is sample_opts, which holds a single line of possible options separated by spaces (-d -v -q). These options stand for debugging mode, verbose mode, and quiet mode respectively. The second file that I create is sample_args, which contains two possible arguments separated by a space (/etc/passwd /etc/shadow). I'll run our unit_test.sh script by passing it the name of the script to test, the sample_opts argument, and the sample_args argument. For this example, it really doesn't matter what the test script (./test_script.sh) is designed to do. We just provide the options and arguments that we want to test, and that's all the unit_test.sh script needs to know. Listing 24 shows what happens when I run the test.

Listing 24

$ ./unit_test.sh ./test_script.sh sample_opts sample_args
Debug mode
Debug mode
Debug mode
Verbose mode
Debug mode
Verbose mode
Debug mode
Verbose mode
Quiet mode
The -v and -q options are conflicting.
Debug mode
Verbose mode
Quiet mode
The -v and -q options are conflicting.
Quiet mode
Quiet mode
Verbose mode
Verbose mode

Notice that the output from the unit test script shows that the -v and -q options cause a conflict. I hard coded that error into the test script for clarity, but in everyday use you would be looking for things like real errors or output that doesn't match what you expect. The error about the -v and -q options makes sense in this case because you wouldn't want to run verbose (chatty) mode and quiet (non-chatty) mode at the same time; they are mutually exclusive options that should not be used together. This unit test script not only finds errors that you might miss with manual inspection, it also lets you easily recheck your script whenever you make a change, and it ensures that your script is checked the same way every time.
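The actual test_script.sh isn't shown in this post, so treat the following as an assumption about its contents, but the conflict check inside it might look something like this:

# Sketch: how a script could reject the mutually exclusive -v and -q options
VERBOSE=0
QUIET=0

while getopts "dvq" opt
do
    case $opt in
        d) echo "Debug mode" ;;
        v) echo "Verbose mode"; VERBOSE=1 ;;
        q) echo "Quiet mode";   QUIET=1 ;;
    esac
done

if [ $VERBOSE -eq 1 ] && [ $QUIET -eq 1 ]
then
    echo "The -v and -q options are conflicting." >&2
    exit 1
fi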

There are a lot of improvements that can be made to this unit test script. For starters, the script doesn't check every possible combination of options. It's limited by the order that the options are in the sample_opts file. The script never reorders those options. Another improvement would be to have the script automatically check for common errors like illegal option, file not found, etc. As it stands now though, you can pipe the output of the script to grep in order to look for a specific error yourself.
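For example, a full run can be piped through grep to flag the kinds of problems you care about; the patterns below are only examples.

# Scan a full unit test run for a few error patterns, case-insensitively
./unit_test.sh ./test_script.sh sample_opts sample_args 2>&1 | \
    grep -iE "conflicting|illegal option|not found|unbound variable"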

Troubleshooting

The version of BASHDB that came with my chosen Linux distribution had a bug causing an error when a BASHDB function tried to return the value of -1. The problem went away though once I downloaded and compiled the latest version straight from the BASHDB website.

If a script you're debugging causes BASHDB to hang, you can try the CTRL+C key combination. This should exit from the script you're debugging and return you to the BASHDB prompt.

Conclusion

There are quite a few tools and methods at your disposal when debugging scripts. From BASH command line options, to a full debugger like BASHDB, to your own custom debugging and test scripts, there's a lot of room for creativity in making your scripts more error-free. Better and more thorough debugging of your scripts from the outset will help lessen problems down the line, reducing downtime and user frustration. In the future, I'll talk about handling runtime errors and security as the next steps in ensuring the quality and reliability of your shell scripts. Look for another post in this series soon.

Resources

  1. Expert Shell Scripting (Expert's Voice in Open Source) Book
  2. Learning the bash Shell: Unix Shell Programming (In a Nutshell (O'Reilly))
  3. BigAdmin Community Debugging Tip
  4. Shell Script Debugging Gotchas
  5. NixCraft Debugging Article
  6. Linux Journal, April 2010, Work The Shell, By Dave Taylor, "Our Twitter Autoresponder Goes Live!", pp 24-26
  7. The Linux Documentation Project Debugging Article
  8. BASHDB Homepage
  9. BASHDB Documentation
  10. Line Number Output In set -x Debugging Output
  11. 6 Cool Bash Tricks Article
  12. Using VIM as a BASH IDE
  13. General BASH Debugging Info
  14. Good Debugging Reference With Sample Error-Filled Scripts
  15. Good Debugging Tips Page By Bash-Hackers
  16. Modularizing The Debug Function To A Separate Script