Perl is a simple, yet powerful, scripting language that can be very useful for automating repetitive tasks, processing text and acting as a glue between other programs (e.g. as a job submission script). Perl is a well-established language, with the current version (version 5) released in 1994. Version 5 is still the standard version of the language, and it is installed by default on nearly all modern UNIX systems. Perl is also available for OS X and Windows.
Perl was first released in 1987. You can read about the history of Perl at its Wikipedia page or by going to one of the many Perl websites. The best books to learn about Perl are Programming Perl (good for the absolute beginner - I used it to learn Perl!) and Learning Perl (good for someone who can already program in another language).
This is a short course that will provide you with a quick taste of Perl. Please work through this course at your own pace. Perl is best learned by using it, so please copy out and play with the examples provided, and also have a go at the exercises.
This course is a mirror of the Python course, with exactly the same pages, examples and exercises. If you want to compare Python with Perl, then please click the Compare with Python links.
You write Perl using a simple text editor, like pico. Log on to a UNIX computer and use a text editor to open a file called script.pl, e.g.
$ pico script.pl
Perl scripts traditionally end in .pl. This isn't a requirement, but it does make it easier to recognise the file.
Now type the following into the file;
print "Hello from Perl!\n";
Save the file. You have just written a simple Perl script! To run it, type
$ perl script.pl
(note that the $ sign here indicates that the above is a command that you have to type in the shell - you don't need to type the $ sign itself - just type perl script.pl)
This line uses the Perl interpreter (called perl) to read your Perl script and to follow the instructions that it finds. In this case you have told Perl to print to the screen the line "Hello from Perl!". The \n represents a return (newline). Try removing the \n, or adding multiple \n's, and rerunning the script to see what I mean.
This was a simple script, but Perl is a language designed to help you write small and simple scripts. Indeed, in my opinion Perl is the best language around for writing small and simple scripts (less than 100 lines of code).
This script has introduced three of the basic building blocks of Perl;
$ pico variables.pl
Type into the script the following lines (remember to include the semicolons at the end of each line!);
$a = "Hello";
$b = "from";
$c = "Perl!";
print "$a $b $c\n";
What do you think will be printed when you run this script? Run the script by typing;
$ perl variables.pl
Did you see what you expected? In this script we created three variables, $a, $b and $c. The line $a = "Hello"; sets the variable $a equal to the string Hello. $b is set equal to the string from while $c is set equal to Perl!.
The last line is interesting! The print command prints the string that follows it. In this case the string is equal to "$a $b $c\n". However, Perl knows that $a, $b and $c are variables, so it substitutes their values into this string (so $a is replaced by its value, Hello, $b is replaced with from and $c is replaced with Perl!). Thus the print command prints the string Hello from Perl!\n to the screen.
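Here is a small sketch of my own (not one of the course scripts) that shows the substitution at work. It assumes a few things the course does not cover: use strict and use warnings ask Perl to report common mistakes, my declares a variable, and sub defines a reusable block of code. The name build_greeting is just made up for illustration. The sketch also prints the same text in single quotes, where Perl does not substitute variables, so you can compare the two.

```perl
use strict;
use warnings;

# Join three words into one string (build_greeting is an illustrative name).
sub build_greeting {
    my ($first, $second, $third) = @_;
    # Double quotes: each variable is replaced by its value.
    return "$first $second $third";
}

my $word = "Perl!";

# Double quotes substitute the value of $word into the string.
print "Double quotes: Hello from $word\n";

# Single quotes print the characters exactly as typed, dollar sign included.
print 'Single quotes: Hello from $word', "\n";

print build_greeting("Hello", "from", $word), "\n";
```

Try swapping the quote styles around and rerunning to see which lines change.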
Perl can also put numbers into variables. Create a new script (numbers.pl) and write this;
$x = 5;
$pi = 3.14159265;
$n = -6;
$n_plus_one = $n + 1;
$five_times_x = 5 * $x;
$pi_over_two = $pi / 2;
print "x equals $x. pi equals $pi. n equals $n.\n";
print "Five times x equals $five_times_x.\n";
print "pi divided by two equals $pi_over_two.\n";
print "n plus one equals $n_plus_one.\n";
What do you think will be printed to the screen when you run this script?
Run this script (perl numbers.pl). Did you see what you expected?
A Perl script is a file that contains instructions to the perl interpreter, with one instruction per line, that are read one at a time from the top of the script to the bottom. You can, however, divert this flow using a loop. Open a new Perl script loop.pl and write this;
for ($i = 1; $i <= 10; $i = $i + 1)
{
$five_times_i = 5 * $i;
print "5 times $i equals $five_times_i\n";
}
What do you think will be printed to the screen? Run the script ($ perl loop.pl). Did you see what you expected?
This script has introduced a for loop. The loop has five parts;
Loops are very powerful. For example;
for ($i = 0; $i <= 200; $i = $i + 2)
{
print "$i\n";
}
prints all of the even numbers from 0 to 200.
for ($i = 10; $i > 0; $i = $i - 1)
{
print "$i...\n";
}
print "We have lift off!\n";
prints out a count down.
for ($i = 1; $i <= 3; $i = $i + 1)
{
for ($j = 1; $j <= 3; $j = $j + 1)
{
$i_times_j = $i * $j;
print "$i_times_j ";
}
print "\n";
}
prints out a 3*3 matrix where the element at (i,j) equals i times j.
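As a recap of how the pieces of a for loop fit together, here is a hedged sketch with each piece labelled in a comment. The helper name multiples is my own invention, and the sketch uses a few things not covered in this course (use strict, my, sub, and the push and join commands, which add a value to an array and glue an array into a string), so treat it as an illustration rather than as part of the course scripts.

```perl
use strict;
use warnings;

# Return the multiples of $step from $step up to $limit (hypothetical helper).
sub multiples {
    my ($step, $limit) = @_;
    my @values;
    for (my $i = $step;      # initialise: runs once, before the first pass
         $i <= $limit;       # condition: checked before every pass
         $i = $i + $step)    # increment: runs at the end of every pass
    {
        push @values, $i;    # body: runs once per pass
    }
    return @values;
}

print join(" ", multiples(5, 25)), "\n";
```

If the condition is false before the first pass (for example multiples(2, 1)) then the body never runs at all.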
Arguments are important for all programs. Arguments for programs have nothing to do with shouting, but are additional bits of information supplied to the program when it is run. Open a new Perl script (pico arguments.pl) and type this;
$n_arguments = @ARGV;
for ($i = 0; $i < $n_arguments; $i = $i + 1)
{
print "Argument $i equals $ARGV[$i]\n";
}
Run this script by typing
$ perl arguments.pl here are some arguments
What do you see? Can you work out what happened?
In this case you passed four arguments to your script; here, are, some and arguments. The Perl interpreter read those arguments and placed them into a special variable called ARGV that you can access from your script.
Because there can be more than one argument, the ARGV variable must be capable of holding more than one value (remember that $a holds just a single value). Arrays are variables that can hold multiple values. An array is identified using an at sign (@). I remember the difference between a single variable and an array variable by noticing that $ looks like an S (for single variable), while @ looks like an a (for array variable). @ARGV is therefore an array that holds all of the values of the arguments passed to this script.
The size of an array (the number of values it contains) can be found by typing $size_of_array = @array;, so in this case the number of arguments was found by typing $n_arguments = @ARGV;. You can access an individual value within the array using square brackets, e.g. $array[0] is the first value in the array, $array[1] is the second value etc. (Note that we start counting from zero - the first item is at $array[0] not $array[1]) In the case of our script, we loop over each value in the array @ARGV and print out each value (via $ARGV[$i]).
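The following sketch pulls those array rules together: finding the size, reading the first value, and reading the last value. The helper name summarise is hypothetical, and the code uses use strict, my and sub, which this course does not otherwise introduce.

```perl
use strict;
use warnings;

# Describe an array: its size, first value and last value (illustrative helper).
sub summarise {
    my @values = @_;
    my $size = @values;                  # assigning an array to a single variable gives its size
    if ($size == 0)
    {
        return "empty";
    }
    my $first = $values[0];              # counting starts from zero
    my $last  = $values[$size - 1];      # so the last value is at size minus one
    return "$size values, first=$first, last=$last";
}

my @pets = ("cat", "dog", "fish", "bird");
print summarise(@pets), "\n";
```

Note that asking for $values[$size] would go one past the end of the array, which is a very common mistake when you first start counting from zero.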
Exercise
Use the knowledge you've gained so far to write a Perl script that can print out any times table. Call your script times_table.pl, and have it read two arguments. The first argument should be the times table to print (e.g. the five times table) while the second should be the highest value of the times table to go up to. So
$ perl times_table.pl 5 12
should print the five times table from 1 times 5 to 12 times 5.
Answer (don't peek at this unless you are stuck or until you have finished!)
As an extension, can you think of a way to use arrays to print out the times table using words rather than using numbers? To do this you will need to know that you can assign values to an array using the following syntax;
@a = ( 1, 2, 3, 4, 5 );
@b = ( "cat", "dog", "fish", "bird" );
@c = ( "zero", "one", "two", "three" );
Answer (don't peek at this unless you are stuck or until you have finished!)
$t = $ARGV[0];
$n = $ARGV[1];
print "This is the $t times table.\n";
for ( $i = 1; $i <= $n; $i = $i + 1 )
{
$t_times_i = $t * $i;
print "$i times $t equals $t_times_i\n";
}
$t = $ARGV[0];
$n = $ARGV[1];
@numbers = ( "zero", "one", "two", "three", "four",
"five", "six", "seven", "eight", "nine",
"ten", "eleven", "twelve" );
print "This is the $t times table.\n";
for ( $i = 1; $i <= $n; $i = $i + 1 )
{
$t_times_i = $t * $i;
print "$numbers[$i] times $numbers[$t] equals $t_times_i\n";
}
Loops provide a means to execute part of the script multiple times. Conditions provide the route to choose whether or not to execute part of a script. Open a new Perl script (pico conditions.pl) and type the following;
for ($i = 1; $i <= 10; $i = $i + 1)
{
if ( $i < 5 )
{
print "$i is less than 5.\n";
}
elsif ( $i > 5 )
{
print "$i is greater than 5.\n";
}
else
{
print "$i is equal to 5.\n";
}
}
This script loops $i over all values from 1 to 10, and uses an if block to test each value of $i. There are three sections to the if block;
If blocks can be used, for example, to correct input, e.g.
$n = $ARGV[0];
if ($n < 0)
{
die "We cannot process negative numbers!\n";
}
(in this case the die command is like print, except that it prints the string and then exits (kills!) the script)
If blocks are very powerful. For example, type in and run the script below (you may want to use copy-and-paste rather than typing it in by hand!);
$n = $ARGV[0];
if ($n < 0)
{
print "$n is negative.\n";
}
elsif ($n > 100)
{
print "$n is large and positive.\n";
}
elsif ($n == 10)
{
for ($i = $n; $i >= 1; $i = $i - 1)
{
print "$i...\n";
}
print "Blast off!\n";
}
elsif ($n == 42)
{
print "The answer to life the universe and everything!\n";
}
else
{
print "What is $n?\n";
}
Can you work out what it does before you run it? Run it with some different arguments. Does it do what you expect?
Perl is at its best when it is processing text and reading and writing files. Open a new Perl script (pico files.pl) and type the following lines.
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read the file $filename: $!\n";
$i = 0;
while ($line = <FILE>)
{
$i = $i + 1;
print "$i : $line";
}
Run this script by passing as an argument the path to any file, e.g.
$ perl files.pl ./files.pl
What you should see is that Perl has printed out every line of the file, with each line preceded by its line number. Let's go through each line of the script to see how Perl has achieved this feat.
First we got the filename as the first argument to the script via the line $filename = $ARGV[0];
The next step was to open the file. You open files using the open command. The part open FILE,"<$filename" says to open the file whose path is the value of the variable $filename and attach that file to the filehandle FILE. The less than sign before the filename ("<$filename") tells Perl that you only want to read the file (you won't be writing to it). This means that there will be an error if the file does not exist, or is not readable. This error is handled by the rest of the line (or die "Cannot read the file $filename: $!\n"). The command die tells the Perl script to exit with an error (literally keel over and die!), printing the error string which comes after the die command. The variable $! is a special variable which has a value equal to the last system error. This variable allows you to get a little more information as to why you can't read the file.
In the next line $i = 0; we are just initialising the counter variable $i so that it is equal to zero.
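The next line while ($line = <FILE>) is interesting. First, it is a while loop. A while loop is like a for loop, except it only has a condition (there is no initialise or increment section). The while loop keeps looping while the condition is true, and only exits when the condition becomes false. In this case the condition is $line = <FILE>. The <FILE> part reads the next line from the filehandle FILE, and that line is placed into the variable $line. When there are no more lines left to read, the condition becomes false and the loop exits.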
In the body of the loop, $i = $i + 1; just increments the count of how many lines have been read.
Then the line print "$i : $line" prints the value of the counter and the value of the line. Note that we don't need to add a \n onto the end, as the line read in from the file already has its own newline character on the end.
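For comparison, here is a sketch of the same line-numbering idea written in a style you will often see in newer Perl code. This is an alternative to the course script, not a correction of it. It assumes the three-argument form of open (which keeps the "<" separate from the filename) and a filehandle stored in an ordinary variable ($fh) rather than a bare name like FILE. The helper name number_lines is made up for this example, as is the use of push to collect the lines.

```perl
use strict;
use warnings;

# Return the lines of a file, each prefixed with its line number
# (number_lines is a hypothetical helper name).
sub number_lines {
    my ($filename) = @_;

    # Three-argument open: filehandle, mode, filename.
    open(my $fh, "<", $filename) or die "Cannot read the file $filename: $!\n";

    my @numbered;
    my $i = 0;
    while (my $line = <$fh>)
    {
        $i = $i + 1;
        push @numbered, "$i : $line";   # $line still ends with its own newline
    }
    close($fh);

    return @numbered;
}

# Only runs if a filename was passed, e.g. perl number_lines.pl ./number_lines.pl
if (@ARGV > 0)
{
    print number_lines($ARGV[0]);
}
```

One practical reason people prefer this form is that a filename that happens to start with < or > cannot be mistaken for part of the mode.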
Exercise
head and tail are two useful UNIX programs that can be used to print out the first few, or last few lines of a file (this is useful if you are monitoring log files). Can you write a Perl script that does the same thing?
For example
$ perl head.pl 5 filename
should print out the first five lines of a file, and
$ perl tail.pl 10 filename
should print out the last ten lines of a file.
Answer head.pl and tail.pl. (don't peek unless you are stuck or until you have finished!)
Can you go one better and write a body command, that prints the middle of a file? For example
$ perl body.pl 20 25 filename
prints lines 20 to 25 of a file. Can you write the code so that
$ perl body.pl 25 20 filename
would print lines 25 to 20 (so reversing the file)? (Hint - you will need to use an array)
Answer. (again, don't peek until you have finished!)
$n = $ARGV[0];
$filename = $ARGV[1];
open FILE,"<$filename" or die "Cannot read $filename: $!\n";
$i = 0;
while ($line = <FILE>)
{
$i = $i + 1;
if ($i <= $n)
{
print $line;
}
}
$n = $ARGV[0];
$filename = $ARGV[1];
open FILE,"<$filename" or die "Cannot read $filename: $!\n";
$nlines = 0;
while ($line = <FILE>)
{
$nlines = $nlines + 1;
}
open FILE,"<$filename" or die "Cannot read $filename: $!\n";
$i = 0;
while ($line = <FILE>)
{
$i = $i + 1;
if ($i > $nlines - $n)
{
print $line;
}
}
$start = $ARGV[0];
$end = $ARGV[1];
$filename = $ARGV[2];
open FILE,"<$filename" or die "Cannot read $filename: $!\n";
#read all of the lines into an array
$i = 0;
while ($line = <FILE>)
{
$i = $i + 1;
$lines[$i] = $line;
}
if ($start <= $end)
{
for ($i = $start; $i <= $end; $i = $i + 1)
{
print $lines[$i];
}
}
else
{
for ($i = $start; $i >= $end; $i = $i - 1)
{
print $lines[$i];
}
}
Perl is equally good at writing to files as it is at reading them. Open a new Perl script (pico write_times_table.pl) and type;
$filename = $ARGV[0];
$n = $ARGV[1];
open FILE,">$filename" or die "Cannot write to the file $filename: $!\n";
for ( $i = 1; $i <= 10; $i = $i + 1 )
{
$i_times_n = $i * $n;
print FILE "$i times $n equals $i_times_n\n";
}
close(FILE);
Run this script by typing;
$ perl write_times_table.pl five.txt 5
This should result in the five times table being written to the file five.txt in the current directory.
The part of the line open FILE,">$filename" opens the file whose path is the variable $filename and connects it to the filehandle FILE. This time however, a greater than sign is used (">$filename"), so the file is opened for writing, not reading. If the file does not exist, then the file is created, and if it does exist, then the file is overwritten (so be careful not to overwrite any of your important files!).
There are three different modes for opening files; "<$filename" opens the file for reading, ">$filename" opens the file for writing (overwriting anything already in it), and ">>$filename" opens the file for appending (adding to the end of what is already there).
To write to the file, supply the filehandle to the print command, as in the line print FILE "$i times $n equals $i_times_n\n" in the script. The filehandle is placed between the print command and the string to be printed.
Finally, when you have finished writing to a file you should close it using the close command. This ensures that what you have written is properly copied to disc (as it may up to this point be buffered in memory).
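To see the difference between overwriting and adding to a file, here is a hedged sketch that writes a line with ">" and then adds a second line with ">>" (append mode, a standard Perl open mode). The helper names save_line and read_all are hypothetical, and the sketch uses the three-argument form of open with a filehandle held in a variable, which is a different style from the course scripts. Be careful when trying it: whatever filename you pass will be overwritten.

```perl
use strict;
use warnings;

# Write one line to a file. $mode should be ">" (overwrite) or ">>" (append).
# save_line is an illustrative helper, not part of the course.
sub save_line {
    my ($mode, $filename, $text) = @_;
    open(my $fh, $mode, $filename) or die "Cannot write to $filename: $!\n";
    print $fh "$text\n";
    # Closing makes sure buffered text really reaches the file.
    close($fh) or die "Cannot close $filename: $!\n";
}

# Read a whole file back as a single string.
sub read_all {
    my ($filename) = @_;
    open(my $fh, "<", $filename) or die "Cannot read $filename: $!\n";
    my $contents = join("", <$fh>);
    close($fh);
    return $contents;
}

# Only runs if a filename was passed. WARNING: that file is overwritten.
if (@ARGV > 0)
{
    my $filename = $ARGV[0];
    save_line(">",  $filename, "first");    # file now holds one line
    save_line(">>", $filename, "second");   # file now holds two lines
    print read_all($filename);
}
```

Append mode is handy for log files, where each run of a script should add to the record rather than wipe it.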
Filehandles allow you to refer to more than one file at a time. For example, we could modify the script that numbered each line of the file so that it wrote the numbered lines to another file. For example;
$filename = $ARGV[0];
$numbered_filename = "$filename" . "_numbered";
open RFILE,"<$filename" or die "Cannot read from $filename: $!\n";
open WFILE,">$numbered_filename"
or die "Cannot write to $numbered_filename: $!\n";
$i = 0;
while ($line = <RFILE>)
{
$i = $i + 1;
print WFILE "$i : $line";
}
close(WFILE);
(note that $numbered_filename = "$filename" . "_numbered" uses the . operator, which joins together two strings. So if $filename contained the string file.txt, then $numbered_filename would be set equal to file.txt_numbered).
Most files are arranged into words. It is very easy to split a line of text using Perl into an array of words. Create a new Perl script (pico words.pl) and type the following;
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read the file $filename: $!\n";
$total_nwords = 0;
while ($line = <FILE>)
{
@words = split(" ",$line);
$nwords = @words;
$total_nwords = $total_nwords + $nwords;
}
print "The total number of words in the file $filename equals $total_nwords.\n";
The new command in this script is split. This command splits a string into an array of strings. split(" ",$line) splits the string contained in the variable $line into words. A single space (" ") is a special case; it splits on any run of whitespace (spaces, tabs or newlines) and ignores whitespace at the start of the line. You can split by whatever you wish, so split(":",$line) would split the line using colons, while split("the",$line) would split the line using the word the.
Because multiple values are returned by split, they are returned as an array (hence the @ sign on the words variable). The number of words is given by the size of the array, and the words can be accessed using square brackets (e.g. $words[0] is the first word of the line).
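Here is a short sketch that you can use to experiment with split. The helper names words_in and fields_in are my own, and use strict, my and sub are not covered elsewhere in this course. The first helper splits on a single space, which Perl treats as "any run of whitespace", and the second splits on commas.

```perl
use strict;
use warnings;

# Split a line into words (words_in is an illustrative name).
sub words_in {
    my ($line) = @_;
    # A single space splits on runs of spaces, tabs and newlines,
    # and skips any whitespace at the start of the line.
    return split(" ", $line);
}

# Split a comma-separated line into its fields.
sub fields_in {
    my ($line) = @_;
    return split(",", $line);
}

my @words = words_in("  Hello   from\tPerl!\n");
my $nwords = @words;
print "$nwords words: $words[0] $words[1] $words[2]\n";

my @fields = fields_in("Fiat,4,591.10,2");
print "Make is $fields[0], premium is $fields[2]\n";
```

Notice that the extra spaces, the tab and the trailing newline in the first string do not produce any empty words.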
If the splitting string is empty, then split will split the string into individual letters. For example, take a look at this script that counts the number of lines, words and letters in a file;
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read $filename: $!\n";
$total_nlines = 0;
$total_nwords = 0;
$total_nletters = 0;
while ($line = <FILE>)
{
@words = split(" ",$line);
$nwords = @words;
for ($i = 0; $i < $nwords; $i = $i + 1)
{
@letters = split("",$words[$i]);
$nletters = @letters;
$total_nletters = $total_nletters + $nletters;
}
$total_nwords = $total_nwords + $nwords;
$total_nlines = $total_nlines + 1;
}
print "$filename contains $total_nlines lines, $total_nwords words " .
"and $total_nletters letters.\n";
Exercises
Write a Perl script that prints out the first word of the first five lines of an arbitrary file (Here is the answer).
Here is a comma-separated table of values;
Make,Insurance Class,Premium ($),Age (years)
Ferrari,10,2432.50,3
BMW,8,1231.10,1
VW,6,862.20,4
Fiat,4,591.10,2
Bugatti,15,4312.00,1
Copy this into a text file using pico
Write a Perl script that turns this from a comma separated file with headings Make,Insurance Class, Premium ($),Age (years) into a space separated file with headings Make Premium($) Insurance_Class (answer).
(Hint. To print a dollar sign you must print "\$", so it would be print "Premium(\$)". Also, you may want to strip the newline character from the end of each line of the file. You can do this by using the chomp command, e.g. chomp $line removes the newline character from the end of $line)
Write a Perl script that will print out the mean average premium, the make of the oldest car in the list, and the makes of the car in the highest and lowest insurance groups (answer).
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read the file $filename: $!\n";
$i = 5;
while ($line = <FILE>)
{
$i = $i - 1;
if ($i >= 0)
{
@words = split(" ",$line);
print "$words[0]\n";
}
}
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read from $filename: $!\n";
#skip the header line
$line = <FILE>;
#print our own header
print "Make Premium(\$) Insurance_class\n";
#read in the rest of the file
while ($line = <FILE>)
{
chomp $line;
@words = split(",",$line);
print "$words[0] $words[2] $words[1]\n";
}
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read from $filename: $!\n";
#ignore the header
$line = <FILE>;
$total_premium = 0;
$nmakes = 0;
$oldest_age = 0;
$highest_class = 0;
$lowest_class = 1000;
while ($line = <FILE>)
{
chomp $line;
@words = split(",",$line);
$nwords = @words;
if ($nwords == 4)
{
$make = $words[0];
$class = $words[1];
$premium = $words[2];
$age = $words[3];
$nmakes = $nmakes + 1;
$total_premium = $total_premium + $premium;
if ($age > $oldest_age)
{
$oldest_make = $make;
$oldest_age = $age;
}
if ($class > $highest_class)
{
$highest_make = $make;
$highest_class = $class;
}
if ($class < $lowest_class)
{
$lowest_make = $make;
$lowest_class = $class;
}
}
}
$avg_premium = $total_premium / $nmakes;
print "The average premium is \$$avg_premium.\n";
print "The oldest make is $oldest_make.\n";
print "The make in the lowest class is $lowest_make.\n";
print "The make in the highest class is $highest_make.\n";
Perl is an excellent language to use when searching within files. Searching is very useful. For example, you could imagine using Perl to search an output file to find the results of a calculation. Searching in Perl is straightforward. Open a new Perl script (pico search.pl) and type the following;
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read the file $filename: $!\n";
while ($line = <FILE>)
{
if ($line =~ m/the/)
{
print $line;
}
}
This script will search a file and print out all of the lines that contain the text the (this includes lines where it appears inside a longer word, such as other). Try it out!
The key line of this script is if ($line =~ m/the/). This is a condition that uses Perl's pattern operator, =~ together with a match string, m/the/. The match string is just an m with two slashes, with the searched-for text being placed between the slashes. For example;
#does the line contain a lowercase a?
$line =~ m/a/;
#does the line contain an uppercase A?
$line =~ m/A/;
#does the line contain the word "cat"
$line =~ m/cat/;
To make the search case-insensitive, just place an i after the second slash, e.g.
#search for an upper case or lower case a
$line =~ m/a/i;
#search for "cat", "CAT", "CaT", "caT" etc.
$line =~ m/cat/i;
The combination of search with split provides a powerful tool to help you process simulation output files. Imagine you have run a simulation that calculates the energy of a molecule. Let's imagine that the output file from the simulation looks something like this;
Starting program...
Loading molecule...
Initialising variables...
Starting the calculation - this could take a while!
Molecule energy = 2432.6 kcal mol-1
Calculation finished. Bye!
You can get the energy by searching for lines that contain Molecule energy =, and then using split to break this line into words. The value of the energy is the fourth word. Here is an example script that does just this;
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read the logfile $filename: $!\n";
while ($line = <FILE>)
{
if ($line =~ m/Molecule energy =/)
{
@words = split(" ",$line);
$energy = $words[3];
print "The energy of the molecule is $energy kcal mol-1\n";
last;
}
}
Try copying this example output to a file (logfile.txt) and copying the above Perl script (search_log.pl) to see that this works. Or try to write a similar Perl script that processes an output file from one of the programs that you use.
Perl's text search is very flexible. For example, you can search for the contents of a variable, e.g.
$search_string = "the";
$line =~ m/$search_string/;
This will match if $line contains the value of $search_string (namely the).
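One thing to watch out for is that some characters have a special meaning inside a match string (for example, a full stop matches any single character). The sketch below is my own illustration of one way to search for the contents of a variable as plain text, by wrapping it in \Q and \E, which are standard Perl escapes that switch the special meanings off. The helper name matching_lines is hypothetical, and push (which adds a value to an array) is not covered in this course.

```perl
use strict;
use warnings;

# Return the lines that contain $text as plain text (hypothetical helper).
sub matching_lines {
    my ($text, @lines) = @_;
    my @found;
    for (my $i = 0; $i < @lines; $i = $i + 1)
    {
        # \Q...\E switches off the special meaning of characters such as . or *
        if ($lines[$i] =~ m/\Q$text\E/)
        {
            push @found, $lines[$i];
        }
    }
    return @found;
}

my @lines = ("energy = 2.5 kcal\n", "energy = 215 kcal\n");

# Only the first line is printed. Without \Q and \E the second line
# would match as well, because . would match the 1 in 215.
print matching_lines("2.5", @lines);
```

This matters most when the search text comes from an argument, as you cannot know in advance what characters it will contain.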
Exercise
grep is a useful UNIX program that lets you print out lines in a file that match some passed text, for example
$ grep the file.txt
will print out all of the lines that contain the word the.
Write a Perl script (grep.pl) that acts like grep. (Answer)
$search_text = $ARGV[0];
$filename = $ARGV[1];
open FILE,"<$filename" or die "Cannot read the file $filename: $!\n";
while ($line = <FILE>)
{
if ($line =~ m/$search_text/)
{
print $line;
}
}
As well as being excellent for search, Perl is also great at doing search and replace. Create a new Perl script (replace.pl) and copy the following;
$filename = $ARGV[0];
open FILE,"<$filename" or die "Cannot read the file $filename: $!\n";
while ($line = <FILE>)
{
$line =~ s/the/THE/;
print $line;
}
This script reads in a file and prints out every line to the screen. However, before printing the line, it modifies it using the substitute string, s/the/THE/. This substitute string is a lot like the match string, except that now there is an s followed by three slashes. This searches for the text between the first two slashes and replaces it with the text between the last two slashes (so in this case this replaces the with THE). Note that this only replaces the first occurrence of the in the line. If you want to replace all of the occurrences of the then you have to say that you want a global substitution. You do this by adding a g onto the end of the substitute string. By default the substitution is case-sensitive. You can perform a case-insensitive substitution by adding an i onto the end. For example;
#replace all occurrences of "the" with "THE"
$line =~ s/the/THE/g;
#replace all occurrences of "the", "The", "THe" etc. with "THE"
$line =~ s/the/THE/ig;
Modify your script (replace.pl) to use g and ig and see how that changes the output.
Sometimes you may want to perform the substitution at a specific place in the line, e.g. only at the beginning of the line or only at the end. You can do this by adding either a caret (^) to the beginning of the search string, to force matching at the beginning, or by adding a dollar sign ($) to the end of the search string to force matching at the end, e.g.
#case-insensitive substitution of "the" with "THE" only
#at the beginning of a line
$line =~ s/^the/THE/i;
#substitute jpeg with png at the end of a line
$filename =~ s/jpeg$/png/;
You can also use variables in the search and replace parts of the substitute string, e.g.
$search = "the";
$replace = "THE";
#case-insensitive replace "the" with "THE"
$line =~ s/$search/$replace/i;
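To compare the different kinds of substitution side by side, here is a small sketch. The helper names (replace_first, replace_all_any_case and jpeg_to_png) are made up for this example, and the use of use strict, my and sub goes beyond what this course covers. Each helper works on its own copy of the string, so the original is left unchanged.

```perl
use strict;
use warnings;

# Replace only the first occurrence of $search in $line.
sub replace_first {
    my ($line, $search, $replace) = @_;
    $line =~ s/$search/$replace/;
    return $line;
}

# Replace every occurrence (g), whatever its case (i).
sub replace_all_any_case {
    my ($line, $search, $replace) = @_;
    $line =~ s/$search/$replace/gi;
    return $line;
}

# Replace jpeg with png, but only at the very end of the name.
sub jpeg_to_png {
    my ($filename) = @_;
    $filename =~ s/jpeg$/png/;
    return $filename;
}

my $line = "the cat, The dog and the fish";
print replace_first($line, "the", "THE"), "\n";
print replace_all_any_case($line, "the", "THE"), "\n";
print jpeg_to_png("jpeg_photo.jpeg"), "\n";
```

The last example shows why the dollar sign is useful: the jpeg at the start of the filename is left alone, and only the extension changes.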
Exercise
Use search and replace to update your grep.pl script so that it not only prints matching lines, but it also highlights the matched string in the line (e.g. by adding asterisks around the word, or by capitalising the word).
(note, you can capitalise a string by writing $string = uc($string);. Similarly, you can lower-case a string by writing $string = lc($string);)
Here is a possible answer.
$search_text = $ARGV[0];
$filename = $ARGV[1];
$capital_search_text = uc($search_text);
open FILE,"<$filename" or die "Cannot read the file $filename: $!\n";
while ($line = <FILE>)
{
if ($line =~ m/$search_text/)
{
$line =~ s/$search_text/**$capital_search_text**/ig;
print $line;
}
}
So far you've seen how you can use Perl to process your output files. However, what makes Perl a glue language is its ability to actually run programs as well. There are several ways to run a program from your Perl script. I'll only present a couple of ways here. Open a new Perl script (system_run.pl) and copy the following;
$directory = $ARGV[0];
system("ls $directory");
This is a simple script that just lists the contents of a directory. The key line is system("ls $directory");. The system command is passed a string, and executes the value of that string in pretty much exactly the same way that the same text would have been executed if you had typed it yourself at the command line. The output of the command is printed to the screen.
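The system command also tells you whether the program worked. The sketch below is a hedged illustration of checking this. It relies on documented Perl behaviour: system returns -1 if the program could not be started, and otherwise returns a status number whose exit code is found by shifting it right by eight bits (the >> 8). The helper name run_command is my own.

```perl
use strict;
use warnings;

# Run a command and return its exit code (run_command is a hypothetical helper).
sub run_command {
    my ($command) = @_;
    my $status = system($command);
    if ($status == -1)
    {
        die "Could not start '$command': $!\n";
    }
    # The exit code is held in the upper bits of the status.
    return $status >> 8;
}

my $code = run_command("ls .");
if ($code != 0)
{
    print "ls failed with exit code $code\n";
}
```

By convention an exit code of zero means success, so checking for a non-zero value is a simple way to stop a chain of jobs as soon as one of them fails.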
Let's imagine that we have to run ten simulations to calculate the energy of ten different molecules, which are held in the files input1.mol to input10.mol. The energy is calculated using the program molnrg, which is passed the name of the file to process. Here is a simple script that can run all ten simulations, outputting the results to ten log files, called output1.log to output10.log.
for ($i = 1; $i <= 10; $i = $i + 1)
{
system("molnrg input$i.mol > output$i.log");
}
Wasn't that easier than running each simulation individually?
system is good if you want to just run a program. However, there are times when you would like to process the output of the program within Perl. To do this, you have to use backticks. Open a new Perl script (pico backticks.pl) and copy the following;
$directory = $ARGV[0];
@files = `ls $directory`;
$nfiles = @files;
print "There are $nfiles files in $directory\n";
for ($i = 0; $i < $nfiles; $i = $i + 1)
{
print "$i : $files[$i]";
}
This script lists the contents of a directory, but first says how many files are in the directory, and then prints each one preceded by its number.
The key line here is @files = `ls $directory`;. The string contained in the backticks (ls $directory) is executed, and all of the lines of output are returned and placed into the array @files. Note that the newline (\n) character is left on the end of each output line. Use the chomp command if you want to remove the newline character, e.g. chomp $files[$i];.
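Here is a small sketch of the chomp step in practice. It assumes (as the standard Perl documentation describes) that chomp can be given a whole array, in which case it strips the newline from every value at once. The helper name list_directory is hypothetical, and like the course script it relies on the UNIX ls program being available.

```perl
use strict;
use warnings;

# Return the names in a directory with the newlines removed
# (list_directory is an illustrative helper name).
sub list_directory {
    my ($directory) = @_;
    my @files = `ls $directory`;   # one line of output per array value
    chomp @files;                  # strips the newline from every value
    return @files;
}

my @files = list_directory(".");
my $nfiles = @files;
print "There are $nfiles files in the current directory\n";
for (my $i = 0; $i < $nfiles; $i = $i + 1)
{
    print "$i : $files[$i]\n";     # we now add our own newline
}
```

Removing the newlines first makes the names safe to reuse, for example when building the command for another program.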
Exercises
convert is a UNIX program that can convert an image from one file format to another (e.g. convert a JPEG file to a PNG). Write a Perl script that can convert all of the JPEG files in a directory into PNG files.
(the command to convert file.jpg to file.png is convert file.jpg file.png)
Here's a possible answer.
$directory = $ARGV[0];
@jpeg_files = `ls $directory/*.jpg`;
$njpeg_files = @jpeg_files;
for ($i = 0; $i < $njpeg_files; $i = $i + 1)
{
$jpeg_file = $jpeg_files[$i];
chomp $jpeg_file;
$png_file = $jpeg_file;
$png_file =~ s/jpg$/png/;
$command = "convert $jpeg_file $png_file";
print "Running '$command'\n";
system($command);
}
Perl can be used for all stages of controlling jobs submitted to a compute cluster. You've seen how you can write files using Perl. This lets you use Perl to write the command and input files for your programs. You have also seen how to run programs from within Perl, so you can use Perl to run the job using the newly-written input file. You can then process the output using split, search and replace. You could then use the processed output to write new input files to run more programs. In this way, Perl can act as the glue that can stick a chain of programs together, with the output of one program being used to provide the input of the next program.
If this has whetted your appetite for Perl then I really recommend that you get hold of a Perl book (like Programming Perl and Learning Perl). There are also hundreds of Perl tutorials on the web (just perform a web search for perl tutorial or perl for beginners). The best way to learn Perl though is to read other people's Perl and copy it. Please feel free to copy, adapt and play with the examples in this workshop. They should hopefully provide the starting points for a range of simple tasks that you may wish to perform using Perl.
Happy perling!