jessica dussault

Perls of knowledge

Date
2 July, 2025
Category
code
Tags
Perl

A few weeks ago, I was asked by a client to look into how long it would take to get a Perl script from 15 - 20 years ago running again. Fortunately, I got it up and running and talking to a Ruby on Rails app, database, and AWS S3! I am as pleasantly surprised as you are!

I had no Perl experience before embarking on this voyage. The learning curve isn't too bad, as languages go, but I've been keeping a mental list of things I had to learn along the way that I wouldn't like to have to figure out again. To help me with the list, I have recruited a previous Chancellor of the University of Nebraska-Lincoln, who once allowed the Communications office to put him in a video series called Perls of Knowledge. They were amazing. There may not be a better Harlem Shake video.

Golden words on a black background that say Perls of Knowledge
"I've been thinking..."

Dang, I miss Perlman. But that's a blogpost for another day.

Why Perl?

I know what your first thought was, because it was my first thought, as well:

"Why don't you just rewrite the script in a language the client is currently using? Wouldn't that be faster?"

I wish! The Perl script has extremely complex regular expressions and no tests, making it difficult to ensure that porting it to another language would actually recreate the functionality they used to have.

I thought about redoing everything except for the regex, but figured: well, by the time I get a Perl environment running at ALL, I might as well see if I can get the whole thing running. And here we are!

Perlman with a straight face with the hashtag yolo in giant letters superimposed over him
Let's do this!

The bad news to start with was that the previous environment where the script ran no longer existed. There was no documentation about the Perl version, the module versions, the environment variables... I was a little worried initially about my odds.

Getting set up to run Perl

The good news about trying to run very old Perl scripts is that Perl has been on version 5 for a long time. We're talking decades! But since I didn't know which version I might ultimately need, I started my journey by learning about how to switch between them.

perlbrew

My MacBook came with Perl installed, although I was surprised to find that it was using Perl 5.34, which was released in 2022 and is nearing the end of maintenance support. Happily, there is a tool called perlbrew which is quite easy to install and got me swapping between local perl versions within minutes.

perlbrew is straightforward, with commands like install and use to grab specific versions and switch between them. I didn't use it, but I appreciated that there's a feature to run your test suite against every installed version of Perl:

perlbrew exec perl my-tests.pl
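Stepping back, the day-to-day flow is just install and use. A quick sketch (the version number here is only for illustration):

```shell
# see which versions are available to install
perlbrew available

# build and install a specific version (this can take a while)
perlbrew install perl-5.36.0

# switch to it for the current shell session
perlbrew use perl-5.36.0
perl -v
```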

Although it was refreshingly simple to get different versions of Perl working so quickly, I didn't want to recreate the original problem where the script can only run in one place and one place only. It was time for some containerization and package management!

Docker

Cool news! There is an official Perl image on Docker Hub.

What is a Docker image? Think of it like a computer already set up with the basics you'll need to run Perl and not much else. They've got a number of different versions of Perl to grab, although unfortunately, and no surprise here, the official image doesn't support versions from 15 - 20 years ago. If I needed that sort of environment, I was going to be on my own with a blank slate of an image.

I decided to heck with it, I'd just try to get the script running with the latest version and see what kind of errors I got as a starting point.

My Dockerfile was pretty bare bones, but I was ready to move onto the next step:

FROM perl:5.40

WORKDIR /src

COPY . /src/

Package management

Although the code I was given didn't come with instructions for a package manager, the tops of the files gave me a starting point, since that's where libraries were imported:

use File::Find;
use DBI;
use Digest::MD5 qw(md5_base64);

Some of these were not third-party libraries, but rather modules that ship with Perl. I compared them against the standard Perl modules list to come up with the set of modules I would need to install to run the script.

Now, it's very possible to install Perl modules one by one using the cpanm command: cpanm module_name. However, I wanted to make something repeatable / self-documenting, so I opted to put them into a cpanfile, which is the Perl equivalent of a package.json file for JavaScript, a Gemfile for Ruby, or a requirements.txt file for Python.
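As a sketch, a cpanfile is itself just Perl, with one requires line per module. The names below are modules mentioned in this post, and the version pin is only to show the syntax:

```perl
# cpanfile
requires 'DBI';
requires 'Net::Amazon::S3';

# version constraints are optional
requires 'LWP::Protocol::https', '>= 6.0';
```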

It seems that the Perl community has a choice between two package managers that can read a cpanfile: cpanm and Carton. I didn't look into them enough to be able to talk about the pros and cons of each, unfortunately, although I selected cpanm because it came with the Perl Docker image! Once I had my cpanfile assembled, I was able to add the following line to my Dockerfile and install the modules:

RUN cpanm -n --installdeps .

For the curious: -n skips tests run by each module after it downloads, which I decided to do because it was taking ages (like, 15 - 20 minutes!). The --installdeps flag points it towards the location of my cpanfile.

It is my semi-uninformed impression that to install from a cpanfile in Carton you simply run the following to first get Carton, and then use it as a package manager:

cpanm Carton
carton install
Perlman browses on his phone, studiously ignoring a large inflatable mascot, Lil Red, dancing next to him
If you don't skip the tests, you better find some quality reading to do while you install the modules.

@INC and using modules

I wasn't quite up and running yet, though, because the script also had a number of local modules that needed to be loaded. In the scripts themselves, they were imported with this style of syntax:

use namespace::module;

But when I tried to run the Perl scripts, I got errors that they couldn't be located at any of the several paths Perl was trying. There were a couple of ways to deal with this. One was to hardcode a search path into each script (note that use lib wants a directory to search, not the path to a specific .pm file):

use lib "..";  # use namespace::module; now resolves to ../namespace/module.pm

This worked but I didn't like it. There was definitely a better way. Looking online, I found that you could add your modules' directory to the PERL5LIB environment variable, or add it to @INC, the array of paths Perl searches when loading modules.

What I ultimately ended up doing was adding to @INC at invocation time with the -I flag. So when I ran my Perl commands I used:

perl -I "/path/to/my/local/modules" script.pl
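The PERL5LIB route is equivalent; directories it lists get prepended to @INC without touching the command line:

```shell
# directories in PERL5LIB are prepended to Perl's @INC search path
export PERL5LIB="/path/to/my/local/modules"

# now the plain invocation finds the local modules
perl script.pl
```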

Perl syntax and gotchas

Overall, I didn't mind Perl. It wasn't my favorite language but I found it pleasantly easy to understand error messages, find fixes, and keep going. I liked how stable a lot of the libraries seemed. After at least 15 years, I only had to replace one or two external modules to get the code working!

But there were some things about Perl that I found quite confusing or perturbing, like when I first looked up "Perl arrow operator" ( -> ) and got hits about "infix dereference operators" instead of just that it's a method call. And also how difficult it is to get the length of... anything? Like, anything. Anyway, here are some things I tripped over.

Variables and expletives

Scope

The first thing to know is that when you define a variable in Perl, you also set its scope using either my to limit it to its current block / function / module / etc or our to make it available more widely.

# mymodule.pm

package mymodule {
  our $announcement = "Hello";   # package variable, reachable from outside
  my $secret = "Shhhh";          # lexical, private to this block
}

say $mymodule::announcement;
=> "Hello"

say $mymodule::secret;
=>

Scalars, arrays, and hashes

What's with the dollar signs? Well, Perl wants you to distinguish between the type of data you're working with when you declare a variable, and the dollar sign indicates you're working with a scalar. See variable names in the perldoc for more info.

# scalars are single units like strings, integers, etc.
my $scalar_str = "Some string";
my $scalar_int = 1;

# use an @ sign to denote multiple items
my @array = (1, 2, 3);

# and finally, use a % for hashes
my %hash = ( Key => 'Value' );

I kinda liked that naming convention until things got weird:

my @array = ("a", "b", "c");

# get the first element from @array, but note this uses $ instead of @
$array[0];

# wait a minute, what the
$array[$#array];

What that is doing is getting the last index of the array, which can then be fed back in to get the actual scalar in the array.
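For what it's worth, Perl also accepts negative indices, which count back from the end of the array and read (to me) far less like profanity:

```perl
use strict;
use warnings;

my @array = ("a", "b", "c");

# $#array is the last index (2), so this grabs "c"
my $last = $array[$#array];

# -1 also counts back from the end, so this is "c" as well
my $also_last = $array[-1];

print "$last $also_last\n";  # prints "c c"
```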

Perlman with his finger on a globe, wincing at the result he got
I'm not having as much fun anymore

And just like that, we've hit upon one of my main complaints about Perl. I like Ruby because for the most part when you read it out loud, you can tell what's going on. But I don't know how to read $array[$#array] out loud, unless I should just be swearing?

I must admit I also didn't love that the $, @, or % in front of a variable needed to be adjusted depending on what was being returned, because it reads a bit backwards to me.

my @array = (1, 2, 3);

# what I expect
my $array_item = @array[0];

# what Perl wants
my $array_item = $array[0];

I suspect because I had use strict and use warnings enabled, I got called out on that mistake a LOT.

Perlman looking at his cell phone and waving at us to go away
Enough about dollar signs, let's move on

Comparison operators, ugh

It took me longer than I care to admit to realize that in Perl there are different comparison operators for numbers than there are for strings. This explains my confusion when $ENV{'SOME_CRITERIA'} == true was failing to behave the way I expected.

Condition               Number   String
Less than               <        lt
Less than or equal      <=       le
Greater than            >        gt
Greater than or equal   >=       ge
Equals                  ==       eq
Not equal               !=       ne
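In other words, == quietly coerces both sides to numbers, so string comparisons with it can "succeed" for the wrong reason. A small sketch of the trap (the $answer variable is made up for illustration):

```perl
use strict;
use warnings;

my $answer = "yes";

# string comparison: true, as expected
print "eq matched\n" if $answer eq "yes";

# numeric comparison: both strings numify to 0, so 0 == 0 is true,
# meaning this "matches" even though the strings differ
# (use warnings flags both operands as non-numeric here)
print "== matched\n" if $answer == "no";
```

Running that prints both messages, which is exactly the sort of surprise that bit me with $ENV{'SOME_CRITERIA'}.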

Arguments

I think I've gotten spoiled by things like Ruby and TypeScript when it comes to arguments. I was a little shocked by how DIY Perl felt when it came to passing information to functions (or subroutines, as Perl prefers to call them).

sub my_subroutine {
  my $first_param = shift;
  my $second_param = shift;
}

In the above example, shift is actually manipulating the array of parameters passed in, stored as @_ (that is, @ to mark it as an array and the name being _). This means you can also do something like:

sub my_subroutine {
  my($first_param, $second_param) = @_;
}

OR, if we remember back to how we work with arrays, you could do:

sub my_subroutine {
  my $first_param = $_[0];
  my $second_param = $_[1];
}
Perlman sitting in a leather chair, the captioned text says 'this phrase is stupid' underneath him
Not how I would have put it, but big mood

I tended to go with the first approach because while shift was initially confusing to read in the unfamiliar code, once I was used to it, it felt easier to understand what was happening than looking at a bunch of at signs and underscores.

But wait, there's more! There are so many more ways to pass information to a subroutine and save it to a variable, it's a little dizzying: https://perldoc.perl.org/perlsub

You may have noticed, though, that all of the above examples are essentially the same thing; they are all dealing with positional arguments. I'm sure that people use hashes to function as named parameters, but from what I could tell at a glance, Perl just seems to lean hard into positional parameters. I saw a lot of examples with things like this when you needed to skip an argument:

sub my_subroutine {
  my($first, $second, $third, $fourth) = @_;
  $second ||= 2;   # default when $second is undef (or otherwise false)
}

my_subroutine(1, undef, 3, 4);
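For completeness, the hash-as-named-parameters style I alluded to usually looks like flattening @_ into a hash. This is a sketch of the general pattern, not code from the client's script (make_widget and its defaults are made up):

```perl
use strict;
use warnings;

# a hypothetical subroutine taking named parameters
sub make_widget {
  my %args = @_;   # flatten the key/value pairs out of @_

  # the defined-or operator (//) supplies defaults for missing keys
  my $size  = $args{size}  // "medium";
  my $color = $args{color} // "red";

  return "$size $color widget";
}

# arguments can arrive in any order, and any of them can be omitted
print make_widget(color => "blue"), "\n";  # prints "medium blue widget"
```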

Length

Arrays

Of all my complaints about Perl, I think getting the length of different items was the most frustrating for me.

Let's try getting the length of an array.

# we already know how to get the last index, so just add one
1 + $#array;

# or we can use "scalar" to tell it to represent the array as a scalar value
scalar @array;

# or add zero to force it to a scalar value
0 + @array;

All of those feel pretty hacky to me, but I guess one nice thing about this way of doing things is that you can use comparison operators directly against an array to test its length:

my @array = (1, 2, 3);

if (@array < 5) { ... }

Database results

Where I really ran into trouble with length was when I was getting results from a database query. I was trying to debug and wanted to know how many results I had gotten. I thought this would be pretty simple, but it seems as though you have to iterate through each result to get the count, rather than just doing something simple like scalar @results. I looked at several Stack Overflow pages where people just said "why don't you do a COUNT query if you want the count?" Okay, thanks.
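For what it's worth, the least-painful workaround I'm aware of is DBI's fetchall_arrayref, which pulls every remaining row into a single array reference that scalar works on; fine for debugging, less fine for huge result sets. A fragment of a sketch, assuming an already-executed statement handle $sth:

```perl
# pull all remaining rows into memory as an array reference...
my $rows = $sth->fetchall_arrayref;

# ...then an ordinary length check works
my $count = scalar @$rows;
say STDERR "Got $count results";
```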

Perlman with subtitled text that says 'you say it before you are gonna do something dumb.'
I don't understand why Stack Overflow commenters acted like this was such a ridiculous thing to want to know

STDIN and writing to files

After some grousing about Perl I'm ready to sing its praises. It makes reading from STDIN and writing to STDOUT, STDERR, and arbitrary files very easy!

STDIN

Ready for this?

my $stdin_value = <>;

It was not very readable to me during my first pass through the script, but once I knew what to look for, I appreciated how simple it was to get the value.
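One wrinkle: the value arrives with its trailing newline, so you'll usually see it paired with chomp. (Strictly speaking, <> reads from any files named on the command line and falls back to STDIN; <STDIN> reads standard input explicitly.)

```perl
use strict;
use warnings;

# read one line from standard input (if any) and strip the trailing newline
if (defined(my $stdin_value = <STDIN>)) {
  chomp $stdin_value;
  print "You typed: $stdin_value\n";
}
```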

STDOUT and STDERR

If you want to direct something to STDOUT or STDERR you simply write:

say STDOUT "some message";
say STDERR "some error";

This made it easy for me to pass errors to the Docker logs, just by writing say STDERR. I love it.

Perlman with a forced smile
My face when

Files

Although, as with many languages, there are multiple ways to work with files, the one I found easiest to use looked like this:

my $file;

open $file, ">>", "/path/to/file.txt"
  or die "Something went wrong opening file.txt: $!";

say $file "some message";

close $file;

Trust me when I say that you'll want that or die associated with opening the file. My Docker setup initially was having trouble writing files to the file system and I couldn't figure out which part was going wrong until I sprinkled in a bunch of or dies.
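Reading a file works the same way, with "<" as the mode. Here's a little self-contained round trip (demo.txt is just an illustrative name):

```perl
use strict;
use warnings;
use feature 'say';

# write a throwaway file first so there's something to read
open my $out, ">", "demo.txt"
  or die "Something went wrong opening demo.txt: $!";
say $out "some message";
close $out;

# reading uses the same three-argument open, with "<" as the mode
open my $in, "<", "demo.txt"
  or die "Something went wrong opening demo.txt: $!";
while (my $line = <$in>) {
  chomp $line;
  say "read: $line";  # prints "read: some message"
}
close $in;

unlink "demo.txt";  # tidy up
```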

Troubleshooting

I enjoyed the "or die" functionality quite a bit. It fits into my preferred programming worldview where I want to be able to read code out loud to understand what it's doing.

mkdir $tmp
  or $!{EEXIST}  # don't die if the directory already exists
  or die "Could not make directory: $!";

Other than die-ing here and there in my code, I never really worked out a better system than the time honored "lots of print statements" method of debugging. To that end, I enjoyed using Data::Dumper to inspect objects.

use Data::Dumper;

print Dumper($some_object);

Did you know that try/catch blocks are considered an experimental feature in Perl? This was surprising to me, although I suspect the or die idiom covers many of the cases where you'd otherwise reach for try/catch. But if you want to use one, you have to opt in:

use feature 'try';
no warnings 'experimental::try';  # silence the "experimental" warning

try {
  ...
}
catch ($e) {
  ...
}
Perlman sitting next to an exceptional looking pie
$perlman->knowledge() or pie

Connecting to AWS S3

I was worried that connecting to AWS S3 from an older Perl script would be a pain in the tookus, but actually it went very well!

use Net::Amazon::S3;
use Net::Amazon::S3::Authorization::Basic;
use LWP::Protocol::https;

my %config = (
  authorization_context => Net::Amazon::S3::Authorization::Basic->new(
    aws_access_key_id => $ENV{'AWS_ACCESS_KEY_ID'},
    aws_secret_access_key => $ENV{'AWS_SECRET_ACCESS_KEY'}
  ),
  retry => 1
);

my $s3 = Net::Amazon::S3->new(%config);

Then to use it, you just say things like:

my $bucket = $s3->bucket("bucket_name");
$bucket->get_key_filename(
  $key,
  "GET",
  $download_path
) or die "Problem downloading file from S3: " . $s3->err . " -- " . $s3->errstr;

One small bump I came across was when I wanted to use localstack to do development without hitting an actual S3 bucket. Ultimately, I just had to add the following to my %config hash:

# you'll need to include the generic module at the top of the file
use Net::Amazon::S3::Vendor::Generic;

  vendor => Net::Amazon::S3::Vendor::Generic->new (
    host => "s3:4566",
    use_https => 0
  )

The Perls of knowledge I made along the way

I was never afraid of Perl the way that I am of, say, Java, and this experience was largely positive. Yes, there are some funky syntax things. True, the scripts I was working with didn't use much in the way of classes, so there's more to learn there. But at the end of the day, Perl seems extremely stable and that was its greatest strength in my book. The fact that I could get a script that was written in the early 2000s running so easily is a testament to the language and its community.

The environment was easy to set up, the modules were easy to download and work with, and although it felt a little clunky, I was able to do what I needed to do. More than anything, the error messages are very helpful, and that's 90% of the battle in an unfamiliar language!

Overall, I don't have a desire to become a Perl developer, but I kind of enjoyed my time in Perl land.

Perlman waving his hands in the air excitedly, there is a model wooden ship next to him
Wow, it worked!

My Dockerfile and docker-compose.yml

Just in case anybody is in a similar situation and doesn't want to think too hard, here's my setup:

Dockerfile

I wrote a little Perl webserver that calls the modules when you GET or POST a URL, but if you don't have a web server and just need the container to stay up and running, you could swap out the ENTRYPOINT for ENTRYPOINT ["tail", "-f", "/dev/null"].

FROM perl:5.40

WORKDIR /src

COPY . /src/

# -n no-test (tests slow installation down substantially)
RUN cpanm -n --installdeps .

EXPOSE 8080
ENTRYPOINT ["perl", "bin/server.pm"]

docker-compose.yml

The following uses setup scripts for the database and AWS. All they do is populate a little data into their respective tools. If there's popular demand I can publish what those look like, too.

services:
  tools:
    # no image needed because we're using the local Dockerfile image
    build: ./

    working_dir: /src
    # mount the files in this repo as a volume so we can edit them at will
    volumes:
      - ./:/src
    ports:
      - 8080:8080
    environment:
      - HOME=/src
      - PROD_DATABASE_URL=postgresql://postgres:password@db/dev
      - AWS_ACCESS_KEY_ID=test
      - AWS_SECRET_ACCESS_KEY=test
      - AWS_BUCKET=test-bucket
      - AWS_CUSTOM_HOST=s3:4566  # only use this variable in local dev
      - LOG_DEBUG_MODE=false

  db:
    image: postgres
    restart: always
    environment:
      POSTGRES_DB: dev
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
    volumes:
      - ./docker-db-setup:/docker-entrypoint-initdb.d
    ports:
      - 5432:5432

  s3:
    image: localstack/localstack:4.5.0
    ports:
      - 4566:4566
    volumes:
      - ./docker-aws-setup/localstack.sh:/etc/localstack/init/ready.d/script.sh
      - ./test/fixtures:/src
    environment:
      - AWS_ACCESS_KEY_ID=test
      - AWS_SECRET_ACCESS_KEY=test