Safe signals in Perl

The new gotcha

Once upon a time, handling signals in Perl code had a pretty big gotcha — one that you couldn’t work around. Perl 5.8 changed signal handling in a way that eliminated that gotcha, but replaced it with a different one, harder to trigger, but no less surprising.

The original gotcha

Signals are delivered asynchronously — by design, you can’t predict when they’re going to arrive. They can’t arrive during the execution of a low-level machine instruction, but that’s pretty much the only guarantee you get.

Now, that might be fine and dandy if you’re writing in assembler, but that’s not true of most of us these days. In other languages, you’re going to run into a problem. Suppose your main code is busily modifying some data structure, and that your signal handler wants to modify the same data structure. Then when the signal is delivered, and your handler is invoked, the data structure could well be in a temporarily-inconsistent state. So if the handler does anything that modifies the data structure, it’s likely to end up corrupting it.

This can be particularly awkward for Perl. The problem is that when your code executes, that involves manipulating data structures inside the runtime (or “interpreter” if you prefer). If the signal handler does the same sort of manipulation — and let me tell you, it almost certainly does — then that’s a good way of corrupting the runtime’s internal data structures in ways that can cause all manner of exciting crashes.

Safe signals

The change in Perl 5.8 that deals with this problem is called “safe signals”. Rather than having your signal-handling Perl code be directly invoked during the asynchronous receipt of the signal itself, Perl just has its low-level signal handler note that a particular signal has been delivered. Then at suitable safe moments, the runtime checks whether any signals have been delivered but not yet handled, and if so, invokes the appropriate Perl handler.

This neatly deals with the issue. The runtime never does anything from a signal handler that could corrupt its own data structures. But Perl-side handlers still get invoked asynchronously with respect to the main program.
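That last point is worth a small illustration. The sketch below assumes a POSIX-like system where a process can send itself SIGUSR1 with kill. The names @items, $count and add_item are made up for the example. The handler runs at a safe point between two statements, so the runtime is never at risk, but the program's own data can still be caught halfway through an update.

```perl
use strict;
use warnings;

my @items;         # hypothetical application data ...
my $count = 0;     # ... with the invariant: $count == scalar @items
my @seen;          # what the handler observed each time it ran

$SIG{USR1} = sub {
    push @seen, { items => scalar @items, count => $count };
};

sub add_item {
    my ($item) = @_;
    push @items, $item;
    # Simulate a signal arriving halfway through the update.
    kill USR1 => $$ or die "kill failed: $!";
    $count++;
}

add_item('first');

printf "handler saw %d item(s) but a count of %d\n",
    $seen[0]{items}, $seen[0]{count};
```

So safe signals protect the interpreter, not your own invariants. A handler that only sets a flag for the main code to act on later avoids the issue.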

The new gotcha

Unfortunately, this introduces a problem of its own. It’s all down to what counts as a suitable safe moment to invoke your Perl signal handler. To a first approximation, that means between execution of the individual operations that your code is compiled into. (There’s some additional complexity to allow handlers to be invoked during interrupted I/O system calls, but that’s a relatively minor detail.)

In most circumstances, this works just fine. The problem is when Perl-internal ops take a long time to execute.

Sam Tregar recently posted a message to the Perl-XML mailing-list about a problem he was having with XML::LibXML. When he fed certain badly-broken documents to XML::LibXML’s HTML parser, that triggered an infinite-loop bug in the underlying libxml2 library. He was attempting to work around that bug by using a timeout:

eval {
    local $SIG{ALRM} = sub { die "TIMEOUT\n" };
    alarm 10;
    $libxml->parse_html_string($html);
    alarm 0;
};

First, Sam sets up a signal handler for the duration of this eval block, to raise a distinguishable exception on receipt of a SIGALRM. The first alarm call asks for a SIGALRM signal to be delivered after 10 seconds. Then the HTML is parsed; if that finishes in a reasonable amount of time, the second alarm call disables the timeout. Once it’s all finished, surrounding code checks whether the timeout occurred, and acts appropriately if so.

This looks like perfectly reasonable code. The only problem is that it didn’t work — Sam’s timeout exception was never getting raised.

Knowing about the way Perl safe signals work offers a simple explanation of what’s happening here. XML::LibXML’s parse_html_string method calls an XS function which uses libxml2 to do the parsing. When the Perl runtime invokes an XS function, that’s a single op. So if the XS (or one of the C functions it uses) goes into an infinite loop, that op never finishes executing. But the Perl runtime waits for an op to finish before invoking your signal handler, so the handler never runs.

Ouch.
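To make the timing concrete, here is a toy model of dispatch at op boundaries. It is only a sketch and not how the runtime is really written: each "op" is a hypothetical [name, ticks] pair on a made-up clock, and a single signal arrives at a chosen tick.

```perl
use strict;
use warnings;

# Toy model, not perl's real implementation: each "op" is a [name, ticks]
# pair, and one signal arrives at $signal_tick on a made-up clock.
sub simulate {
    my ($ops, $signal_tick) = @_;
    my $clock   = 0;
    my $pending = 0;   # set when the signal arrives, cleared on dispatch
    my @log;
    for my $op (@$ops) {
        my ($name, $ticks) = @$op;
        my $start = $clock;
        $clock += $ticks;   # the op runs to completion, uninterrupted
        # A signal landing inside this op only gets noted ...
        $pending = 1 if $signal_tick >= $start && $signal_tick < $clock;
        push @log, "tick $clock: $name finished";
        # ... and the Perl-level handler runs at the op boundary.
        if ($pending) {
            push @log, "tick $clock: handler runs for signal from tick $signal_tick";
            $pending = 0;
        }
    }
    return @log;
}

my @ops = (['assign', 1], ['xs_parse', 1000], ['print', 1]);
print "$_\n" for simulate(\@ops, 5);
```

In this model the signal from tick 5 is not handled until the long xs_parse op finishes at tick 1001. If that op never finished, the loop would never reach the boundary check at all, which is the situation Sam ran into.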

Workarounds

There are workarounds for this. I suggested to Sam on the list that he just switch back to the pre-5.8 signal-handling behaviour. That’s done by setting the PERL_SIGNALS environment variable to the string unsafe. Unfortunately, you can’t do it from within the program you want to be affected — the variable has to have been set at the point the Perl runtime starts up. A simple option is to put an env(1) wrapper round your code:

$ env PERL_SIGNALS=unsafe perl html_parser.pl

Failing that, in some situations, it might be possible to get your program to wrap itself:

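BEGIN {
    # Re-run this script under the same perl with PERL_SIGNALS=unsafe,
    # unless the variable is already set that way.
    if (!$ENV{PERL_SIGNALS} || $ENV{PERL_SIGNALS} ne 'unsafe') {
        $ENV{PERL_SIGNALS} = 'unsafe';
        exec $^X, $0, @ARGV
            or die "Can't re-exec $^X: $!\n";
    }
}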

This isn’t without problems, though — remember that the modern “safe” signal handling is specifically intended to prevent a large class of unpredictable, hard-to-debug memory-corruption bugs that happen only on receipt of signals at certain times.

There isn’t really a perfect solution to Sam’s problem. The only other option is to disable the safe handling just for this specific signal. Instead of using the normal %SIG variable to install a suitable handler, you can ask the POSIX module to install an unsafe SIGALRM handler; Perl’s perlipc documentation has the details. That obviously doesn’t eliminate the problems caused by immediate signal delivery, but it does minimise the situations in which they can bite.
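As a rough sketch of that approach, modelled on the POSIX::sigaction usage described in perlipc: the helper name with_unsafe_timeout and its return convention are my own invention, and a five-second select stands in for the slow XS call. Since select can be interrupted under either signal scheme, this only shows the mechanics. The difference matters when the op is C code that never returns to the interpreter.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Hypothetical helper: run $code, giving up after $seconds.
# Returns ('ok', @results) or ('timeout').
sub with_unsafe_timeout {
    my ($seconds, $code) = @_;

    # A handler installed through POSIX::sigaction is not deferred unless
    # the SigAction's "safe" flag is set, so it runs as soon as the signal
    # arrives, even in the middle of an op.
    my $action = POSIX::SigAction->new(sub { die "TIMEOUT\n" });
    my $old    = POSIX::SigAction->new;
    POSIX::sigaction(SIGALRM, $action, $old)
        or die "Error setting SIGALRM handler: $!\n";

    my @result;
    my $ok = eval {
        alarm $seconds;
        @result = $code->();
        alarm 0;
        1;
    };
    my $error = $@;
    alarm 0;                              # in case $code died some other way
    POSIX::sigaction(SIGALRM, $old);      # put the previous handler back

    return ('ok', @result) if $ok;
    return ('timeout')     if $error eq "TIMEOUT\n";
    die $error;                           # not ours: re-raise
}

# Stand-in for the slow XS call: a five-second select() as a single op.
my ($status) = with_unsafe_timeout(1, sub { select(undef, undef, undef, 5) });
print "status: $status\n";
```

Keeping the unsafe handler installed only for the duration of the call, and doing nothing in it beyond die, limits the window in which immediate delivery can cause trouble.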

Aaron Crane is El Reg’s Technical Overlord. This piece was originally published on his personal website.
