A "switch" for Perl that compiles away: introducing Switch::Declare
DEV Community Grade 10 5h ago

Perl has had a complicated relationship with `switch`. `given`/`when` arrived leaning on smartmatch, and smartmatch turned out to be a pit of surprising behaviour, so the whole feature was experimental, then discouraged, then warned about for years. Switch::Back is `given`/`when` for a post-`given`/`when` Perl. Then there is the old Switch.pm, which reached for a source filter, meaning it rewrites your program text before Perl ever sees it. Finally, there are a handful of other implementations on CPAN, including my own Switch::Again.

So most of us just write `if`/`elsif` chains and move on. They're honest and fast, but they're also noisy: the scrutinee gets repeated on every branch, and a six-way string dispatch turns into six `eq` comparisons stacked on top of each other.

Switch::Declare is an attempt to get the nice syntax without any of the historical baggage. The pitch in one line: a real `switch`/`case` keyword that is parsed entirely at compile time, scoped lexically like a proper pragma, and lowered to the same op tree as a hand-written `if`/`elsif` chain, with no source filter, no smartmatch, and no runtime dependencies.

```perl
use Switch::Declare;

switch ($value) {
    case 200             { handle_ok() }     # numeric   -> ==
    case "GET"           { handle_get() }    # string    -> eq
    case /^\d+$/         { all_digits() }    # regex     -> =~
    case [400 .. 499]    { client_error() }  # range     -> >= && <=
    case ["a", "b", "c"] { in_set() }        # list      -> membership
    case \&is_weekend    { weekend() }       # predicate -> $code->($topic)
    default              { fallback() }
}
```

## It's an expression, too

The construct yields the value of the arm that ran, so you can use it on the right-hand side of an assignment instead of mutating a variable in each branch:

```perl
my $label = switch ($status) {
    case 200 { "ok" }
    case 404 { "missing" }
    default  { "other" }
};
```

The scrutinee (`$status` here) is evaluated exactly once. The first matching `case` wins, there is no implicit fall-through, and a trailing `default` catches everything else. Used as an expression with no match and no `default`, it yields `undef`.
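To make those semantics concrete, here is a rough hand-written equivalent of the `$label` example in plain Perl, with no module loaded. It is only a sketch of the behaviour described above (single evaluation, first match wins, `undef` when nothing matches and there is no `default`), not a dump of the op tree the module emits. The helper names `label_for` and `label_no_default` are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: mirrors the three-arm switch on $status.
sub label_for {
    my ($status) = @_;                 # the scrutinee is read once into a lexical
    return $status == 200 ? "ok"       # first matching test wins
         : $status == 404 ? "missing"
         :                  "other";   # plays the role of `default`
}

# Hypothetical helper: the same chain without a default arm.
# Falling off the end of the chain yields undef.
sub label_no_default {
    my ($status) = @_;
    return $status == 200 ? "ok"
         : $status == 404 ? "missing"
         :                  undef;
}

print label_for(404), "\n";
my $none = label_no_default(500);
print defined $none ? "defined" : "undef", "\n";
```

Written this way, the repetition the article complains about is easy to see: `$status` appears in every test, which is exactly what the `switch` form removes.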
## The patterns are a small, predictable grammar

Rather than trying to be clever, `case` recognises a deliberately tiny set of literal pattern shapes, and each one lowers to the cheapest operator that does the job:

| Pattern | Example | Becomes |
|---|---|---|
| number literal | `case 200` | numeric `==` |
| string literal | `case "GET"` | string `eq` |
| regex | `case /^\d+$/` | `$topic =~ /.../` |
| range `[LO .. HI]` | `case [400..499]` | inclusive bounds |
| list `[a, b, c]` | `case [1, 2, 3]` | membership (OR) |
| predicate | `case \&is_even` | `$code->($topic)` |

The predicate form takes either a code reference (`\&name`; package-qualified names work too) or an inline `sub { ... }` that closes over the surrounding lexicals. That is the escape hatch for anything the literal grammar deliberately doesn't cover:

```perl
my $limit = 100;

switch ($n) {
    case sub { $_[0] > $limit } { "over" }
    default                     { "ok" }
}
```

Because the grammar consists of literals rather than arbitrary expressions, classification is never ambiguous, and the compiler always knows exactly which operator to emit.

## Why "compile-time" matters for speed

This is the part I'm most pleased with. `switch` is installed through Perl's core keyword-plugin and lexer APIs. When the parser reaches the keyword, the module reads the whole construct, builds an op tree for it then and there, and hands that back in place of the keyword. After compilation, nothing of the parser remains: there is no dispatcher subroutine sitting between you and your code at runtime, no per-call wrapper, no closure to invoke.

For the common case (a plain variable or constant scrutinee with single-expression arms), `switch` compiles to exactly what a hand-written `if`/`elsif` (ternary) chain would: no temporary, no extra scope, no extra ops. In the bundled benchmark (`xt/bench.pl`) the two run within measurement noise of each other (0–2%).

## Dispatch mode: O(n) chain → O(1) lookup

There's a nice bonus.
When a `switch` is effectively a lookup table (every `case` maps a string literal to a constant value, and there are at least a handful of arms), the module quietly lowers it to a single hash lookup against a table built once at compile time, instead of walking a chain of `eq` tests:

```perl
# compiles to one hash lookup, not six string comparisons
my $name = switch ($code) {
    case "GET"    { "read" }
    case "PUT"    { "update" }
    case "POST"   { "create" }
    case "DELETE" { "remove" }
    case "PATCH"  { "modify" }
    case "HEAD"   { "peek" }
    default       { "?" }
};
```

In the benchmark, a 20-arm string switch in dispatch mode runs about 2.5× faster than the equivalent `if`/`elsif` chain. You never opt in or out: it's chosen automatically, and it never changes behaviour.

## A real lexical pragma

The `switch` keyword only exists inside the lexical scope of a `use Switch::Declare`, and you can turn it off again with `no Switch::Declare`:

```perl
{
    use Switch::Declare;
    switch ($x) { ... }    # 'switch' is the keyword here
}
switch();                  # ...and an ordinary sub call out here
```

Outside that scope, `switch` is just an identifier. So the keyword can't collide with a `switch` function in unrelated code, and importing the module has no spooky action at a distance.

## Benchmarking

Numbers help. As I mentioned, years ago I wrote Switch::Again, which solves the same problem from the opposite end: `switch LIST` builds a closure at run time that matches each call's argument against its keys (via Struct::Match). Here is the same 20-way string dispatch written both ways:

```perl
# Switch::Declare: parsed once at compile time, lowered to a hash lookup
my $n = switch ($key) {
    case "k0"  { 0 }
    case "k1"  { 1 }
    ...
    case "k19" { 19 }
    default    { -1 }
};

# Switch::Again: dispatcher closure built once, then called
my $sw = switch
    k0      => sub { 0 },
    k1      => sub { 1 },
    ...
    k19     => sub { 19 },
    default => sub { -1 };
my $n = $sw->($key);
```

The benchmark builds the Switch::Again dispatcher once, outside the timed loop, so we're comparing steady-state per-call cost, which is the fairest possible footing for a runtime matcher. Results on perl 5.42:

```
== 6 string arms (Switch::Declare chain mode) ==
                      Rate Switch::Again if/elsif Switch::Declare
Switch::Again     411555/s            --     -97%            -98%
if/elsif        13282260/s         3127%       --            -33%
Switch::Declare 19891605/s         4733%      50%              --

== 20 string arms (Switch::Declare dispatch mode -> hash lookup) ==
                      Rate Switch::Again hand %dispatch Switch::Declare
Switch::Again     183089/s            --           -98%            -99%
hand %dispatch  11888675/s         6393%             --            -40%
Switch::Declare 19837923/s        10735%            67%              --

== 3 regex arms ==
                     Rate Switch::Again Switch::Declare
Switch::Again   1233188/s            --            -82%
Switch::Declare 6736915/s          446%              --
```

So at six string arms Switch::Declare is about 48× faster than Switch::Again. At twenty arms, where dispatch mode kicks in, it's about 108× faster, and still 67% faster than a hand-rolled `%dispatch` table, because it returns the matched constant straight out of the hash instead of calling a coderef per hit. Even on regexes, where there's no hash trick to play, it's about 5.5× ahead.

## Getting it

```
cpanm Switch::Declare
```

Give it a try, and let me know how it goes in the comment section below.
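Before installing anything, the reasoning behind the dispatch-mode numbers can be poked at with core Perl alone. The sketch below is not the article's `xt/bench.pl` and does not load Switch::Declare. It only compares three hand-written shapes of the same six-way lookup: an `eq` chain, a coderef dispatch table, and a constant-value hash. The sub names (`via_chain`, `via_coderef`, `via_constant`) and the choice of `"HEAD"` as the probe key are assumptions made for illustration. Because each variant is wrapped in an ordinary sub call, the absolute rates will not match the figures above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative data: the six verb => label pairs from the dispatch-mode example.
my %const = (GET => "read", PUT => "update", POST => "create",
             DELETE => "remove", PATCH => "modify", HEAD => "peek");

# Same table, but every hit has to call a coderef to produce its value.
my %code = map { my $v = $const{$_}; ($_ => sub { $v }) } keys %const;

# Baseline: a chain of eq tests, walked top to bottom.
sub via_chain {
    my ($k) = @_;
    return $k eq "GET"    ? "read"
         : $k eq "PUT"    ? "update"
         : $k eq "POST"   ? "create"
         : $k eq "DELETE" ? "remove"
         : $k eq "PATCH"  ? "modify"
         : $k eq "HEAD"   ? "peek"
         :                  "?";
}

# Coderef dispatch table: one hash lookup plus one sub call per hit.
sub via_coderef {
    my ($k) = @_;
    my $cb = $code{$k};
    return $cb ? $cb->() : "?";
}

# Constant table: one hash lookup, value returned directly.
sub via_constant {
    my ($k) = @_;
    return exists $const{$k} ? $const{$k} : "?";
}

my $key = "HEAD";    # last arm, so the chain does the most work

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    chain    => sub { via_chain($key) },
    coderef  => sub { via_coderef($key) },
    constant => sub { via_constant($key) },
});
```

All three variants return the same value for every key; only the amount of work per call differs, which is the distinction the article draws between a hand-rolled `%dispatch` table and a table of constants.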
