Perl one liners


$ perl -le 'print $^V'
v5.22.1

$ man perl
PERL(1)                Perl Programmers Reference Guide                PERL(1)

NAME
       perl - The Perl 5 language interpreter

SYNOPSIS
       perl [ -sTtuUWX ]      [ -hv ] [ -V[:configvar] ]
            [ -cw ] [ -d[t][:debugger] ] [ -D[number/list] ]
            [ -pna ] [ -Fpattern ] [ -l[octal] ] [ -0[octal/hexadecimal] ]
            [ -Idir ] [ -m[-]module ] [ -M[-]'module...' ] [ -f ]
            [ -C [number/list] ]      [ -S ]      [ -x[dir] ]
            [ -i[extension] ]
            [ [-e|-E] 'command' ] [ -- ] [ programfile ] [ argument ]...

       For more information on these options, you can run "perldoc perlrun".
...

Prerequisites and notes

  • familiarity with programming concepts like variables, printing, control structures, arrays, etc
  • Perl borrows syntax/features from C, shell scripting, awk, sed etc. Prior experience working with them would help a lot
  • familiarity with regular expression basics
  • this tutorial is primarily focussed on short programs that are easily usable from command line, similar to using grep, sed, awk etc
    • do NOT use style/syntax presented here when writing full fledged Perl programs which should use strict, warnings etc
    • see perldoc - perlintro and learnxinyminutes - perl for quick intro to using Perl for full fledged programs
  • links to Perl documentation will be added as necessary
  • unless otherwise specified, consider input as ASCII encoded text only

Executing Perl code

  • One way is to put code in a file and use perl command with filename as argument
  • Another is to use shebang at beginning of script, make the file executable and directly run it
$ cat code.pl
print "Hello Perl\n"
$ perl code.pl
Hello Perl

$ # similar to bash
$ cat code.sh
echo 'Hello Bash'
$ bash code.sh
Hello Bash
  • For short programs, one can use -e commandline option to provide code from command line itself
  • This entire chapter is about using perl this way from commandline
$ perl -e 'print "Hello Perl\n"'
Hello Perl

$ # say automatically adds newline character
$ perl -E 'say "Hello Perl"'
Hello Perl

$ # similar to
$ bash -c 'echo "Hello Bash"'
Hello Bash

$ # multiple commands can be issued separated by ;
$ # -l will be covered later, here used to append newline to print
$ perl -le '$x=25; $y=12; print $x**$y'
59604644775390625
  • Perl is (in)famous for being able to do things in more than one way
  • examples in this chapter will mostly try to use the syntax that avoids (){}
$ # shows different syntax usage of if/say/print
$ perl -e 'if(2<3){print("2 is less than 3\n")}'
2 is less than 3
$ perl -E 'say "2 is less than 3" if 2<3'
2 is less than 3

$ # string comparison uses eq for ==, lt for < and so on
$ perl -e 'if("a" lt "b"){$x=5; $y=10} print "x=$x; y=$y\n"'
x=5; y=10
$ # x/y assignment will happen only if condition evaluates to true
$ perl -E 'say "x=$x; y=$y" if "a" lt "b" and $x=5,$y=10'
x=5; y=10

$ # variables will be interpolated within double quotes
$ # so, use q operator if single quoting is needed
$ # as single quote is already being used to group perl code for -e option
$ perl -le 'print "ab $x 123"'
ab  123
$ perl -le 'print q/ab $x 123/'
ab $x 123


Simple search and replace

  • substitution command syntax is very similar to sed for search and replace
    • syntax is variable =~ s/REGEXP/REPLACEMENT/FLAGS and by default acts on $_ if variable is not specified
    • see perldoc - SPECIAL VARIABLES for explanation on $_ and other such special variables
    • more detailed examples will be covered in later sections
  • Just like other text processing commands, perl will automatically loop over input line by line when -n or -p option is used
    • like sed, the -n option won't print the record
    • -p will print the record, including any changes made
    • the newline character is the default record separator
    • $_ will contain the input record content, including the record separator (unlike sed and awk)
    • any directory name appearing in file arguments passed will be automatically ignored
  • and similar to other commands, perl will work with both stdin and file input
    • See other chapters for examples of seq, paste, etc
$ # sample stdin data
$ seq 10 | paste -sd,
1,2,3,4,5,6,7,8,9,10

$ # change only first ',' to ' : '
$ # same as: sed 's/,/ : /'
$ seq 10 | paste -sd, | perl -pe 's/,/ : /'
1 : 2,3,4,5,6,7,8,9,10

$ # change all ',' to ' : ' by using 'g' modifier
$ # same as: sed 's/,/ : /g'
$ seq 10 | paste -sd, | perl -pe 's/,/ : /g'
1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10

$ cat greeting.txt
Hi there
Have a nice day
$ # same as: sed 's/nice day/safe journey/' greeting.txt
$ perl -pe 's/nice day/safe journey/' greeting.txt
Hi there
Have a safe journey

inplace editing

$ # same as: sed -i.bkp 's/Hi/Hello/' greeting.txt
$ perl -i.bkp -pe 's/Hi/Hello/' greeting.txt
$ # original file gets preserved in 'greeting.txt.bkp'
$ cat greeting.txt
Hello there
Have a nice day

$ # using -i'bkp.*' will save backup file as 'bkp.greeting.txt'

$ # use empty argument to -i with caution, changes made cannot be undone
$ perl -i -pe 's/nice day/safe journey/' greeting.txt
$ cat greeting.txt
Hello there
Have a safe journey
  • Multiple input files are treated individually and changes are written back to respective files
$ cat f1
I ate 3 apples
$ cat f2
I bought two bananas and 3 mangoes

$ perl -i.bkp -pe 's/3/three/' f1 f2
$ cat f1
I ate three apples
$ cat f2
I bought two bananas and three mangoes

Line filtering


Regular expressions based filtering

  • syntax is variable =~ m/REGEXP/FLAGS to check for a match
    • variable !~ m/REGEXP/FLAGS for negated match
    • by default acts on $_ if variable is not specified
  • as we need to print only selective lines, use -n option
    • by default, contents of $_ will be printed if no argument is passed to print
$ cat poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.

$ # same as: grep '^[RS]' or sed -n '/^[RS]/p' or awk '/^[RS]/'
$ # /^[RS]/ is shortcut for $_ =~ m/^[RS]/
$ perl -ne 'print if /^[RS]/' poem.txt
Roses are red,
Sugar is sweet,

$ # same as: grep -i 'and' poem.txt
$ perl -ne 'print if /and/i' poem.txt
And so are you.

$ # same as: grep -v 'are' poem.txt
$ # !/are/ is shortcut for $_ !~ m/are/
$ perl -ne 'print if !/are/' poem.txt
Sugar is sweet,

$ # same as: awk '/are/ && !/so/' poem.txt
$ perl -ne 'print if /are/ && !/so/' poem.txt
Roses are red,
Violets are blue,

With the explicit m prefix, you can use any pair of non-alphanumeric, non-whitespace characters as delimiters

$ cat paths.txt
/foo/a/report.log
/foo/y/power.log
/foo/abc/errors.log

$ perl -ne 'print if /\/foo\/a\//' paths.txt
/foo/a/report.log

$ perl -ne 'print if m#/foo/a/#' paths.txt
/foo/a/report.log

$ perl -ne 'print if !m#/foo/a/#' paths.txt
/foo/y/power.log
/foo/abc/errors.log

Fixed string matching

$ # same as: grep -F 'a[5]' or awk 'index($0, "a[5]")'
$ # index returns matching position(starts at 0) and -1 if not found
$ echo 'int a[5]' | perl -ne 'print if index($_, "a[5]") != -1'
int a[5]

$ # however, string within double quotes gets interpolated, for ex
$ x='123'; echo "$x"
123
$ perl -e '$x=123; print "$x\n"'
123

$ # so, for commandline usage, better to pass string as environment variable
$ # they are accessible via the %ENV hash variable
$ perl -le 'print $ENV{PWD}'
/home/learnbyexample
$ perl -le 'print $ENV{SHELL}'
/bin/bash

$ echo 'a#$%d' | perl -ne 'print if index($_, "#$%") != -1'
$ echo 'a#$%d' | s='#$%' perl -ne 'print if index($_, $ENV{s}) != -1'
a#$%d
  • return value is useful to match at specific position
  • for ex: at start/end of line
$ cat eqns.txt
a=b,a-b=c,c*d
a+b,pi=3.14,5e12
i*(t+9-g)/8,4-a+b

$ # start of line
$ # same as: s='a+b' awk 'index($0, ENVIRON["s"])==1' eqns.txt
$ s='a+b' perl -ne 'print if index($_, $ENV{s})==0' eqns.txt
a+b,pi=3.14,5e12

$ # end of line
$ # length function returns number of characters, by default acts on $_
$ s='a+b' perl -ne '$pos = length() - length($ENV{s}) - 1;
                    print if index($_, $ENV{s}) == $pos' eqns.txt
i*(t+9-g)/8,4-a+b

Line number based filtering

$ # same as: head -n2 poem.txt | tail -n1
$ # or sed -n '2p' or awk 'NR==2'
$ perl -ne 'print if $.==2' poem.txt
Violets are blue,

$ # print 2nd and 4th line
$ # same as: sed -n '2p; 4p' or awk 'NR==2 || NR==4'
$ perl -ne 'print if $.==2 || $.==4' poem.txt
Violets are blue,
And so are you.

$ # same as: tail -n1 poem.txt
$ # or sed -n '$p' or awk 'END{print}'
$ perl -ne 'print if eof' poem.txt
And so are you.
  • for large input, use exit to avoid unnecessary record processing
$ # can also use: perl -ne 'print and exit if $.==234'
$ seq 14323 14563435 | perl -ne 'if($.==234){print; exit}'
14556

$ # sample time comparison
$ time seq 14323 14563435 | perl -ne 'if($.==234){print; exit}' > /dev/null
real    0m0.005s
$ time seq 14323 14563435 | perl -ne 'print if $.==234' > /dev/null
real    0m2.439s

$ # mimicking head command, same as: head -n3 or sed '3q'
$ seq 14 25 | perl -pe 'exit if $.>3'
14
15
16

$ # same as: sed '3Q'
$ seq 14 25 | perl -pe 'exit if $.==3'
14
15
$ # same as: sed -n '3,5p' or awk 'NR>=3 && NR<=5'
$ # in this context, the range is compared against $.
$ seq 14 25 | perl -ne 'print if 3..5'
16
17
18

$ # selecting from particular line number to end of input
$ # same as: sed -n '10,$p' or awk 'NR>=10'
$ seq 14 25 | perl -ne 'print if $.>=10'
23
24
25

Field processing

  • -a option will auto-split each input record on one or more contiguous whitespace characters, similar to default behavior in awk
  • Special variable array @F will contain all the elements, indexing starts from 0
    • negative indexing is also supported, -1 gives last element, -2 gives last-but-one and so on
    • see Array operations section for examples on array usage
$ cat fruits.txt
fruit   qty
apple   42
banana  31
fig     90
guava   6

$ # print only first field, indexing starts from 0
$ # same as: awk '{print $1}' fruits.txt
$ perl -lane 'print $F[0]' fruits.txt
fruit
apple
banana
fig
guava

$ # print only second field
$ # same as: awk '{print $2}' fruits.txt
$ perl -lane 'print $F[1]' fruits.txt
qty
42
31
90
6
  • by default, leading and trailing whitespaces won't be considered when splitting the input record
    • mimicking awk's default behavior
$ printf ' a    ate b\tc   \n'
 a    ate b     c
$ printf ' a    ate b\tc   \n' | perl -lane 'print $F[0]'
a
$ printf ' a    ate b\tc   \n' | perl -lane 'print $F[-1]'
c

$ # number of fields, $#F gives index of last element - so add 1
$ echo '1 a 7' | perl -lane 'print $#F+1'
3
$ printf ' a    ate b\tc   \n' | perl -lane 'print $#F+1'
4
$ # or use scalar context
$ echo '1 a 7' | perl -lane 'print scalar @F'
3

Field comparison

  • for numeric context, Perl automatically tries to convert the string to number, ignoring white-space
  • for string comparison, use eq for ==, ne for != and so on
$ # if first field exactly matches the string 'apple'
$ # same as: awk '$1=="apple"{print $2}' fruits.txt
$ perl -lane 'print $F[1] if $F[0] eq "apple"' fruits.txt
42

$ # print first field if second field > 35 (excluding header)
$ # same as: awk 'NR>1 && $2>35{print $1}' fruits.txt
$ perl -lane 'print $F[0] if $F[1]>35 && $.>1' fruits.txt
apple
fig

$ # print header and lines with qty < 35
$ # same as: awk 'NR==1 || $2<35' fruits.txt
$ perl -ane 'print if $F[1]<35 || $.==1' fruits.txt
fruit   qty
banana  31
guava   6

$ # if first field does NOT contain 'a'
$ # same as: awk '$1 !~ /a/' fruits.txt
$ perl -ane 'print if $F[0] !~ /a/' fruits.txt
fruit   qty
fig     90

Specifying different input field separator

  • by using -F command line option
    • See also split section, which covers details about trailing empty fields
$ # second field where input field separator is :
$ # same as: awk -F: '{print $2}'
$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[1]'
123

$ # last field, same as: awk -F: '{print $NF}'
$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[-1]'
789
$ # second last field, same as: awk -F: '{print $(NF-1)}'
$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[-2]'
bar

$ # second and last field
$ # other ways to print more than 1 element will be covered later
$ echo 'foo:123:bar:789' | perl -F: -lane 'print "$F[1] $F[-1]"'
123 789

$ # use quotes to avoid clashes with shell special characters
$ echo 'one;two;three;four' | perl -F';' -lane 'print $F[2]'
three
  • Regular expressions based input field separator
$ # same as: awk -F'[0-9]+' '{print $2}'
$ echo 'Sample123string54with908numbers' | perl -F'\d+' -lane 'print $F[1]'
string

$ # first field will be empty as there is nothing before '{'
$ # same as: awk -F'[{}= ]+' '{print $1}'
$ # \x20 is space character, can't use literal space within [] when using -F
$ echo '{foo}   bar=baz' | perl -F'[{}=\x20]+' -lane 'print $F[0]'

$ echo '{foo}   bar=baz' | perl -F'[{}=\x20]+' -lane 'print $F[1]'
foo
$ echo '{foo}   bar=baz' | perl -F'[{}=\x20]+' -lane 'print $F[2]'
bar
  • empty argument to -F will split the input record character wise
$ # same as: gawk -v FS= '{print $1}'
$ echo 'apple' | perl -F -lane 'print $F[0]'
a
$ echo 'apple' | perl -F -lane 'print $F[1]'
p
$ echo 'apple' | perl -F -lane 'print $F[-1]'
e

$ # use -C option when dealing with unicode characters
$ # S will turn on UTF-8 for stdin/stdout/stderr streams
$ printf 'hi👍 how are you?' | perl -CS -F -lane 'print $F[2]'
👍

Specifying different output field separator

  • Method 1: use $, to change separator between print arguments
    • could be remembered easily by noting that , is used to separate print arguments
$ # by default, the various arguments are concatenated
$ echo 'foo:123:bar:789' | perl -F: -lane 'print $F[1], $F[-1]'
123789

$ # change $, if different separator is needed
$ echo 'foo:123:bar:789' | perl -F: -lane '$,=" "; print $F[1], $F[-1]'
123 789
$ echo 'foo:123:bar:789' | perl -F: -lane '$,="-"; print $F[1], $F[-1]'
123-789

$ # argument can be array too
$ echo 'foo:123:bar:789' | perl -F: -lane '$,="-"; print @F[1,-1]'
123-789
$ echo 'foo:123:bar:789' | perl -F: -lane '$,=" - "; print @F'
foo - 123 - bar - 789
  • Method 2: use join
$ echo 'foo:123:bar:789' | perl -F: -lane 'print join "-", $F[1], $F[-1]'
123-789

$ echo 'foo:123:bar:789' | perl -F: -lane 'print join "-", @F[1,-1]'
123-789

$ echo 'foo:123:bar:789' | perl -F: -lane 'print join " - ", @F'
foo - 123 - bar - 789
  • Method 3: use $" to change separator when array is interpolated, default is space character
    • could be remembered easily by noting that interpolation happens within double quotes
$ # default is space
$ echo 'foo:123:bar:789' | perl -F: -lane 'print "@F[1,-1]"'
123 789

$ echo 'foo:123:bar:789' | perl -F: -lane '$"="-"; print "@F[1,-1]"'
123-789

$ echo 'foo:123:bar:789' | perl -F: -lane '$"=","; print "@F"'
foo,123,bar,789
  • use BEGIN if same separator is to be used for all lines
    • statements inside BEGIN are executed before processing any input text
$ # can also use: perl -lane 'BEGIN{$"=","} print "@F"' fruits.txt
$ perl -lane 'BEGIN{$,=","} print @F' fruits.txt
fruit,qty
apple,42
banana,31
fig,90
guava,6

Changing record separators

  • Before seeing examples for changing record separators, let's cover a detail about contents of input record and use of -l option
  • See also perldoc - chomp
$ # input record includes the record separator as well
$ # can also use: perl -pe 's/$/ 123/'
$ echo 'foo' | perl -pe 's/\n/ 123\n/'
foo 123

$ # this example shows better use case
$ # similar to paste -sd but with ability to use multi-character delimiter
$ seq 5 | perl -pe 's/\n/ : / if !eof'
1 : 2 : 3 : 4 : 5

$ # -l option will chomp off the record separator (among other things)
$ echo 'foo' | perl -l -pe 's/\n/ 123\n/'
foo

$ # -l also sets output record separator which gets added to print statements
$ # ORS gets input record separator value if no argument is passed to -l
$ # hence the newline automatically getting added for print in this example
$ perl -lane 'print $F[0] if $F[1]<35 && $.>1' fruits.txt
banana
guava

Input record separator

  • by default, newline character is used as input record separator
  • use $/ to specify a different input record separator
    • unlike awk, only string can be used, no regular expressions
  • for single character separator, can also use -0 command line option which accepts octal/hexadecimal value as argument
  • if -l option is also used
    • input record separator will be chomped from input record
    • in addition, if argument is not passed to -l, output record separator will get whatever is current value of input record separator
    • so, order of -l, -0 and/or $/ usage becomes important
$ s='this is a sample string'

$ # space as input record separator, printing all records
$ # same as: awk -v RS=' ' '{print NR, $0}'
$ # ORS is newline as -l is used before $/ gets changed
$ printf "$s" | perl -lne 'BEGIN{$/=" "} print "$. $_"'
1 this
2 is
3 a
4 sample
5 string

$ # print all records containing 'a'
$ # same as: awk -v RS=' ' '/a/'
$ printf "$s" | perl -l -0040 -ne 'print if /a/'
a
sample

$ # if the order is changed, ORS will be space, not newline
$ printf "$s" | perl -0040 -l -ne 'print if /a/'
a sample 
  • -0 option used without argument will use the ASCII NUL character as input record separator
$ printf 'foo\0bar\0' | cat -A
foo^@bar^@$
$ printf 'foo\0bar\0' | perl -l -0 -ne 'print'
foo
bar

$ # could be golfed to: perl -l -0pe ''
$ # but don't use `-l0` as `0` will be treated as argument to `-l`
  • values -0400 to -0777 will cause entire file to be slurped
    • idiomatically, -0777 is used
$ # s modifier allows . to match newline as well
$ perl -0777 -pe 's/red.*are //s' poem.txt
Roses are you.

$ # replace first newline with '. '
$ perl -0777 -pe 's/\n/. /' greeting.txt
Hello there. Have a safe journey
  • for paragraph mode (two or more consecutive newline characters), use -00 or assign empty string to $/

Consider the below sample file

$ cat sample.txt
Hello World

Good day
How are you

Just do-it
Believe it

Today is sunny
Not a bit funny
No doubt you like it too

Much ado about nothing
He he he
  • again, input record will have the separator too and using -l will chomp it
  • however, if more than two consecutive newline characters separate the paragraphs, only two newlines will be preserved and the rest discarded
    • use $/="\n\n" to avoid this behavior
$ # print all paragraphs containing 'it'
$ # same as: awk -v RS= -v ORS='\n\n' '/it/' sample.txt
$ perl -00 -ne 'print if /it/' sample.txt
Just do-it
Believe it

Today is sunny
Not a bit funny
No doubt you like it too

$ # based on number of lines in each paragraph
$ perl -F'\n' -00 -ane 'print if $#F==0' sample.txt
Hello World

$ # unlike awk -F'\n' -v RS= -v ORS='\n\n' 'NF==2 && /do/' sample.txt
$ # there won't be an empty line at the end because the input file didn't have one
$ perl -F'\n' -00 -ane 'print if $#F==1 && /do/' sample.txt
Just do-it
Believe it

Much ado about nothing
He he he
  • Re-structuring paragraphs
$ # same as: awk 'BEGIN{FS="\n"; OFS=". "; RS=""; ORS="\n\n"} {$1=$1} 1'
$ perl -F'\n' -00 -ane 'print join ". ", @F; print "\n\n"' sample.txt
Hello World

Good day. How are you

Just do-it. Believe it

Today is sunny. Not a bit funny. No doubt you like it too

Much ado about nothing. He he he
  • multi-character separator
$ cat report.log
blah blah
Error: something went wrong
more blah
whatever
Error: something surely went wrong
some text
some more text
blah blah blah

$ # number of records, same as: awk -v RS='Error:' 'END{print NR}'
$ perl -lne 'BEGIN{$/="Error:"} print $. if eof' report.log
3
$ # print first record
$ perl -lne 'BEGIN{$/="Error:"} print if $.==1' report.log
blah blah

$ # same as: awk -v RS='Error:' '/surely/{print RS $0}' report.log
$ perl -lne 'BEGIN{$/="Error:"} print "$/$_" if /surely/' report.log
Error: something surely went wrong
some text
some more text
blah blah blah
  • Joining lines based on specific end of line condition
$ cat msg.txt
Hello there.
It will rain to-
day. Have a safe
and pleasant jou-
rney.

$ # same as: awk -v RS='-\n' -v ORS= '1' msg.txt
$ # can also use: perl -pe 's/-\n//' msg.txt
$ perl -pe 'BEGIN{$/="-\n"} chomp' msg.txt
Hello there.
It will rain today. Have a safe
and pleasant journey.

Output record separator

  • one way is to use $\ to specify a different output record separator
    • by default it doesn't have a value
$ # note that despite $\ not having a value, output has newlines
$ # because the input record still has the input record separator
$ seq 3 | perl -ne 'print'
1
2
3
$ # same as: awk -v ORS='\n\n' '{print $0}'
$ seq 3 | perl -ne 'BEGIN{$\="\n"} print'
1

2

3

$ seq 2 | perl -ne 'BEGIN{$\="---\n"} print'
1
---
2
---
  • dynamically changing output record separator
$ # same as: awk '{ORS = NR%2 ? " " : "\n"} 1'
$ # note the use of -l to chomp the input record separator
$ seq 6 | perl -lpe '$\ = $.%2 ? " " : "\n"'
1 2
3 4
5 6

$ # -l also sets the output record separator
$ # but gets overridden by $\
$ seq 6 | perl -lpe '$\ = $.%3 ? "-" : "\n"'
1-2-3
4-5-6
  • passing argument to -l to set output record separator
$ seq 8 | perl -ne 'print if /[24]/'
2
4

$ # null separator, note how -l also chomps input record separator
$ seq 8 | perl -l0 -ne 'print if /[24]/' | cat -A
2^@4^@

$ # comma separator, won't have a newline at end
$ seq 8 | perl -l054 -ne 'print if /[24]/'
2,4,

$ # to add a final newline to output, use END and printf
$ seq 8 | perl -l054 -ne 'print if /[24]/; END{printf "\n"}'
2,4,

Multiline processing

  • Processing consecutive lines
$ cat poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.

$ # match two consecutive lines
$ # same as: awk 'p~/are/ && /is/{print p ORS $0} {p=$0}' poem.txt
$ perl -ne 'print $p,$_ if /is/ && $p=~/are/; $p=$_' poem.txt
Violets are blue,
Sugar is sweet,
$ # if only the second line is needed, same as: awk 'p~/are/ && /is/; {p=$0}'
$ perl -ne 'print if /is/ && $p=~/are/; $p=$_' poem.txt
Sugar is sweet,

$ # print if line matches a condition as well as condition for next 2 lines
$ # same as: awk 'p2~/red/ && p1~/blue/ && /is/{print p2} {p2=p1; p1=$0}'
$ perl -ne 'print $p2 if /is/ && $p1=~/blue/ && $p2=~/red/;
            $p2=$p1; $p1=$_' poem.txt
Roses are red,

Consider this sample input file

$ cat range.txt
foo
BEGIN
1234
6789
END
bar
BEGIN
a
b
c
END
baz
  • extracting lines around matching line
  • how $n && $n-- works:
    • need to note that right hand side of && is processed only if left hand side is true
    • so for example, if initially $n=2, then we get
      • 2 && 2; $n=1 - evaluates to true
      • 1 && 1; $n=0 - evaluates to true
      • 0 && - evaluates to false ... no decrementing $n and hence will be false until $n is re-assigned non-zero value
$ # similar to: grep --no-group-separator -A1 'BEGIN' range.txt
$ # same as: awk '/BEGIN/{n=2} n && n--' range.txt
$ perl -ne '$n=2 if /BEGIN/; print if $n && $n--' range.txt
BEGIN
1234
BEGIN
a

$ # print only line after matching line, same as: awk 'n && n--; /BEGIN/{n=1}'
$ perl -ne 'print if $n && $n--; $n=1 if /BEGIN/' range.txt
1234
a

$ # generic case: print nth line after match, awk 'n && !--n; /BEGIN/{n=3}'
$ perl -ne 'print if $n && !--$n; $n=3 if /BEGIN/' range.txt
END
c

$ # print second line prior to matched line
$ # same as: awk '/END/{print p2} {p2=p1; p1=$0}' range.txt
$ perl -ne 'print $p2 if /END/; $p2=$p1; $p1=$_' range.txt
1234
b

$ # use reversing trick for generic case of nth line before match
$ # same as: tac range.txt | awk 'n && !--n; /END/{n=3}' | tac
$ tac range.txt | perl -ne 'print if $n && !--$n; $n=3 if /END/' | tac
BEGIN
a


Perl regular expressions

  • examples to showcase some of the features not present in ERE and modifiers not available in sed's substitute command
  • many features of Perl regular expressions will NOT be covered, but external links will be provided wherever relevant
  • examples/descriptions based only on ASCII encoding

sed vs perl subtle differences

  • input record separator being part of input record
$ echo 'foo:123:bar:789' | sed -E 's/[^:]+$/xyz/'
foo:123:bar:xyz
$ # newline character gets replaced too as shown by shell prompt
$ echo 'foo:123:bar:789' | perl -pe 's/[^:]+$/xyz/'
foo:123:bar:xyz$
$ # simple workaround is to use -l option
$ echo 'foo:123:bar:789' | perl -lpe 's/[^:]+$/xyz/'
foo:123:bar:xyz

$ # of course it has uses too
$ seq 10 | paste -sd, | sed 's/,/ : /g'
1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10
$ seq 10 | perl -pe 's/\n/ : / if !eof'
1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10
  • how much does * match?
$ # sed will choose biggest match
$ echo ',baz,,xyz,,,' | sed 's/[^,]*/A/g'
A,A,A,A,A,A,A
$ echo 'foo,baz,,xyz,,,123' | sed 's/[^,]*/A/g'
A,A,A,A,A,A,A

$ # but perl will match both empty and non-empty strings
$ echo ',baz,,xyz,,,' | perl -lpe 's/[^,]*/A/g'
A,AA,A,AA,A,A,A
$ echo 'foo,baz,,xyz,,,123' | perl -lpe 's/[^,]*/A/g'
AA,AA,A,AA,A,A,AA

$ echo '42,789' | sed 's/[0-9]*/"&"/g'
"42","789"
$ echo '42,789' | perl -lpe 's/\d*/"$&"/g'
"42""","789"""
$ echo '42,789' | perl -lpe 's/\d+/"$&"/g'
"42","789"
  • backslash sequences inside character classes
$ # \w would simply match w
$ echo 'w=y-x+9*3' | sed 's/[\w=]//g'
y-x+9*3

$ # \w would match any word character
$ echo 'w=y-x+9*3' | perl -pe 's/[\w=]//g'
-+*
$ echo 'foo:123:bar:baz' | sed 's/:/-/2'
foo:123-bar:baz

$ echo 'foo:123:bar:baz' | perl -pe 's/:/-/2'
Unknown regexp modifier "/2" at -e line 1, at end of line
Execution of -e aborted due to compilation errors.
$ # e modifier covered later, allows Perl code in replacement section
$ echo 'foo:123:bar:baz' | perl -pe '$c=0; s/:/++$c==2 ? "-" : $&/ge'
foo:123-bar:baz
$ # or use non-greedy and \K(covered later), same as: sed 's/and/-/3'
$ echo 'foo and bar and baz land good' | perl -pe 's/(and.*?){2}\Kand/-/'
foo and bar and baz l- good

$ # emulating GNU sed's number+g modifier
$ a='456:foo:123:bar:789:baz
x:y:z:a:v:xc:gf'
$ echo "$a" | sed 's/:/-/3g'
456:foo:123-bar-789-baz
x:y:z-a-v-xc-gf
$ echo "$a" | perl -pe '$c=0; s/:/++$c<3 ? $& : "-"/ge'
456:foo:123-bar-789-baz
x:y:z-a-v-xc-gf
$ seq 2 | sed 's/$x/xyz/'
1
2

$ # uninitialized variable, same applies for: perl -pe 's/@a/xyz/'
$ seq 2 | perl -pe 's/$x/xyz/'
xyz1
xyz2
$ # initialized variable
$ seq 2 | perl -pe '$x=2; s/$x/xyz/'
1
xyz

$ # using single quotes as delimiter won't interpolate
$ # not usable for one-liners given shell's own single/double quotes behavior
$ cat sub_sq.pl
s'$x'xyz'
$ seq 2 | perl -p sub_sq.pl
1
2
$ # use $& to refer entire matched string in replacement section
$ echo 'hello world' | sed 's/.*/"&"/'
"hello world"
$ echo 'hello world' | perl -pe 's/.*/"&"/'
"&"
$ echo 'hello world' | perl -pe 's/.*/"$&"/'
"hello world"

$ # use \1, \2, etc or \g1, \g2 etc for back referencing in search section
$ # use $1, $2, etc in replacement section
$ echo 'a a a walking for for a cause' | perl -pe 's/\b(\w+)( \1)+\b/$1/g'
a walking for a cause

Backslash sequences

  • \d for [0-9]
  • \s for [ \t\r\n\f\v]
  • \h for [ \t]
  • \n for newline character
  • \D, \S, \H, \N respectively for their opposites
  • See perldoc - perlrecharclass for full list and details
$ # same as: sed -E 's/[0-9]+/xxx/g'
$ echo 'like 42 and 37' | perl -pe 's/\d+/xxx/g'
like xxx and xxx

$ # same as: sed -E 's/[^0-9]+/xxx/g'
$ # note again the use of -l because of newline in input record
$ echo 'like 42 and 37' | perl -lpe 's/\D+/xxx/g'
xxx42xxx37

$ # no need for -l here as \h won't match newline
$ echo 'a b c  ' | perl -pe 's/\h*$//'
a b c

Non-greedy quantifier

$ # greedy matching
$ echo 'foo and bar and baz land good' | perl -pe 's/foo.*and//'
 good
$ # non-greedy matching
$ echo 'foo and bar and baz land good' | perl -pe 's/foo.*?and//'
 bar and baz land good

$ echo '12342789' | perl -pe 's/\d{2,5}//'
789
$ echo '12342789' | perl -pe 's/\d{2,5}?//'
342789

$ # for single character, non-greedy is not always needed
$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*?:/:/'
123:789:good:5:bad
$ echo '123:42:789:good:5:bad' | perl -pe 's/:[^:]*:/:/'
123:789:good:5:bad

$ # just like greedy matching, the overall pattern must still match
$ # non-greedy only makes the quantifier take as little as possible
$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*?:[a-z]/:/'
123:ood:5:bad
$ echo '123:42:789:good:5:bad' | perl -pe 's/:.*:[a-z]/:/'
123:ad

Lookarounds

  • Lookarounds let you attach conditions that must hold before/after the required pattern
  • There are four types
    • positive lookahead (?=
    • negative lookahead (?!
    • positive lookbehind (?<=
    • negative lookbehind (?<!
  • One way to remember is that behind uses < and negative uses ! instead of =

Like word boundaries and anchors, lookarounds only assert a condition: the text they match does not become part of the matched string. They are therefore termed zero-width patterns.

  • positive lookbehind (?<=
$ s='foo=5, bar=3; x=83, y=120'

$ # extract all digit sequences
$ echo "$s" | perl -lne 'print join " ", /\d+/g'
5 3 83 120

$ # extract digits only if preceded by two lowercase letters and =
$ # note how the characters matched by lookbehind aren't part of output
$ echo "$s" | perl -lne 'print join " ", /(?<=[a-z]{2}=)\d+/g'
5 3

$ # this can be done without lookbehind too
$ # taking advantage of behavior of //g when () is used
$ echo "$s" | perl -lne 'print join " ", /[a-z]{2}=(\d+)/g'
5 3

$ # change all digits preceded by a single lowercase letter and =
$ echo "$s" | perl -pe 's/(?<=\b[a-z]=)\d+/42/g'
foo=5, bar=3; x=42, y=42
$ # alternate, without lookbehind
$ echo "$s" | perl -pe 's/(\b[a-z]=)\d+/${1}42/g'
foo=5, bar=3; x=42, y=42
  • positive lookahead (?=
$ s='foo=5, bar=3; x=83, y=120'

$ # extract digits that end with ,
$ # can also use: perl -lne 'print join ":", /(\d+),/g'
$ echo "$s" | perl -lne 'print join ":", /\d+(?=,)/g'
5:83

$ # change all digits ending with ,
$ # can also use: perl -pe 's/\d+,/42,/g'
$ echo "$s" | perl -pe 's/\d+(?=,)/42/g'
foo=42, bar=3; x=42, y=120

$ # both lookbehind and lookahead
$ echo 'foo,,baz,,,xyz' | perl -pe 's/,,/,NA,/g'
foo,NA,baz,NA,,xyz
$ echo 'foo,,baz,,,xyz' | perl -pe 's/(?<=,)(?=,)/NA/g'
foo,NA,baz,NA,NA,xyz
  • negative lookbehind (?<! and negative lookahead (?!
$ # change foo if not preceded by _
$ # note how 'foo' at start of line is matched as well
$ echo 'foo _foo 1foo' | perl -pe 's/(?<!_)foo/baz/g'
baz _foo 1baz

$ # join each line in paragraph by replacing newline character
$ # except the one at end of paragraph
$ perl -00 -pe 's/\n(?!$)/. /g' sample.txt
Hello World

Good day. How are you

Just do-it. Believe it

Today is sunny. Not a bit funny. No doubt you like it too

Much ado about nothing. He he he
$ # lookbehind is checking start of line (0 characters) and comma(1 character)
$ echo ',baz,,,xyz,,' | perl -pe 's/(?<=^|,)(?=,|$)/NA/g'
Variable length lookbehind not implemented in regex m/(?<=^|,)(?=,|$)/ at -e line 1.

$ # \K helps in such cases
$ echo ',baz,,,xyz,,' | perl -pe 's/(^|,)\K(?=,|$)/NA/g'
NA,baz,NA,NA,xyz,NA,NA
  • some more examples
$ # helps to avoid , within fields for field splitting
$ # note how the quotes are still part of field value
$ echo '"foo","12,34","good"' | perl -F'/"\K,(?=")/' -lane 'print $F[1]'
"12,34"
$ echo '"foo","12,34","good"' | perl -F'/"\K,(?=")/' -lane 'print $F[2]'
"good"

$ # capture groups inside lookarounds
$ echo 'a b c d e' | perl -pe 's/(\H+\h+)(?=(\H+)\h)/$1$2\n/g'
a b
b c
c d
d e
$ # generic formula :)
$ echo 'a b c d e' | perl -pe 's/(\H+\h+)(?=(\H+(\h+\H+){1})\h)/$1$2\n/g'
a b c
b c d
c d e
$ echo 'a b c d e' | perl -pe 's/(\H+\h+)(?=(\H+(\h+\H+){2})\h)/$1$2\n/g'
a b c d
b c d e


Ignoring specific matches

  • A useful construct is (*SKIP)(*F), which allows unwanted matches to be discarded
    • the regular expression to be discarded is written first, followed by (*SKIP)(*F), and then the required regular expression is added after |
$ s='Car Bat cod12 Map foo_bar'
$ # all words except those starting with 'c' or 'C'
$ echo "$s" | perl -lne 'print join "\n", /\bc\w+(*SKIP)(*F)|\w+/gi'
Bat
Map
foo_bar

$ s='I like "mango" and "guava"'
$ # all words except those surrounded by double quotes
$ echo "$s" | perl -lne 'print join "\n", /"[^"]+"(*SKIP)(*F)|\w+/g'
I
like
and
$ # change words except those surrounded by double quotes
$ echo "$s" | perl -pe 's/"[^"]+"(*SKIP)(*F)|\w+/\U$&/g'
I LIKE "mango" AND "guava"
  • for line based decisions, simple if-else might help
$ cat nums.txt
42
-2
10101
-3.14
-75

$ # change +ve number to -ve and vice versa
$ # note that empty regexp will reuse last successfully matched regexp
$ perl -pe '/^-/ ? s/// : s/^/-/' nums.txt
-42
2
-10101
3.14
75


Special capture groups

  • \1, \2 etc match only the exact string captured by the corresponding group
  • (?1), (?2) etc re-use the regular expression of the corresponding group itself
$ s='baz 2008-03-24 and 2012-08-12 foo 2016-03-25'
$ # (?1) refers to first capture group (\d{4}-\d{2}-\d{2})
$ echo "$s" | perl -pe 's/(\d{4}-\d{2}-\d{2}) and (?1)/XYZ/'
baz XYZ foo 2016-03-25

$ # using \1 won't work as the two dates are different
$ echo "$s" | perl -pe 's/(\d{4}-\d{2}-\d{2}) and \1//'
baz 2008-03-24 and 2012-08-12 foo 2016-03-25
$ s='Car Bat cod12 Map foo_bar'
$ # check what happens if ?: is not used
$ echo "$s" | perl -lne 'print join "\n", /(?:Bat|Map)(*SKIP)(*F)|\w+/gi'
Car
cod12
foo_bar

$ # using ?: helps to focus only on required capture groups
$ echo 'cod1 foo_bar' | perl -pe 's/(?:co|fo)\K(\w)(\w)/$2$1/g'
co1d fo_obar
$ # without ?: you'd need to remember all the other groups as well
$ echo 'cod1 foo_bar' | perl -pe 's/(co|fo)\K(\w)(\w)/$3$2/g'
co1d fo_obar
  • named capture groups (?<name>
    • for backreference, use \k<name>
    • accessible via %+ hash in replacement section
$ s='baz 2008-03-24 and 2012-08-12 foo 2016-03-25'
$ echo "$s" | perl -pe 's/(\d{4})-(\d{2})-(\d{2})/$3-$2-$1/g'
baz 24-03-2008 and 12-08-2012 foo 25-03-2016

$ # naming the capture groups might offer clarity
$ echo "$s" | perl -pe 's/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/$+{d}-$+{m}-$+{y}/g'
baz 24-03-2008 and 12-08-2012 foo 25-03-2016
$ echo "$s" | perl -pe 's/(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/$+{m}-$+{d}-$+{y}/g'
baz 03-24-2008 and 08-12-2012 foo 03-25-2016

$ # and useful to transform different capture groups
$ s='"foo,bar",123,"x,y,z",42'
$ echo "$s" | perl -lpe 's/"(?<a>[^"]+)",|(?<a>[^,]+),/$+{a}|/g'
foo,bar|123|x,y,z|42
$ # can also use (?| branch reset
$ echo "$s" | perl -lpe 's/(?|"([^"]+)",|([^,]+),)/$1|/g'
foo,bar|123|x,y,z|42


Modifiers

  • some are already seen, like the g (global match) and i (case insensitive matching)
  • first up, the r modifier which returns the substitution result instead of modifying the variable it is acting upon
$ perl -e '$x="feed"; $y=$x=~s/e/E/gr; print "x=$x\ny=$y\n"'
x=feed
y=fEEd

$ # the r modifier is available for transliteration operator too
$ perl -e '$x="food"; $y=$x=~tr/a-z/A-Z/r; print "x=$x\ny=$y\n"'
x=food
y=FOOD
  • e modifier allows Perl code to be used in the replacement section instead of a string
  • use ee if you need to construct a string and then evaluate it as code
$ # replace numbers with their squares
$ echo '4 and 10' | perl -pe 's/\d+/$&*$&/ge'
16 and 100

$ # replace matched string with incremental value
$ echo '4 and 10 foo 57' | perl -pe 's/\d+/++$c/ge'
1 and 2 foo 3
$ # passing initial value
$ echo '4 and 10 foo 57' | c=100 perl -pe 's/\d+/$ENV{c}++/ge'
100 and 101 foo 102

$ # formatting string
$ echo 'a1-2-deed' | perl -lpe 's/[^-]+/sprintf "%04s", $&/ge'
00a1-0002-deed

$ # calling a function
$ echo 'food:12:explain:789' | perl -pe 's/\w+/length($&)/ge'
4:2:7:3

$ # applying another substitution to matched string
$ echo '"mango" and "guava"' | perl -pe 's/"[^"]+"/$&=~s|a|A|gr/ge'
"mAngo" and "guAvA"
  • multiline modifiers
$ # m modifier to match beginning/end of each line within multiline string
$ perl -00 -ne 'print if /^Believe/' sample.txt
$ perl -00 -ne 'print if /^Believe/m' sample.txt
Just do-it
Believe it

$ perl -00 -ne 'print if /funny$/' sample.txt
$ perl -00 -ne 'print if /funny$/m' sample.txt
Today is sunny
Not a bit funny
No doubt you like it too

$ # s modifier to allow . meta character to match newlines as well
$ perl -00 -ne 'print if /do.*he/' sample.txt
$ perl -00 -ne 'print if /do.*he/s' sample.txt
Much ado about nothing
He he he


Quoting metacharacters

  • part of a regular expression can be surrounded by \Q and \E so that meta characters within that portion are matched literally
    • however, $ and @ would still be interpolated as long as the delimiter isn't single quotes
    • \E is optional if \Q applies till the end of the search expression
  • typical use case is when the string to be protected is already present in a variable, for ex: user input or result of another command
  • quotemeta will add a backslash to all characters other than \w characters
  • See also perldoc - Quoting metacharacters
$ # quotemeta in action
$ perl -le '$x="[a].b+c^"; print quotemeta $x'
\[a\]\.b\+c\^

$ # same as: s='a+b' perl -ne 'print if index($_, $ENV{s})==0' eqns.txt
$ s='a+b' perl -ne 'print if /^\Q$ENV{s}/' eqns.txt
a+b,pi=3.14,5e12

$ s='a+b' perl -pe 's/^\Q$ENV{s}/ABC/' eqns.txt
a=b,a-b=c,c*d
ABC,pi=3.14,5e12
i*(t+9-g)/8,4-a+b

$ s='a+b' perl -pe 's/\Q$ENV{s}\E.*,/ABC,/' eqns.txt
a=b,a-b=c,c*d
ABC,5e12
i*(t+9-g)/8,4-a+b
$ # q in action
$ perl -le '$x="[a].b+c^$@123"; print $x'
[a].b+c^123
$ perl -le '$x=q([a].b+c^$@123); print $x'
[a].b+c^$@123
$ perl -le '$x=q([a].b+c^$@123); print quotemeta $x'
\[a\]\.b\+c\^\$\@123

$ echo 'foo 123' | perl -pe 's/foo/$foo/'
 123
$ echo 'foo 123' | perl -pe 's/foo/q($foo)/e'
$foo 123
$ echo 'foo 123' | perl -pe 's/foo/q{$f)oo}/e'
$f)oo 123

$ # string saved in other variables do not need special attention
$ echo 'foo 123' | s='a$b' perl -pe 's/foo/$ENV{s}/'
a$b 123
$ echo 'foo 123' | perl -pe 's/foo/a$b/'
a 123

Matching position

$-[0] is the offset of the start of the last successful match

$+[0] is the offset into the string of the end of the entire match

$ cat poem.txt
Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.

$ # starting position of match
$ perl -lne 'print "line: $., offset: $-[0]" if /are/' poem.txt
line: 1, offset: 6
line: 2, offset: 8
line: 4, offset: 7
$ # if offset is needed starting from 1 instead of 0
$ perl -lne 'print "line: $., offset: ",$-[0]+1 if /are/' poem.txt
line: 1, offset: 7
line: 2, offset: 9
line: 4, offset: 8

$ # ending position of match
$ perl -lne 'print "line: $., offset: $+[0]" if /are/' poem.txt
line: 1, offset: 9
line: 2, offset: 11
line: 4, offset: 10
  • for multiple matches, use while loop to go over all the matches
$ perl -lne 'print "$.:$&:$-[0]" while /is|so|are/g' poem.txt
1:are:6
2:are:8
3:is:6
4:so:4
4:are:7

Using modules

$ echo '34,17,6' | perl -F, -lane 'BEGIN{use List::Util qw(max)} print max @F'
34
$ # -M option provides a way to specify modules from command line
$ echo '34,17,6' | perl -MList::Util=max -F, -lane 'print max @F'
34
$ echo '34,17,6' | perl -MList::Util=sum0 -F, -lane 'print sum0 @F'
57
$ echo '34,17,6' | perl -MList::Util=product -F, -lane 'print product @F'
3468

$ s='1,2,3,4,5'
$ echo "$s" | perl -MList::Util=shuffle -F, -lane 'print join ",",shuffle @F'
5,3,4,1,2

$ s='3,b,a,c,d,1,d,c,2,3,1,b'
$ echo "$s" | perl -MList::MoreUtils=uniq -F, -lane 'print join ",",uniq @F'
3,b,a,c,d,1,2

$ echo 'foo 123 baz' | base64
Zm9vIDEyMyBiYXoK
$ echo 'foo 123 baz' | perl -MMIME::Base64 -ne 'print encode_base64 $_'
Zm9vIDEyMyBiYXoK
$ echo 'Zm9vIDEyMyBiYXoK' | perl -MMIME::Base64 -ne 'print decode_base64 $_'
foo 123 baz
  • a cool module O helps to convert one-liners to full fledged programs
    • similar to -o option for GNU awk
$ # command being deparsed is discussed in a later section
$ perl -MO=Deparse -ne 'if(!$#ARGV){$h{$_}=1; next}
            print if $h{$_}' colors_1.txt colors_2.txt
LINE: while (defined($_ = <ARGV>)) {
    unless ($#ARGV) {
        $h{$_} = 1;
        next;
    }
    print $_ if $h{$_};
}
-e syntax OK

$ perl -MO=Deparse -00 -ne 'print if /it/' sample.txt
BEGIN { $/ = ""; $\ = undef; }
LINE: while (defined($_ = <ARGV>)) {
    print $_ if /it/;
}
-e syntax OK


Two file processing

First, a bit about $#ARGV and hash variables

$ # $#ARGV can be used to know which file is being processed
$ perl -lne 'print $#ARGV' <(seq 2) <(seq 3) <(seq 1)
1
1
0
0
0
-1

$ # creating hash variable
$ # checking if a key is present using exists
$ # or if value is known to evaluate to true
$ perl -le '$h{"a"}=5; $h{"b"}=0; $h{1}="abc";
            print "key:a value=", $h{"a"};
            print "key:b present" if exists $h{"b"};
            print "key:1 present" if $h{1}'
key:a value=5
key:b present
key:1 present

Comparing whole lines

Consider the following test files

$ cat colors_1.txt
Blue
Brown
Purple
Red
Teal
Yellow

$ cat colors_2.txt
Black
Blue
Green
Red
White
  • For two files as input, $#ARGV will be 0 only when first file is being processed
  • Using next will skip rest of code
  • entire line is used as key
$ # common lines
$ # note that all duplicates matching in second file would get printed
$ # same as: grep -Fxf colors_1.txt colors_2.txt
$ # same as: awk 'NR==FNR{a[$0]; next} $0 in a' colors_1.txt colors_2.txt
$ perl -ne 'if(!$#ARGV){$h{$_}=1; next}
            print if $h{$_}' colors_1.txt colors_2.txt
Blue
Red
$ # can also use: perl -ne '!$#ARGV ? $h{$_}=1 : $h{$_} && print'

$ # lines from colors_2.txt not present in colors_1.txt
$ # same as: grep -vFxf colors_1.txt colors_2.txt
$ # same as: awk 'NR==FNR{a[$0]; next} !($0 in a)' colors_1.txt colors_2.txt
$ perl -ne 'if(!$#ARGV){$h{$_}=1; next}
            print if !$h{$_}' colors_1.txt colors_2.txt
Black
Green
White
  • alternative constructs
  • <FILEHANDLE> reads line(s) from the specified file
    • defaults to current file argument(includes stdin as well), so <> can be used as shortcut
    • <STDIN> will read only from stdin, there are also predefined handles for stdout/stderr
    • in list context, all the lines would be read
    • See perldoc - I/O Operators for details
$ # using if-else instead of next
$ perl -ne 'if(!$#ARGV){ $h{$_}=1 }
            else{ print if $h{$_} }' colors_1.txt colors_2.txt
Blue
Red

$ # read all lines of first file in BEGIN block
$ # <> reads a line from current file argument
$ # eof will ensure only first file is read
$ perl -ne 'BEGIN{ $h{<>}=1 while !eof; }
            print if $h{$_}' colors_1.txt colors_2.txt
Blue
Red
$ # this method also allows to easily reset line number
$ # close ARGV is similar to calling nextfile in GNU awk
$ perl -ne 'BEGIN{ $h{<>}=1 while !eof; close ARGV}
            print "$.\n" if $h{$_}' colors_1.txt colors_2.txt
2
4

$ # or pass 1st file content as STDIN, $. will be automatically reset as well
$ perl -ne 'BEGIN{ $h{$_}=1 while <STDIN> }
            print if $h{$_}' <colors_1.txt colors_2.txt
Blue
Red

Comparing specific fields

Consider the sample input file

$ cat marks.txt
Dept    Name    Marks
ECE     Raj     53
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
EEE     Tia     59
ECE     Om      92
CSE     Amy     67
  • single field
  • For ex: only first field comparison instead of entire line as key
$ cat list1
ECE
CSE

$ # extract only lines matching first field specified in list1
$ # same as: awk 'NR==FNR{a[$1]; next} $1 in a' list1 marks.txt
$ perl -ane 'if(!$#ARGV){ $h{$F[0]}=1 }
             else{ print if $h{$F[0]} }' list1 marks.txt
ECE     Raj     53
ECE     Joel    72
CSE     Surya   81
ECE     Om      92
CSE     Amy     67

$ # if header is needed as well
$ # same as: awk 'NR==FNR{a[$1]; next} FNR==1 || $1 in a' list1 marks.txt
$ perl -ane 'if(!$#ARGV){ $h{$F[0]}=1; $.=0 }
             else{ print if $h{$F[0]} || $.==1 }' list1 marks.txt
Dept    Name    Marks
ECE     Raj     53
ECE     Joel    72
CSE     Surya   81
ECE     Om      92
CSE     Amy     67
  • multiple field comparison
$ cat list2
EEE Moi
CSE Amy
ECE Raj

$ # extract only lines matching both fields specified in list2
$ # same as: awk 'NR==FNR{a[$1,$2]; next} ($1,$2) in a' list2 marks.txt
$ # default SUBSEP(stored in $;) is \034, same as GNU awk
$ perl -ane 'if(!$#ARGV){ $h{$F[0],$F[1]}=1 }
             else{ print if $h{$F[0],$F[1]} }' list2 marks.txt
ECE     Raj     53
EEE     Moi     68
CSE     Amy     67

$ # or use multidimensional hash
$ perl -ane 'if(!$#ARGV){ $h{$F[0]}{$F[1]}=1 }
             else{ print if $h{$F[0]}{$F[1]} }' list2 marks.txt
ECE     Raj     53
EEE     Moi     68
CSE     Amy     67
  • field and value comparison
$ cat list3
ECE 70
EEE 65
CSE 80

$ # extract line matching Dept and minimum marks specified in list3
$ # same as: awk 'NR==FNR{d[$1]; m[$1]=$2; next} $1 in d && $3 >= m[$1]'
$ perl -ane 'if(!$#ARGV){ $d{$F[0]}=1; $m{$F[0]}=$F[1] }
             else{ print if $d{$F[0]} && $F[2]>=$m{$F[0]} }' list3 marks.txt
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
ECE     Om      92

Line number matching

$ # replace mth line in poem.txt with nth line from nums.txt
$ # assumes that there are at least n lines in nums.txt
$ # same as: awk -v m=3 -v n=2 'BEGIN{while(n-- > 0) getline s < "nums.txt"}
$ #                             FNR==m{$0=s} 1' poem.txt
$ m=3 n=2 perl -pe 'BEGIN{ $s=<> while $ENV{n}-- > 0; close ARGV}
                    $_=$s if $.==$ENV{m}' nums.txt poem.txt
Roses are red,
Violets are blue,
-2
And so are you.

$ # print line from fruits.txt if corresponding line from nums.txt is +ve number
$ # same as: awk -v file='nums.txt' '(getline num < file)==1 && num>0'
$ <nums.txt perl -ne 'print if <STDIN> > 0' fruits.txt
fruit   qty
banana  31

Creating new fields

  • Number of fields in input record can be changed by simply manipulating $#F
$ s='foo,bar,123,baz'

$ # reducing fields
$ # same as: awk -F, -v OFS=, '{NF=2} 1'
$ echo "$s" | perl -F, -lane '$,=","; $#F=1; print @F'
foo,bar

$ # creating new empty field(s)
$ # same as: awk -F, -v OFS=, '{NF=5} 1'
$ echo "$s" | perl -F, -lane '$,=","; $#F=4; print @F'
foo,bar,123,baz,

$ # assigning to field greater than $#F will create empty fields as needed
$ # same as: awk -F, -v OFS=, '{$7=42} 1'
$ echo "$s" | perl -F, -lane '$,=","; $F[6]=42; print @F'
foo,bar,123,baz,,,42
$ # adding a new 'Grade' field
$ # same as: awk 'BEGIN{OFS="\t"; split("DCBAS",g,//)}
$ #          {NF++; $NF = NR==1 ? "Grade" : g[int($(NF-1)/10)-4]} 1' marks.txt
$ perl -lane 'BEGIN{$,="\t"; @g = split //, "DCBAS"} $#F++;
              $F[-1] = $.==1 ? "Grade" : $g[$F[-2]/10 - 5]; print @F' marks.txt
Dept    Name    Marks   Grade
ECE     Raj     53      D
ECE     Joel    72      B
EEE     Moi     68      C
CSE     Surya   81      A
EEE     Tia     59      D
ECE     Om      92      S
CSE     Amy     67      C

$ # alternate syntax: array initialization and appending array element
$ perl -lane 'BEGIN{$,="\t"; @g = qw(D C B A S)}
              push @F, $.==1 ? "Grade" : $g[$F[-1]/10 - 5]; print @F' marks.txt
  • two file example
$ cat list4
Raj class_rep
Amy sports_rep
Tia placement_rep

$ # same as: awk -v OFS='\t' 'NR==FNR{r[$1]=$2; next}
$ #          {NF++; $NF = FNR==1 ? "Role" : r[$2]} 1' list4 marks.txt
$ perl -lane 'if(!$#ARGV){ $r{$F[0]}=$F[1]; $.=0 }
              else{ push @F, $.==1 ? "Role" : $r{$F[1]};
                    print join "\t", @F }' list4 marks.txt
Dept    Name    Marks   Role
ECE     Raj     53      class_rep
ECE     Joel    72
EEE     Moi     68
CSE     Surya   81
EEE     Tia     59      placement_rep
ECE     Om      92
CSE     Amy     67      sports_rep

Multiple file input

  • Perl has no direct equivalent of gawk's FNR/BEGINFILE/ENDFILE, but these can be worked around
$ # same as: awk 'FNR==2' poem.txt greeting.txt
$ # close ARGV will reset $. to 0
$ perl -ne 'print if $.==2; close ARGV if eof' poem.txt greeting.txt
Violets are blue,
Have a safe journey

$ # same as: awk 'BEGINFILE{print "file: "FILENAME} ENDFILE{print $0"\n------"}'
$ perl -lne 'print "file: $ARGV" if $.==1;
             print "$_\n------" and close ARGV if eof' poem.txt greeting.txt
file: poem.txt
And so are you.
------
file: greeting.txt
Have a safe journey
------
  • workaround for gawk's nextfile
  • to skip remaining lines from current file being processed and move on to next file
$ # same as: head -q -n1 and awk 'FNR>1{nextfile} 1'
$ perl -pe 'close ARGV if $.>=1' poem.txt greeting.txt fruits.txt
Roses are red,
Hello there
fruit   qty

$ # same as: awk 'tolower($1) ~ /red/{print FILENAME; nextfile}' *
$ perl -lane 'print $ARGV and close ARGV if $F[0] =~ /red/i' *
colors_1.txt
colors_2.txt

Dealing with duplicates

  • retain only first copy of duplicates
$ cat duplicates.txt
abc  7   4
food toy ****
abc  7   4
test toy 123
good toy ****

$ # whole line, same as: awk '!seen[$0]++' duplicates.txt
$ perl -ne 'print if !$seen{$_}++' duplicates.txt
abc  7   4
food toy ****
test toy 123
good toy ****

$ # particular column, same as: awk '!seen[$2]++' duplicates.txt
$ perl -ane 'print if !$seen{$F[1]}++' duplicates.txt
abc  7   4
food toy ****

$ # total count, same as: awk '!seen[$2]++{c++} END{print +c}' duplicates.txt
$ perl -lane '$c++ if !$seen{$F[1]}++; END{print $c+0}' duplicates.txt
2
  • if input is so large that integer counts can overflow, the bignum module can be used
  • See also perldoc - bignum
$ perl -le 'print "equal" if
   102**33==1922231403943151831696327756255167543169267432774552016351387451392'
$ # -M option here enables the use of bignum module
$ perl -Mbignum -le 'print "equal" if
   102**33==1922231403943151831696327756255167543169267432774552016351387451392'
equal

$ # avoid unnecessary counting altogether
$ # same as: awk '!($2 in seen); {seen[$2]}' duplicates.txt
$ perl -ane 'print if !$seen{$F[1]}; $seen{$F[1]}=1' duplicates.txt
abc  7   4
food toy ****

$ # same as: awk -M '!($2 in seen){c++} {seen[$2]} END{print +c}' duplicates.txt
$ perl -Mbignum -lane '$c++ if !$seen{$F[1]}; $seen{$F[1]}=1;
                       END{print $c+0}' duplicates.txt
2
$ # same as: awk '!seen[$2,$3]++' duplicates.txt
$ # default SUBSEP(stored in $;) is \034, same as GNU awk
$ perl -ane 'print if !$seen{$F[1],$F[2]}++' duplicates.txt
abc  7   4
food toy ****
test toy 123

$ # or use multidimensional key
$ perl -ane 'print if !$seen{$F[1]}{$F[2]}++' duplicates.txt
abc  7   4
food toy ****
test toy 123
  • retaining specific copy
$ # second occurrence of duplicate
$ # same as: awk '++seen[$2]==2' duplicates.txt
$ perl -ane 'print if ++$seen{$F[1]}==2' duplicates.txt
abc  7   4
test toy 123

$ # third occurrence of duplicate
$ # same as: awk '++seen[$2]==3' duplicates.txt
$ perl -ane 'print if ++$seen{$F[1]}==3' duplicates.txt
good toy ****

$ # retaining only last copy of duplicate
$ # reverse the input line-wise, retain first copy and then reverse again
$ # same as: tac duplicates.txt | awk '!seen[$2]++' | tac
$ tac duplicates.txt | perl -ane 'print if !$seen{$F[1]}++' | tac
abc  7   4
good toy ****
  • filtering based on duplicate count
  • allows emulating the uniq command for specific fields
$ # all duplicates based on 1st column
$ # same as: awk 'NR==FNR{a[$1]++; next} a[$1]>1' duplicates.txt duplicates.txt
$ perl -ane 'if(!$#ARGV){ $x{$F[0]}++ }
             else{ print if $x{$F[0]}>1 }' duplicates.txt duplicates.txt
abc  7   4
abc  7   4

$ # more than 2 duplicates based on 2nd column
$ # same as: awk 'NR==FNR{a[$2]++; next} a[$2]>2' duplicates.txt duplicates.txt
$ perl -ane 'if(!$#ARGV){ $x{$F[1]}++ }
             else{ print if $x{$F[1]}>2 }' duplicates.txt duplicates.txt
food toy ****
test toy 123
good toy ****

$ # only unique lines based on 3rd column
$ # same as: awk 'NR==FNR{a[$3]++; next} a[$3]==1' duplicates.txt duplicates.txt
$ perl -ane 'if(!$#ARGV){ $x{$F[2]}++ }
             else{ print if $x{$F[2]}==1 }' duplicates.txt duplicates.txt
test toy 123

Lines between two REGEXPs

  • This section deals with filtering lines bound by two REGEXPs (referred to as blocks)
  • For simplicity the two REGEXPs usually used in below examples are the strings BEGIN and END

All unbroken blocks

Consider the below sample input file, which doesn't have any broken blocks (i.e BEGIN and END are always present in pairs)

$ cat range.txt
foo
BEGIN
1234
6789
END
bar
BEGIN
a
b
c
END
baz
  • Extracting lines between starting and ending REGEXP
$ # include both starting/ending REGEXP
$ # same as: awk '/BEGIN/{f=1} f; /END/{f=0}' range.txt
$ perl -ne '$f=1 if /BEGIN/; print if $f; $f=0 if /END/' range.txt
BEGIN
1234
6789
END
BEGIN
a
b
c
END

$ # can also use: perl -ne 'print if /BEGIN/../END/' range.txt
$ # which is similar to sed -n '/BEGIN/,/END/p'
$ # but not suitable to extend for other cases
  • other variations
$ # same as: awk '/END/{f=0} f; /BEGIN/{f=1}' range.txt
$ perl -ne '$f=0 if /END/; print if $f; $f=1 if /BEGIN/' range.txt
1234
6789
a
b
c

$ # check out what these do:
$ perl -ne '$f=1 if /BEGIN/; $f=0 if /END/; print if $f' range.txt
$ perl -ne 'print if $f; $f=0 if /END/; $f=1 if /BEGIN/' range.txt
  • Extracting lines other than lines between the two REGEXPs
$ # same as: awk '/BEGIN/{f=1} !f; /END/{f=0}' range.txt
$ # can also use: perl -ne 'print if !(/BEGIN/../END/)' range.txt
$ perl -ne '$f=1 if /BEGIN/; print if !$f; $f=0 if /END/' range.txt
foo
bar
baz

$ # the other three cases would be
$ perl -ne '$f=0 if /END/; print if !$f; $f=1 if /BEGIN/' range.txt
$ perl -ne 'print if !$f; $f=1 if /BEGIN/; $f=0 if /END/' range.txt
$ perl -ne '$f=1 if /BEGIN/; $f=0 if /END/; print if !$f' range.txt

Specific blocks

  • Getting first block
$ # same as: awk '/BEGIN/{f=1} f; /END/{exit}' range.txt
$ perl -ne '$f=1 if /BEGIN/; print if $f; exit if /END/' range.txt
BEGIN
1234
6789
END

$ # use other tricks discussed in previous section as needed
$ # same as: awk '/END/{exit} f; /BEGIN/{f=1}' range.txt
$ perl -ne 'exit if /END/; print if $f; $f=1 if /BEGIN/' range.txt
1234
6789
  • Getting last block
$ # reverse input linewise, change the order of REGEXPs, finally reverse again
$ # same as: tac range.txt | awk '/END/{f=1} f; /BEGIN/{exit}' | tac
$ tac range.txt | perl -ne '$f=1 if /END/; print if $f; exit if /BEGIN/' | tac
BEGIN
a
b
c
END

$ # or, save the blocks in a buffer and print the last one alone
$ # same as: awk '/4/{f=1; b=$0; next} f{b=b ORS $0} /6/{f=0} END{print b}'
$ seq 30 | perl -ne 'if(/4/){$f=1; $b=$_; next}
                     $b.=$_ if $f; $f=0 if /6/; END{print $b}'
24
25
26
  • Getting blocks based on a counter
$ # get only 2nd block
$ # same as: seq 30 | awk -v b=2 '/4/{c++} c==b{print; if(/6/) exit}'
$ seq 30 | b=2 perl -ne '$c++ if /4/; if($c==$ENV{b}){print; exit if /6/}'
14
15
16

$ # to get all blocks after the first 'b' blocks
$ # same as: seq 30 | awk -v b=1 '/4/{f=1; c++} f && c>b; /6/{f=0}'
$ seq 30 | b=1 perl -ne '$f=1, $c++ if /4/;
                         print if $f && $c>$ENV{b}; $f=0 if /6/'
14
15
16
24
25
26
  • excluding a particular block
$ # excludes 2nd block
$ # same as: seq 30 | awk -v b=2 '/4/{f=1; c++} f && c!=b; /6/{f=0}'
$ seq 30 | b=2 perl -ne '$f=1, $c++ if /4/;
                         print if $f && $c!=$ENV{b}; $f=0 if /6/'
4
5
6
24
25
26
  • extract block only if it matches another string as well
$ # string to match inside block: 23
$ perl -ne 'if(/BEGIN/){$f=1; $m=0; $b=""}; $m=1 if $f && /23/;
            $b.=$_ if $f; if(/END/){print $b if $m; $f=0}' range.txt
BEGIN
1234
6789
END

$ # line to match inside block: 5 or 25
$ seq 30 | perl -ne 'if(/4/){$f=1; $m=0; $b=""}; $m=1 if $f && /^(5|25)$/;
                     $b.=$_ if $f; if(/6/){print $b if $m; $f=0}'
4
5
6
24
25
26

Broken blocks

  • If there are blocks with ending REGEXP but without corresponding start, earlier techniques used will suffice
  • Consider the modified input file where starting REGEXP doesn't have corresponding ending
$ cat broken_range.txt
foo
BEGIN
1234
6789
END
bar
BEGIN
a
b
c
baz

$ # the file reversing trick comes in handy here as well
$ # same as: tac broken_range.txt | awk '/END/{f=1} f; /BEGIN/{f=0}' | tac
$ tac broken_range.txt | perl -ne '$f=1 if /END/;
                         print if $f; $f=0 if /BEGIN/' | tac
BEGIN
1234
6789
END
  • But if both kinds of broken blocks are present, for ex:
$ cat multiple_broken.txt
qqqqqqq
BEGIN
foo
BEGIN
1234
6789
END
bar
END
0-42-1
BEGIN
a
BEGIN
b
END
xyzabc

then use buffers to accumulate the records and print accordingly

$ # same as: awk '/BEGIN/{f=1; buf=$0; next} f{buf=buf ORS $0}
$ #          /END/{f=0; if(buf) print buf; buf=""}' multiple_broken.txt
$ perl -ne 'if(/BEGIN/){$f=1; $b=$_; next} $b.=$_ if $f;
            if(/END/){$f=0; print $b if $b; $b=""}' multiple_broken.txt
BEGIN
1234
6789
END
BEGIN
b
END

$ # note how buffer is initialized as well as cleared
$ # on matching beginning/end REGEXPs respectively
$ # 'undef $b' can also be used here instead of $b=""

Array operations

  • initialization
$ # list example, each value is separated by comma
$ perl -e '($x, $y) = (4, 5); print "$x:$y\n"'
4:5

$ # using list to initialize arrays, allows variable interpolation
$ # ($x, $y) = ($y, $x) will swap variables :)
$ perl -e '@nums = (4, 5, 84); print "@nums\n"'
4 5 84
$ perl -e '@nums = (4, 5, 84, "foo"); print "@nums\n"'
4 5 84 foo
$ perl -e '$x=5; @y=(3, 2); @nums = ($x, "good", @y); print "@nums\n"'
5 good 3 2

$ # use qw to specify string elements separated by space, no interpolation
$ perl -e '@nums = qw(4 5 84 "foo"); print "@nums\n"'
4 5 84 "foo"
$ perl -e '@nums = qw(a $x @y); print "@nums\n"'
a $x @y
$ # use different delimiter as needed
$ perl -e '@nums = qw/baz 1)foo/; print "@nums\n"'
baz 1)foo
$ # index starts from 0
$ perl -le '@nums = (4, "foo", 2, "x"); print $nums[0]'
4
$ # note the use of $ when accessing individual element
$ perl -le '@nums = (4, "foo", 2, "x"); print $nums[2]'
2
$ # to access elements from end, use -ve index from -1
$ perl -le '@nums = (4, "foo", 2, "x"); print $nums[-1]'
x

$ # index of last element in array
$ perl -le '@nums = (4, "foo", 2, "x"); print $#nums'
3
$ # size of array, i.e total number of elements
$ perl -le '@nums = (4, "foo", 2, "x"); $s=@nums; print $s'
4
$ perl -le '@nums = (4, "foo", 2, "x"); print scalar @nums'
4
$ # note the use of @ when accessing more than one element
$ echo 'a b c d' | perl -lane 'print "@F[0,-1,2]"'
a d c
$ # range operator
$ echo 'a b c d' | perl -lane 'print "@F[1..2]"'
b c
$ # rotating elements
$ echo 'a b c d' | perl -lane 'print "@F[1..$#F,0]"'
b c d a

$ # index needed can be given from another array too
$ echo 'a b c d' | perl -lane '@i=(3,1); print "@F[@i]"'
d b

$ # easy swapping of columns
$ perl -lane 'print join "\t", @F[1,0]' fruits.txt
qty     fruit
42      apple
31      banana
90      fig
6       guava
  • range operator also allows handy initialization
$ perl -le '@n = (12..17); print "@n"'
12 13 14 15 16 17

$ perl -le '@n = (l..ad); print "@n"'
l m n o p q r s t u v w x y z aa ab ac ad

Iteration and filtering

$ # foreach will return each value one by one
$ # can also use 'for' keyword instead of 'foreach'
$ perl -le 'print $_*2 foreach (12..14)'
24
26
28

$ # iterate using index
$ perl -le '@x = (a..e); foreach (0..$#x){print $x[$_]}'
a
b
c
d
e

$ # C-style for loop can be used as well
$ perl -le '@x = (a..c); for($i=0;$i<=$#x;$i++){print $x[$i]}'
a
b
c
$ # as usual, $_ will get the value each iteration
$ perl -le '$,=" "; print grep { /[35]/ } 2..26'
3 5 13 15 23 25
$ # alternate syntax
$ perl -le '$,=" "; print grep /[35]/, 2..26'
3 5 13 15 23 25

$ # to get index instead of matches
$ perl -le '$,=" "; @n=(2..26); print grep {$n[$_]=~/[35]/} 0..$#n'
1 3 11 13 21 23

$ # compare values
$ s='23 756 -983 5'
$ echo "$s" | perl -lane 'print join " ", grep $_<100, @F'
23 -983 5

$ # filters only those elements with successful substitution
$ # note that it would modify array elements as well
$ echo "$s" | perl -lane 'print join " ", grep s/3/E/, @F'
2E -98E
  • more examples
$ # filtering column(s) based on header
$ perl -lane '@i = grep {$F[$_] eq "Name"} 0..$#F if $.==1;
              print @F[@i]' marks.txt
Name
Raj
Joel
Moi
Surya
Tia
Om
Amy

$ cat split.txt
foo,1:2:5,baz
wry,4,look
free,3:8,oh
$ # print line if more than one column has a digit
$ perl -F: -lane 'print if (grep /\d/, @F) > 1' split.txt
foo,1:2:5,baz
free,3:8,oh
  • to get random element from array
$ s='65 23 756 -983 5'
$ echo "$s" | perl -lane 'print $F[rand @F]'
5
$ echo "$s" | perl -lane 'print $F[rand @F]'
23
$ echo "$s" | perl -lane 'print $F[rand @F]'
-983

$ # in scalar context, size of array gets passed to rand
$ # rand actually returns a float
$ # which then gets converted to int index

Sorting

  • See perldoc - sort for details
  • $a and $b are special variables used for sorting, avoid using them as user defined variables
$ # by default, sort does string comparison
$ s='foo baz v22 aimed'
$ echo "$s" | perl -lane 'print join " ", sort @F'
aimed baz foo v22

$ # same as default sort
$ echo "$s" | perl -lane 'print join " ", sort {$a cmp $b} @F'
aimed baz foo v22
$ # descending order, note how $a and $b are switched
$ echo "$s" | perl -lane 'print join " ", sort {$b cmp $a} @F'
v22 foo baz aimed

$ # functions can be used for custom sorting
$ # lc lowercases string, so this sorts case insensitively
$ perl -lane 'print join " ", sort {lc $a cmp lc $b} @F' poem.txt
are red, Roses
are blue, Violets
is Sugar sweet,
And are so you.
  • sorting characters within word
$ echo 'foobar' | perl -F -lane 'print sort @F'
abfoor

$ cat words.txt
bot
art
are
boat
toe
flee
reed

$ # words with characters in ascending order
$ perl -F -lane 'print if (join "", sort @F) eq $_' words.txt
bot
art

$ # words with characters in descending order
$ perl -F -lane 'print if (join "", sort {$b cmp $a} @F) eq $_' words.txt
toe
reed
  • for numeric comparison, use <=> instead of cmp
$ s='23 756 -983 5'
$ echo "$s" | perl -lane 'print join " ",sort {$a <=> $b} @F'
-983 5 23 756
$ echo "$s" | perl -lane 'print join " ",sort {$b <=> $a} @F'
756 23 5 -983

$ # sorting strings based on their length
$ s='floor bat to dubious four'
$ echo "$s" | perl -lane 'print join ":",sort {length $a <=> length $b} @F'
to:bat:four:floor:dubious
  • sorting columns based on header
$ # need to get indexes of order required for header, then use it for all lines
$ perl -lane '@i = sort {$F[$a] cmp $F[$b]} 0..$#F if $.==1;
              print join "\t", @F[@i]' marks.txt
Dept    Marks   Name
ECE     53      Raj
ECE     72      Joel
EEE     68      Moi
CSE     81      Surya
EEE     59      Tia
ECE     92      Om
CSE     67      Amy

$ perl -lane '@i = sort {$F[$b] cmp $F[$a]} 0..$#F if $.==1;
              print join "\t", @F[@i]' marks.txt
Name    Marks   Dept
Raj     53      ECE
Joel    72      ECE
Moi     68      EEE
Surya   81      CSE
Tia     59      EEE
Om      92      ECE
Amy     67      CSE


Transforming

  • shuffling list elements
$ s='23 756 -983 5'
$ # note that this doesn't change the input array
$ echo "$s" | perl -MList::Util=shuffle -lane 'print join " ", shuffle @F'
756 23 -983 5
$ echo "$s" | perl -MList::Util=shuffle -lane 'print join " ", shuffle @F'
5 756 23 -983

$ # randomizing file contents
$ perl -MList::Util=shuffle -e 'print shuffle <>' poem.txt
Sugar is sweet,
And so are you.
Violets are blue,
Roses are red,

$ # or if shuffle order is known
$ seq 5 | perl -e '@lines=<>; print @lines[3,1,0,2,4]'
4
2
1
3
5
  • use map to transform every element
$ echo '23 756 -983 5' | perl -lane 'print join " ", map {$_*$_} @F'
529 571536 966289 25
$ echo 'a b c' | perl -lane 'print join ",", map {qq/"$_"/} @F'
"a","b","c"
$ echo 'a b c' | perl -lane 'print join ",", map {uc qq/"$_"/} @F'
"A","B","C"

$ # changing the array itself
$ perl -le '@s=(4, 245, 12); map {$_*$_} @s; print join " ", @s'
4 245 12
$ perl -le '@s=(4, 245, 12); map {$_ = $_*$_} @s; print join " ", @s'
16 60025 144

$ # ASCII int values for each character
$ echo 'AaBbCc' | perl -F -lane 'print join " ", map ord, @F'
65 97 66 98 67 99

$ s='this is a sample sentence'
$ # shuffle each word, split here converts each element to character array
$ # join the characters after shuffling with empty string
$ # finally print each changed element with space as separator
$ echo "$s" | perl -MList::Util=shuffle -lane '$,=" ";
                    print map {join "", shuffle split//} @F;'
tshi si a mleasp ncstneee
  • fun little unreadable script...
$ cat para.txt
Why cannot I go back to my ignorant days with wild imaginations and fantasies?
Perhaps the answer lies in not being able to adapt to my freedom.
Those little dreams, goal setting, anticipation of results, used to be my world.
All joy within the soul and less dependent on outside world.
But all these are absent for a long time now.
Hope I can wake those dreams all over again.

$ perl -MList::Util=shuffle -F'/([^a-zA-Z]+)/' -lane '
        print map {@c=split//; $#c<3 || /[^a-zA-Z]/? $_ :
              join "",$c[0],(shuffle @c[1..$#c-1]),$c[-1]} @F;' para.txt
Why coannt I go back to my inoagrnt dyas wtih wild imiaintangos and fatenasis?
Phearps the awsenr lies in not bieng albe to aadpt to my fedoerm.
Toshe llttie draems, goal stetnig, aaioiciptntn of rtuelss, uesd to be my wrlod.
All joy witihn the suol and less dnenepedt on oiduste world.
But all tsehe are abenst for a lnog tmie now.
Hpoe I can wkae toshe daemrs all over aiagn.
  • reversing a list or a string
$ s='23 756 -983 5'
$ echo "$s" | perl -lane 'print join " ", reverse @F'
5 -983 756 23

$ echo 'foobar' | perl -lne 'print reverse split//'
raboof
$ # can also use scalar context instead of using split
$ echo 'foobar' | perl -lne '$x=reverse; print $x'
raboof
$ echo 'foobar' | perl -lne 'print scalar reverse'
raboof

Miscellaneous


split

  • the -a command line option uses split and automatically saves the results in @F array
  • default separator is the string ' ', which splits on \s+ after skipping leading whitespace
  • by default acts on $_
  • by default, there is no limit on the number of splits
  • See also perldoc - split function
$ echo 'a 1 b 2 c' | perl -lane 'print $F[2]'
b
$ echo 'a 1 b 2 c' | perl -lne '@x=split; print $x[2]'
b
$ # temp variable can be avoided by using a list slice
$ echo 'a 1 b 2 c' | perl -lne 'print join ":", (split)[2,-1]'
b:c

$ # using digits as separator
$ echo 'a 1 b 2 c' | perl -lne '@x=split /\d+/; print ":$x[1]:"'
: b :

$ # specifying maximum number of splits
$ echo 'a 1 b 2 c' | perl -lne '@x=split /\h+/,$_,2; print "$x[0]:$x[1]:"'
a:1 b 2 c:
$ # specifying limit using -F option
$ echo 'a 1 b 2 c' | perl -F'/\h+/,$_,2' -lane 'print "$F[0]:$F[1]:"'
a:1 b 2 c:
  • by default, trailing empty fields are stripped
  • specify a negative value to preserve trailing empty fields
$ echo ':123::' | perl -lne 'print scalar split /:/'
2
$ echo ':123::' | perl -lne 'print scalar split /:/,$_,-1'
4

$ echo ':123::' | perl -F: -lane 'print scalar @F'
2
$ echo ':123::' | perl -F'/:/,$_,-1' -lane 'print scalar @F'
4
  • to save the separators as well, use capture groups
$ echo 'a 1 b 2 c' | perl -lne '@x=split /(\d+)/; print "$x[1],$x[3]"'
1,2
$ # or, without the temp variable
$ echo 'a 1 b 2 c' | perl -lne 'print join ",", (split /(\d+)/)[1,3]'
1,2

$ # same can be done for -F option
$ echo 'a 1 b 2 c' | perl -F'(\d+)' -lane 'print "$F[1],$F[3]"'
1,2
  • single line to multiple line by splitting a column
$ cat split.txt
foo,1:2:5,baz
wry,4,look
free,3:8,oh

$ perl -F, -ane 'print join ",", $F[0],$_,$F[2] for split /:/,$F[1]' split.txt
foo,1,baz
foo,2,baz
foo,5,baz
wry,4,look
free,3,oh
free,8,oh
  • weird behavior if literal space character is used with -F option
$ # only one element in @F array
$ echo 'a 1 b 2 c' | perl -F'/b /' -lane 'print $F[1]'

$ # space is not treated as part of the separator
$ echo 'a 1 b 2 c' | perl -F'b ' -lane 'print $F[1]'
 2 c
$ # correct behavior
$ echo 'a 1 b 2 c' | perl -F'b\x20' -lane 'print $F[1]'
2 c

$ # errors out if space used inside character class
$ echo 'a 1 b 2 c' | perl -F'/b[ ]/' -lane 'print $F[1]'
Unmatched [ in regex; marked by <-- HERE in m//b[ <-- HERE /.
$ echo 'a 1 b 2 c' | perl -lne '@x=split /b[ ]/; print $x[1]'
2 c

Fixed width processing

$ # here 'a' indicates arbitrary binary data
$ # the number that follows indicates length
$ # the 'x' indicates characters to ignore, use length after 'x' if needed
$ # and there are many other formats, see perldoc for details
$ echo 'b 123 good' | perl -lne '@x = unpack("a1xa3xa4", $_); print $x[0]'
b
$ echo 'b 123 good' | perl -lne '@x = unpack("a1xa3xa4", $_); print $x[1]'
123
$ echo 'b 123 good' | perl -lne '@x = unpack("a1xa3xa4", $_); print $x[2]'
good

$ # unpack not always needed, can simply capture characters needed
$ echo 'b 123 good' | perl -lne 'print /.{2}(.{3})/'
123
$ # or use substr to specify offset (starts from 0) and length
$ echo 'b 123 good' | perl -lne 'print substr $_, 6, 4'
good

$ # substr can also be used for replacing
$ echo 'b 123 good' | perl -lpe 'substr $_, 2, 3, "gleam"'
b gleam good

Further Reading


String and file replication

$ # replicate each line
$ seq 2 | perl -ne 'print $_ x 2'
1
1
2
2

$ # replicate a string
$ perl -le 'print "abc" x 5'
abcabcabcabcabc

$ # works for lists too
$ perl -le '@x = (3, 2, 1) x 2; print join " ",@x'
3 2 1 3 2 1

$ # replicating file
$ wc -c poem.txt
65 poem.txt
$ perl -0777 -ne 'print $_ x 100' poem.txt | wc -c
6500
  • the perldoc - glob function can be hacked to generate combinations of strings
$ # typical use case
$ # same as: echo *.log
$ perl -le 'print join " ", glob q/*.log/'
report.log
$ # same as: echo *.{log,pl}
$ perl -le 'print join " ", glob q/*.{log,pl}/'
report.log code.pl sub_sq.pl

$ # hacking
$ # same as: echo {1,3}{a,b}
$ perl -le '@x=glob q/{1,3}{a,b}/; print "@x"'
1a 1b 3a 3b
$ # same as: echo {1,3}{1,3}{1,3}
$ perl -le '@x=glob "{1,3}" x 3; print "@x"'
111 113 131 133 311 313 331 333

transliteration

  • See tr under perldoc - Quote-Like Operators section for details
  • similar to substitution, by default tr acts on $_ variable and modifies it unless r modifier is specified
  • however, characters $ and @ are treated as literals, i.e. no interpolation
  • similar to sed, one can also use y instead of tr
$ # one-to-one mapping of characters, all occurrences are translated
$ echo 'foo bar cat baz' | perl -pe 'tr/abc/123/'
foo 21r 31t 21z

$ # use - to represent a range in ascending order
$ echo 'Hello World' | perl -pe 'tr/a-zA-Z/n-za-mN-ZA-M/'
Uryyb Jbeyq
$ echo 'Uryyb Jbeyq' | perl -pe 'tr|a-zA-Z|n-za-mN-ZA-M|'
Hello World
  • if arguments are of different lengths
$ # when second argument is longer, the extra characters are ignored
$ echo 'foo bar cat baz' | perl -pe 'tr/abc/1-9/'
foo 21r 31t 21z

$ # when first argument is longer
$ # the last character of second argument is repeated to make the lengths equal
$ echo 'foo bar cat baz' | perl -pe 'tr/a-z/123/'
333 213 313 213
  • modifiers
$ # no padding, absent mappings are deleted
$ echo 'fob bar cat baz' | perl -pe 'tr/a-z/123/d'
2 21 31 21
$ echo 'Hello:123:World' | perl -pe 'tr/a-z//d'
H:123:W

$ # c modifier complements first argument characters
$ echo 'Hello:123:World' | perl -lpe 'tr/a-z//cd'
elloorld

$ # s modifier to keep only one copy of repeated characters
$ echo 'FFoo seed 11233' | perl -pe 'tr/a-z//s'
FFo sed 11233
$ # when replacement is done as well, only replaced characters are squeezed
$ # unlike 'tr -s' which squeezes characters specified by second argument
$ echo 'FFoo seed 11233' | perl -pe 'tr/A-Z/a-z/s'
foo seed 11233

$ perl -e '$x="food"; $y=$x=~tr/a-z/A-Z/r; print "x=$x\ny=$y\n"'
x=food
y=FOOD
  • since - is used for character ranges, place it at the start/end to represent it literally
  • similarly, to represent \ literally, use \\
$ echo '/foo-bar/baz/report' | perl -pe 'tr/-a-z/_A-Z/'
/FOO_BAR/BAZ/REPORT

$ echo '/foo-bar/baz/report' | perl -pe 'tr|/-|\\_|'
\foo_bar\baz\report
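  • return value is number of characters replaced or deleted
  • if second argument is empty and d modifier is not used, matching characters are only counted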
$ echo 'Hello there. How are you?' | grep -o '[a-z]' | wc -l
17

$ echo 'Hello there. How are you?' | perl -lne 'print tr/a-z//'
17
  • unicode examples
$ echo 'hello!' | perl -CS -pe 'tr/a-z/\x{1d5ee}-\x{1d607}/'
𝗵𝗲𝗹𝗹𝗼!

$ echo 'How are you?' | perl -Mopen=locale -Mutf8 -pe 'tr/a-zA-Z/𝗮-𝘇𝗔-𝗭/'
𝗛𝗼𝘄 𝗮𝗿𝗲 𝘆𝗼𝘂?

Executing external commands

  • External commands can be issued using system function
  • Output would be as usual on stdout unless redirected while calling the command
$ perl -e 'system("echo Hello World")'
Hello World
$ # use q operator to avoid interpolation by Perl, here the shell expands $HOME
$ perl -e 'system q/echo $HOME/'
/home/learnbyexample

$ perl -e 'system q/wc poem.txt/'
 4 13 65 poem.txt

$ perl -e 'system q/seq 10 | paste -sd, > out.txt/'
$ cat out.txt
1,2,3,4,5,6,7,8,9,10

$ cat f2
I bought two bananas and three mangoes
$ echo 'f1,f2,odd.txt' | perl -F, -lane 'system "cat $F[1]"'
I bought two bananas and three mangoes
  • return value of system has the exit status information, $? can be used as well
  • see perldoc - system for details
$ perl -le '$es=system q/ls poem.txt/; print "$es"'
poem.txt
0
$ perl -le 'system q/ls poem.txt/; print "exit status: $?"'
poem.txt
exit status: 0

$ perl -le 'system q/ls xyz.txt/; print "exit status: $?"'
ls: cannot access 'xyz.txt': No such file or directory
exit status: 512
  • to save result of external command, use backticks or qx operator
  • newline gets saved too, use chomp if needed
$ perl -e '$lines = `wc -l < poem.txt`; print $lines'
4
$ perl -e '$nums = qx/seq 3/; print $nums'
1
2
3

Further Reading