awk.txt

tiny.cc/Awk


awk.ref
a few things to remind myself various format/syntax of awk.


---

sed -i 's/ftp/FTP-disabled/' /etc/passwd	
# inline edit of file, no need to create tmp file and copy back. eg uss sed substitude command.

# field delimiter, awk use -F, cut use -d
# column, both awk and cut start with 1, awk $1, cut -f1
awk -F: '{print $1 ":" $3}' < /etc/passwd
cut -d: -f1,3 < /etc/passwd			
# cut -f1-3 is 1 to 3.  3- is 3 onward.


sed doing several sequential edits, older version need -e "s/.../.../" -e "..."
eg from 
cat somefile.csv | sed -e "s/Total Hospital Beds/TotalHospitalBeds/" -e "s/Hospital Beds per 1,000 Population/HospitalBedsPer1kPop/"  -e "s/Total CHCs/TotalCHCs/" -e "s/CHC Service Delivery Sites/CHCServiceDeliverySites/"


---


awk 'NF > 1 {print $0}' < list
awk -F: 'NF != 7 {print $0 }' < /etc/passwd    # entry that does not have 7 columns will be printed for review

-F: use colon as field/column delimiter
NF = number of fields (ie, number of columns)

$1 = first  column
$2 = second column
$0 = whole line


# perl   column numbering starts with 0
# python column numbering starts with 0
# awk    column numbering starts with 1
# cut    column numbering starts with 1


---


Example with BEGIN and END blocks:

ls -l /var/log | awk ' BEGIN {sum=0} {sum+=$5} END {print sum}'
this will find out how much space are used by the files on the specified dir (but not sub dir)

Note that there is no need to initiate/zero variable.  so the above can be abreviated to (and display in GB):
ls -l /var/log | awk '{sum+=$5} END {print sum/1024^2}'


Another example on summing all quotas on a netapp vfiler:
ssh ntap2 vfiler run nas5 quota report | awk '/vf5_vol1/ {sum+=$6} END {print sum/1024/1024}'


define in .bashrc as shell function

Size() { ls -l $* | awk '{sum+=$5} END {print sum}' ; }         # Size *.txt  # not /usr/bin/size!

qhostTot() { qhost | awk '{h+=1; c+=$3} END {print "host="h " core="c}'  ; }


---


awk '
	/pattern1/	{ print "got pattern 1" }
	/pattern2/	{ print "got pattern 2" }
'


---

qstat -f | awk '$6 ~ /[a-zA-Z]/ {print $0}'					# some host have error state in column 6, display only such host
qstat -f | awk '$6 ~ /[a-zA-Z]/  &&  $1 ~ /default.q@compute/  {print $0}'	# add an additional test for nodes in a specific queue


cat passwd | awk -F: '$4 != $3 {print $0}'			# print user whose GID (col 4) is not the same as the UID (col 3)

cat passwd | awk -F: '$7 !~ /false/          {print $0}' 			# print user who does not have (/bin/)false as shell (column 7)
cat passwd | awk -F: '$7 !~ /false|nologin/ {print $0}' 			# print user who does not have false or nologin as shell

cat passwd | awk -F: '$7 !~ /false|nologin/  &&  $2 ~ /x/  {print $0}' 	# add test cond that col 2 has x as password


---
man page info:

relational operators:
< >
<= >-
!=  ==	not equal to, equal to

~	reg exp match
!~	negated match


       AWK patterns may be one of the following:

              BEGIN
              END
              /regular expression/
              relational expression
              pattern && pattern
              pattern || pattern
              pattern ? pattern : pattern
              (pattern)
              ! pattern
              pattern1, pattern2


Control Statements
       The control statements are as follows:

              if (condition) statement [ else statement ]
              while (condition) statement
              do statement while (condition)
              for (expr1; expr2; expr3) statement
              for (var in array) statement
              break
              continue
              delete array[index]
              delete array
              exit [ expression ]
              { statements }


	eg:
	awk /pattern/    {statement}

	awk '{ if ( str1 ~ str2 ) print $0 }'
	awk '{ if ( str1 ~ str2 ) {print $0} }'
	awk '{ if ( str1 ~ str2 ) {print $0} else {print $2} }'
	
	what I haven't been able to find out is how str1 can be variable.
	arg... use perl!

	awk '/pattern1/	{print $0} '	# =  "grep pattern1"
	awk '/pattern1/	{print "got pattern 1"} '


****************************************

it is pretty hard to get awk to see exactly one tab as delimiter.
So, may want to use tr to convert tab to, say, the hat (^) symbol first.

tr '\t' '^'  < infile > outfile

should do the trick.

TAB is ASCII 9, \011 for really old tr not parsing \t.

****************************************

example awk program using various construct, counting number of users using various shells:

$7~/tcsh/{TSHELL++}
$7~/bin.csh/{CSHELL++}
$7~/a/{BSHELL++; print $7}
END {print TSHELL " "  CSHELL " " BSHELL}


eg from slurm-allIdle-brc.sh :

NODELIST=$( sinfo --Node --long --format '%N %20P %.10t' | awk "\$2 ~ /^$PARTITION$/ && \$3 ~ /idle/ {print \$1}" )

$2 ~ /^exactWord$/   # use this for exact matching.  == is for number, can't compare string.  use ^ and $ to indicate limit of word.  column value would have space stripped already.