
Common-Linux-Unix-Shell-Commmands-Bash-Scripting

Shell - a user interface for running commands, and a scripting language for automating tasks.

Shells

-The default shell on most Linux systems is Bash
-Others include sh, ksh, tcsh, zsh, and fish.

~ means the home directory.
Copying a file and saving it in backup
cp seasonal/summer.csv backup/summer.bck

Moving files to backup

mv seasonal/spring.csv seasonal/summer.csv backup
Renaming Files

mv winter.csv winter.csv.bck

Deleting files

rm seasonal/summer.csv
Creating and removing directories

rmdir removes an empty directory
mkdir creates a new directory
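
The copy, move, rename, and delete commands above can be tried safely in a throwaway directory. The sketch below is only an illustration: the sample file contents are made up, and `mktemp -d` is used so that no real data is touched.

```shell
# Work in a temporary directory so nothing real is modified.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data mirroring the seasonal/ layout used above.
mkdir seasonal backup
printf 'Date,Tooth\n2017-01-11,canine\n' > seasonal/summer.csv
printf 'Date,Tooth\n2017-03-02,molar\n' > seasonal/spring.csv

cp seasonal/summer.csv backup/summer.bck     # copy: the original stays in place
mv seasonal/spring.csv backup                # move: the file leaves seasonal/
mv backup/spring.csv backup/spring.csv.bck   # rename: mv with a new name
rm seasonal/summer.csv                       # delete the original
rmdir seasonal                               # succeeds only because seasonal/ is now empty

ls backup
```

Note that `rmdir` refuses to remove a directory that still contains files, which is why the files are moved or deleted first.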

Viewing the contents of files such as CSVs with less: press the spacebar to page down, :n to go to the next file, :p to go back to the previous file, and q to quit
less seasonal/spring.csv seasonal/summer.csv

Viewing the first lines of a dataset, using -n to specify the number of lines
head -n 5 seasonal/winter.csv

Listing files recursively, even if they are nested in subdirectories (-R recurses, -F marks directories with / and executables with *)
ls -R -F

Storing a command's output in a file
Example:

tail -n 5 seasonal/winter.csv > last.csv
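
As a small self-contained illustration of `head`, `tail`, and `>` together, the sketch below uses `seq` to generate a stand-in data file (the file names are hypothetical, not from the repository).

```shell
workdir=$(mktemp -d)
cd "$workdir"

seq 1 10 > numbers.txt             # ten lines: 1 through 10

head -n 3 numbers.txt              # prints the first three lines to the terminal
tail -n 2 numbers.txt > last.txt   # nothing is printed; the last two lines go into last.txt

cat last.txt
```

The `>` operator overwrites `last.txt` if it already exists, so rerunning the command replaces the previous contents.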

SELECTING COLUMNS

The command below means "select columns 2 through 5 and column 8, using comma as the separator". cut uses -f (meaning "fields") to specify columns and -d (meaning "delimiter") to specify the separator. You need to specify the latter because some files may use spaces, tabs, or colons to separate columns.
cut -f 2-5,8 -d , chris.csv
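
To see what this field selection does, here is a minimal sketch on a made-up eight-column file (`sample.csv` and its contents are assumptions for illustration only).

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two rows, eight comma-separated columns each.
printf 'a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n' > sample.csv

# Keep fields 2 through 5 plus field 8, splitting on commas.
selected=$(cut -f 2-5,8 -d , sample.csv)
echo "$selected"
```

The field list must not contain spaces: `2-5,8` is one argument, whereas `2-5 ,8` would be read as two.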

Bash Shell (Bourne Again Shell)

To check your default shell, type the following on the command line:

printenv SHELL

If Bash is not your default shell, simply type 'bash' on the command line (CLI) to start it.

Shell command applications

   -Running batch jobs (ETL)
   -Getting information
   -Printing files and strings
   -Compressing and archiving
   -Performing network operations

Shell commands for getting information
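
The original illustration for this section is not reproduced here. As a rough sketch, the commands below are a common selection for getting information about the system and the current user; they are an assumption, not necessarily the ones the author listed.

```shell
kernel=$(uname -s)     # kernel name, for example Linux or Darwin
user_id=$(id -u)       # numeric ID of the current user
year=$(date +%Y)       # current year, using a date format string

echo "kernel=$kernel uid=$user_id year=$year"

df -h .                # used and free space on the file system holding the current directory
```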

Shell commands for working with files

Shell commands for navigating and working with directories

Shell commands for printing file and string content

Shell commands for compressing and archiving files
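
The original illustration for this section is not reproduced here. One common way to archive and compress a directory is `tar` with gzip compression; the sketch below assumes a `tar` that supports the `-z` flag (GNU and BSD tar both do) and uses made-up file names.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir seasonal
echo 'Date,Tooth' > seasonal/winter.csv

tar -czf seasonal.tar.gz seasonal      # c = create, z = gzip-compress, f = archive file name
listing=$(tar -tzf seasonal.tar.gz)    # t = list the contents without extracting
echo "$listing"

mkdir restore
tar -xzf seasonal.tar.gz -C restore    # x = extract, -C = extract into this directory
```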

Shell commands for networking applications

Shell Pipes, Variables, and Filters

ls | sort -r

man ls | head -20

Filters - transform input data into output data

   -examples: wc, cat, more, head, sort, and grep; filters can be chained together.

Pipe - denoted by |

 -used for chaining filter commands
 -```command1 | command2```
 
 -the output of the first command is the input of the second command
 -"pipe" is short for pipeline
 
     ``` ls | sort -r ``` sorts the directory listing in reverse order
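
A minimal sketch of chaining two filters, using `printf` to supply made-up input lines instead of a real directory listing:

```shell
# printf emits three lines; sort -r reverses their order; head keeps the first two.
top_two=$(printf 'pear\napple\nfig\n' | sort -r | head -n 2)
echo "$top_two"
```

Each command reads the previous command's output as its input, so no intermediate file is needed.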

Variables - scope limited to the shell

     -use set to list all shell variables
     
     -```set | head -4```
     
     -assign variables with = and no spaces around it: var=value
     
     ``` name='chris'
         #to print the value we use
         echo $name```
         
     -deleting variables 
     ``` unset name```
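
The assign, print, and delete steps can be sketched end to end as follows (`greeting` is a hypothetical extra variable added for illustration).

```shell
name='chris'                  # no spaces around '='
greeting="hello $name"        # double quotes allow $name to be expanded
echo "$greeting"

unset name                    # the variable no longer exists
echo "name is now: '${name:-}'"   # ${name:-} expands to an empty string when name is unset
```

Writing `name = 'chris'` with spaces would instead try to run a command called `name`.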

Environment Variables - have extended scope: they are visible to child processes

                 ``` export variablename ```
                 -to list all environment variables use ```env```
                 
                 ``` env | grep variablename```
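
The difference in scope can be observed by starting a child shell. In this sketch, `local_only` and `SHARED` are hypothetical variable names chosen for the example.

```shell
local_only='shell scope'             # plain shell variable: not passed to child processes
export SHARED='environment scope'    # exported: becomes part of the environment

# The single-quoted script is expanded by the child shell, not by the current one.
child_view=$(sh -c 'echo "local=[${local_only:-}] shared=[${SHARED:-}]"')
echo "$child_view"

env | grep '^SHARED='                # exported variables show up in env
```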

Shell Scripting Basics

-a script is a list of commands interpreted by a scripting language
-scripts are used for automating processes
-a shell script is an executable text file that begins with an interpreter directive, also called the shebang directive

It is usually written as:

#!interpreter [optional argument]

interpreter - the absolute path to an executable program

optional-argument - a single argument string
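
As an illustration of the shebang directive, the sketch below writes a two-line script, makes it executable, and runs it. The script name `hello.sh` is hypothetical, and `/bin/sh` is assumed to exist at that path, as it does on typical Linux and Unix systems.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write the script; the quoted 'EOF' keeps $0 from being expanded here.
cat > hello.sh <<'EOF'
#!/bin/sh
echo "Hello from $0"
EOF

chmod +x hello.sh       # make the text file executable
output=$(./hello.sh)    # the system reads the shebang line and runs the script with /bin/sh
echo "$output"
```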

Metacharacters in Bash Shell
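
The original illustration for this section is not reproduced here. The sketch below shows a few widely used metacharacters; the selection and the file names are assumptions for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch a.txt b.txt notes.md

# '#' starts a comment: the rest of the line is ignored.
echo one; echo two            # ';' separates two commands on one line

txt_files=$(echo *.txt)       # '*' matches any run of characters in a file name
single=$(echo ?.txt)          # '?' matches exactly one character
literal=$(echo \*.txt)        # '\' removes the special meaning of the next character
quoted=$(echo '*.txt')        # single quotes keep the whole string literal

echo "$txt_files | $single | $literal | $quoted"
```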

Input/Output Redirection

-used for redirecting the input and output of commands
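
The original illustration for this section is not reproduced here. The sketch below covers the most common redirection operators; the file names are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir"

echo "first"  > out.txt               # '>' creates the file or overwrites it
echo "second" >> out.txt              # '>>' appends to the file
ls no_such_file 2> err.txt || true    # '2>' sends error messages to a file
reversed=$(sort -r < out.txt)         # '<' feeds a file to the command's standard input

cat out.txt
echo "$reversed"
```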

Batch vs. Concurrent modes

Batch mode:

commands run sequentially

the first command runs to completion, then the second starts

command1;command2

Concurrent mode:

commands run in parallel

command1 runs in the background and control passes immediately to command2, which runs in the foreground.

command1 & command2
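
The two modes can be compared with a short sketch. Here a subshell with `sleep 1` stands in for a slow command1; the log file names are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Batch mode: the second command starts only after the first has finished.
echo "A" > batch.log; echo "B" >> batch.log

# Concurrent mode: the slow command goes to the background,
# and the second command runs immediately in the foreground.
( sleep 1; echo "slow" >> concurrent.log ) & echo "fast" >> concurrent.log

wait                  # block until the background job has finished
cat concurrent.log    # "fast" was written first, "slow" a second later
```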

Selecting lines containing specific values

head and tail select rows, cut selects columns, and grep selects lines according to what they contain.

In its simplest form, grep takes a piece of text followed by one or more filenames and prints all of the lines in those files that contain that text.

For example, grep bicuspid seasonal/winter.csv prints lines from winter.csv that contain "bicuspid".

grep can search for patterns as well; we will explore those in the next course. What's more important right now is some of grep's more common flags:

-c: print a count of matching lines rather than the lines themselves

-h: do not print the names of files when searching multiple files

-i: ignore case (e.g., treat "Regression" and "regression" as matches)

-l: print the names of files that contain matches, not the matches

-n: print line numbers for matching lines

-v: invert the match, i.e., only show lines that don't match
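
A quick way to get a feel for these flags is to run them on a tiny file. The sketch below uses a made-up `winter.csv` with four lines; the contents are an assumption for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'Date,Tooth\n2017-01-25,wisdom\n2017-02-19,canine\n2017-02-24,Canine\n' > winter.csv

matches=$(grep -c canine winter.csv)        # counts lines containing lowercase "canine"
any_case=$(grep -c -i canine winter.csv)    # -i also counts "Canine"
numbered=$(grep -n wisdom winter.csv)       # prefixes the match with its line number
inverted=$(grep -v -i canine winter.csv)    # keeps only lines without "canine" in any case
files=$(grep -l wisdom winter.csv)          # prints the file name, not the matching line

echo "$matches $any_case"
echo "$numbered"
echo "$inverted"
echo "$files"
```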
Data Manipulation Pipeline
 cut -d , -f 2 seasonal/summer.csv | grep -v Tooth | head -n 1

Sort lines of text

As its name suggests, sort puts data in order.

By default it does this in ascending alphabetical order, but the flags -n and -r can be used to sort numerically and reverse the order of its output, while -b tells it to ignore leading blanks and -f tells it to fold case (i.e., be case-insensitive).

Pipelines often use grep to get rid of unwanted records and then sort to put the remaining records in order.

cut -d , -f 2 seasonal/winter.csv | grep -v Tooth | sort -r
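
The effect of the sort flags is easiest to see on numbers, where alphabetical and numeric order disagree. A minimal sketch with made-up input:

```shell
alpha=$(printf '10\n9\n100\n' | sort)          # alphabetical: compares character by character
numeric=$(printf '10\n9\n100\n' | sort -n)     # numeric: compares values
reverse=$(printf '10\n9\n100\n' | sort -n -r)  # numeric, largest first
folded=$(printf 'B\na\nc\n' | sort -f)         # case-insensitive ordering

echo "$alpha"
echo "$numeric"
echo "$reverse"
echo "$folded"
```

Without `-n`, "100" sorts before "9" because the character "1" comes before "9".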

Removing Duplicates

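We use uniq to remove adjacent duplicate lines; the -c flag prefixes each line with the number of times it occurred. Because uniq only compares adjacent lines, the input is sorted first.

cut -f 2 -d , seasonal/winter.csv | grep -v Tooth | sort | uniq -c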
      4 bicuspid
      7 canine
      6 incisor
      4 molar
      4 wisdom
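
To see why the input has to be sorted first, compare `uniq` with and without `sort` on a small made-up list. The `awk` step is only there to print the counts in a compact name=count form.

```shell
data=$(printf 'canine\nmolar\ncanine\ncanine\n')

unsorted=$(printf '%s\n' "$data" | uniq)          # only the adjacent pair is collapsed
sorted=$(printf '%s\n' "$data" | sort | uniq)     # all duplicates are now adjacent
counts=$(printf '%s\n' "$data" | sort | uniq -c | awk '{print $2 "=" $1}')

echo "$unsorted"
echo "$sorted"
echo "$counts"
```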
Looping

  for filetype in docx odt pdf; do echo $filetype; done

Storing file names in a variable and using a loop to display them

$ files=seasonal/*.csv
$ for f in $files; do echo $f; done
seasonal/autumn.csv
seasonal/spring.csv
seasonal/summer.csv
seasonal/winter.csv
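
A variant of the same loop, sketched under the assumption of a made-up `seasonal` directory, iterates over the wildcard directly and quotes `$f`. This form also copes with file names that contain spaces.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir seasonal
touch seasonal/autumn.csv seasonal/spring.csv "seasonal/late summer.csv"

count=0
for f in seasonal/*.csv; do    # the shell expands the wildcard into one word per file
    echo "processing $f"       # quoting keeps a name with spaces in one piece
    count=$((count + 1))
done
echo "$count files"
```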
