Skip to content

Latest commit

 

History

History
8854 lines (8317 loc) · 223 KB

python.md

File metadata and controls

8854 lines (8317 loc) · 223 KB

MODULE, GENERAL


Import python module

import math

Import specific function of a module

from math import ceil

Import a module with abbreviation

import numpy as np

When directly called - programm is running from here

if __name__ == '__main__':

Check which python version is installed

python --version


Use modules / functions from other folder (not working with eg. yahoocrawler.py and rapidtechtools.py - all modules, wether they are needed or not have to imported in the program...)

import sys, os
sys.path.append(os.path.join('C:/', 'Users\path_to_module'))	# Create Path to module
import RapidTechTools as rtt        # Import Module RapidTechTools

VIRTUALENV


Install virtualenv

pip install virtualenv

Global packages installed

pip list

Shows where the python-file is (Windows)

where python

Shows where the python-file is (Linux)

which python

Create new virtual environment env

python -m virtualenv env

Create new virtual environment env (for python3 eg. on MacOS)

python3 -m venv env

Extracts all the modules / dependencies to a txt-file

pip freeze --local > requirements.txt

Go back to the global environment

deactivate

Create new virtual env with specific python-version

virtualenv -p C:\..path to..\Python37\python.exe py37_env

Activate new virtual env named py37_env

py37_env\Scripts\activate

Shows the used python version in the virtual env

python --version

Install all the packages from requirements.txt

pip install -r requirements.txt

Windows: Show all installed python-versions + paths

py -0p


Activate environment

venv\Scripts\activate
    when activated: pip list    #  Now only shows the installed modules for the virtual environment
    pip install package         #  Only installs in the activated virtual environment


install virtual environment with specific python-version

download and install the python-version - Important: remember the path to the python.exe
create a venv with: virtualenv envName -p path\to\new_python.exe

PIP


install module

pip install moduleXY

upgrade module

pip install moduleXY --upgrade

Install specific version

pip install -Iv pyinstaller==4.8

uninstall module XY

pip uninstall module XY

show installed modules in actual environment

pip list

output modules in requirements.txt

pip freeze > requirements.txt

install modules from requirements.txt

pip install -r requirements.txt

PYINSTALLER, INNO SETUP COMPILER - generate python programs to executables


Installation

pip install pyinstaller

Install specific version

pip install -Iv pyinstaller==4.8

Generate the bundle in a subdirectory called dist.

pyinstaller prg.py

Generate only one file

pyinstaller --onefile prg.py

Generate only one file (to get for sure the pyinstaller from the actual env)

python -m Pyinstaller --onefile prg.py

Collect data for specific python module

--collect-data moduleXY

add this when some modules are not found

--copy-metadata pandas_ta

When some modules are making problems - sometimes this helps

--hidden-import=pymssql

pyinstaller --onefile --collect-datas=trafilatura xyz.py pyinstaller --onefile --collect-data fake_useragent checkMapsSel2.py
error when using yfinance

pyinstaller --onefile --collect-data frozendict stockma3.py


when there is a depreciating warning from matlib

--exclude-module matplotlib

Generate file with icon

--icon=app.ico

Generating under mac os sometimes only work with this params

--hidden-import=pkg_resources.py2_warn

Generating under mac os sometimes only work with this params

--hidden-import=cmath


Open this file to start the python-program

prg.exe

only working with import yfinance (not with import yfinance as yf)

yfinance module


Problems with some added python modules (eg. pycountry)

https://groups.google.com/g/pyinstaller/c/OYhJdeZ9010/m/32g3-T8XBAAJ
Create hook-file hook-pycountry.py with content:
	from PyInstaller.utils.hooks import copy_metadata, collect_data_files
	datas = copy_metadata("pycountry") + collect_data_files("pycountry")
Compile Program with
	pyinstaller --onefile --exclude-module matplotlib --additional-hooks-dir=. TestPyCountry.py


Problems with path when executing from py and exe

config_name = 'creds.json'                                    # Define the config file name


Determine if application is a script file or frozen exe

if getattr(sys, 'frozen', False):                             # Get path when starting as executable
    application_path = os.path.dirname(sys.executable)
elif __file__:                                                # Get path when running from IDE as py-file
    application_path = os.path.dirname(__file__)
config_path = os.path.join(application_path, config_name)     # Final Config Path


Using Inno Setup Compiler for creating a setup.exe for all files

https://jrsoftware.org/isdl.php
- Application Information => provide application informations (appName, appVersion, publisher, website)
- Application Files => select application main executable file and add all necessary folder and files
- Application File Association => no selection
- Application Documentation => select files which should be shown before / after the installation
- Setup Languages => select which languages should be included during installation
- Save installation script
- Final setup.exe is stored in the "Output"-folder

SHORTCUTS VSCODE


Comment / Uncomment selected lines ("/" on numblock) (commenting in VS Code)

Ctrl K Ctrl C

Comment / Uncomment selected lines ("/" on numblock) (commenting in VS Code)

Ctrl K Ctrl U

Find in File

Ctrl F

Replace in File

Ctrl H

Selects whole line

Ctrl L

Search in all files

Ctrl Shift F

Replace in Path (in all Files)

Ctrl Shift H

Jump to this Grouping Window

Ctrl 1/2/3

Find next

F3

Goto line

Ctrl G

Copy / Duplicate lines

Shift Alt Up/Down

Expand / Shrink selection

Shift Alt Left/Rigth

Move line

Alt Up/Down

Suggestions for fix error, function informations

Alt Enter

Open the project windows (on the left side)

Alt 1

Show usage of the variable, function, class

Alt F7

Focus back on the editor window

Esc

Collapse everything

Ctrl K Ctrl 0

Expands everything

Ctrl K Ctrl J

Collapse block ("-" on numblock)

Ctrl K Ctrl L

Expand block ("+" on numblock)

Ctrl K Ctrl L

Select whole file

Ctrl A

Undo (backwards) / Redo (forwards)

Ctrl (Shift) Z

Select word to the beginning

Ctrl Shift Left

Select word to the end

Ctrl Shift Right

Go one word left

Ctrl Left / Right

Delete current line

Ctrl Shift K

Delete to end of word

Ctrl Del

Delete to beginning of word

Ctrl Background

Hide/Show sidebar

Ctrl B

Hide/Show terminal

Ctrl J

Show command line (eg. open a file where the name is known)

Ctrl P

Open the the command panel

Ctrl Shift P

Toggle through the open files in the editor

Ctrl TAB

Close File

Ctrl WARNING

MultiCursor - Select more cursors in rows for multiple change in rows

Ctrl Alt Up/Down

MultiCursor - Select Cursor at multiple positions for the same name

Ctrl D

Select all occurences of this word in the document and change all with one change

Ctrl Shift L

Select whole word

Ctrl D

Jump to the function definition

Ctrl Click

Close Tab

Ctrl W

Show suggestions for the actual code function / method

Ctrl Space

SHORTCUTS INTELLIJ IDEA IDE, CHROME, WINDOWS

https://www.shortcutfoo.com/app/dojos/intellij-idea-win/cheatsheet

Ctrl / 				      # Comment / Uncomment selected lines ("/" on numblock) (commenting in VS Code)
Ctrl / 				      # Comment / Uncomment selected lines ("/" on numblock) (commenting in VS Code)
Ctrl Alt F7 	      # Find usages
Ctrl F    		      # Find in File
Ctrl R         	    # Replace in File
Ctrl Shift R 		    # Replace in Path (in all Files)
F2            	    # jump to the next error
F3        			    # Find next
Ctrl G      		    # Goto line
Ctrl D 			        # Copy / Duplicate lines
Ctrl X              # Cut line (when nothing selected)
Ctrl C              # Copy line (when nothing selected)
Alt Shift Down/Up 	# Move line
Ctrl E       		    # Recent opened files
Ctrl TAB            # Switch windows in IDE
Ctrl (Shift) W      # Select / deselect parts of the code step by step
Alt Enter			      # Suggestions for fix error, function informations
Alt 1				        # Open the project windows (on the left side)
Alt F7				      # Show usage of the variable, function, class
Esc					        # Focus back on the editor window
Ctrl Shift -    	  # Collapse everything
Ctrl Shift +        # Exapnds everything
Ctrl - 				      # Collapse block ("-" on numblock)
Ctrl + 				      # Expand block ("+" on numblock)
Ctrl A         		  # Select whole file
Ctrl (Shift) Z 		  # Undo (backwards) / Redo (forwards)
Ctrl Shift Left 	  # Select word to the beginning
Ctrl Shift Right 	  # Select word to the end
Ctrl Left / Right 	# Go one word left
Ctrl Y              # Delete current line
Ctrl Del            # Delete to end of word
Ctrl Backspace      # Delete to beginning of word
lorem + TAB         # Create lorem text for html-code
p>lorem + TAB       # Create lorem text inside <p>-tags


Shortcuts Chrome
https://www.makeuseof.com/tag/google-chrome-shortcuts-pdf/

F5                  # Refresh site
Ctrl Tab            # Go tab right
Ctrl Shift Tab      # Go tab left
Ctrl F5             # Clear Cache and reload page


Shortcuts Win10
https://fossbytes.com/windows-keyboard-shortcuts-cheat-sheet-for-windows-10/

Alt Tab         # Focus on other window
Ctrl C          # Copy
Ctrl X          # Cut
Ctrl VALUES     # Paste


Enable pushing with ssh-key in Idea and VSCode

Enable open ssh agent: https://dev.to/aka_anoop/how-to-enable-openssh-agent-to-access-your-github-repositories-on-windows-powershell-1ab8
Run commands: https://stackoverflow.com/questions/56490194/vs-code-bitbucket-ssh-permission-denied-publickey

OPERATORS, INPUTS, OUTPUTS, ARGUMENTS, STATISTICS


Import module for math calculations

import math

Import module for reading arguments from the command line

import sys

Using some numpy functions

import statistics as stat

Result without Decimals (=> 2)

5 // 2

Modulo / Rest of the division (=> 1)

7 % 2

Assigment of several varaibles

d,e,f = 4,5,6

Change / swap 2 values

a,b = b,a

Increase value by one

a += 1

For better readability - "_" are possible using long num-values (will be ignored)

i = 1_000_000

Value is rounded to two decimal places => 77.23

round (77.2321, 2)

Round to tens place

int(v - v%10)

Conversion to String / Float / Int

str(5), float("5"), int("5")

Outputs absolute value => 2

abs(-2)

Wait for pressing ENTER (or a keyboard stroke) to follow along

input("ENTER to continue")

Input age and change to int

int(input("Alter?"))

Returns the type of a variable

type(var)

Shows all available methods and attributes for the object as list

dir(var)

Check if x has type format float (others: str,int,list,tuple,range,dict,set,bool)

isinstance(x,float)

Set var to max-value (float infinite)

x = float("inf")

Calculates the sqrt of the value => i3

math.sqrt(9)

Check if a variable exists

if "myVar" in locals():

Check for argument which is given when starting the program

sys.argv[1:].upper()

Print "Text" (with a linebreak \n at the end)

print("Text")

Print with end-statement (next print will be in the same line

print("Text", end="-")

Print text with linebreaks \n etc

print(repr("xyz\n"))

python test.py arg1 arg2 arg3

example starting program

Show the arguments from the command line when starting the program eg. ['test.py', 'arg1', 'arg2', 'arg3']

str(sys.argv)

Len of the arguments - eg. 4 (program test.py itself and 3 arguments following)

len(sys.argv)

Find the minimum value => 3

min(3,5,7)

Find the maximum value => 7

max(3,5,7)

Find the mean value => 5 (with statistic-module - input must be given as list)

stat.mean([3,5,7])

Stop program at this point (helpful in test-situations)

exit()

Calculate the change from an old value to a new value in percent

(newVal-oldVal)/oldVal * 100

Import pretty print functionality

import pprint as pp

pretty print dict line by line

pp.pprint(dict)

pretty print with max line-length 30

pp.pprint(dict, width=30)

pretty print only level 1 of nested list eg. ['level1', [...]]

pp.pprint(nestedList,depth=1)

pretty print only level 2 of nested list eg. ['level1', ['level2', [...]]]

pp.pprint(nestedList,depth=2)

convert binary value to decimal

int("1010",2)

RANDOM, SECRETS

random: unsecure, reproducable with seeds secret: secure, not reproducable
Import random module - fast, but not very secure

import random

Random int number between 1 and 6 like a cube

random.randint(1,6)

Random float number between 1 and 3

random.uniform(1,3)

Random value between 0 and 1 in float format - eg. 0.16394553

random.random()

Random float in the range from 1 to 10

random.uniform(1,10)

Random value for standard deviation

random.normal(0,1)

Choose random entry from a list

random.choice(list)

Choose 3 random (unique) entries from the list

random.sample(list,3)

Choose 3 random entries from the list (possible to be the same)

random.choices(list,k=3)

Shuffle the content of a list

random.shuffle(l)

Can reproduce the same results (unsecure)

random.seed(1)

generate a random uppercase 6-char string

"".join(random.choices(string.ascii_uppercase, k=6))

Import secrets module - for security reasons (is slower)

import secrets

Random int in the range from 0-10 (10 excluded)

secrets.randbelow(10)

Random int with 4 bits (highest possible value is 15 - 1111)

secrets.randbits(4)

Random choice (which is not reproducable)

secrets.choice(l)

STRING

ordered, immutable, text representation s = "this is a test" # Define a string s = "this \n text" # String with linebreak \n s = "this \t text" # String with tab between the words


Define string over more lines in editor with

s = '''this \
	and this'''


first char of a string

c = s[0]

last char of a string

c = s[-1]

Insert variable in string oldest method (%s for string, %i for int, %f for float, %.2f for 2 decimals)

s = "text %s bla" % var

Insert variable in string old method (:.2f for 2 decimals)

s = "text {} bla".format(var)

Insert variable in string new method

s = f"text {var} bla"

Format int with allways 2 digits

s = f"{var:02d}"

Format float with allways 2 decimal places

s = f"{float:0.2f}

Format float with 2 decmial places and overall 9 chars with leading 0

s = f"{float:09.2f}


Multiple use

"{0} like, {0} especially {0} but {1}".format("Joe", "noodles")

Returns the index where the text is found

s.find ("ist")

Counts the occurrence of a text

s.count ("i")

Counts several chars in a text

re.findall('[.,’]', s)

Return all index of the char "/" in the string

[i for i, c in enumerate(s) if c == "/"]

Returns all index as a list (needs import re)

[x.start() for x in re.finditer('ist', s)]

Lowercase the whole string

s = s.lower()

Capitalize the whole string

s = s.upper()

Capitalize the first char

s = s.capitalize()

Capitalize the first char of all words

s = s.title()

Check if string is starting with char "H"

s.startswith("H")

Check if string starts with several chars

s.startswith(("H","F","L"))

Check if string is ending with char "H"

s.endswith("H")

Check if string ends with several chars

s.endwith(("H","F","L"))

Split the words in a list which are seperated by " "

s.split(" ")

Split for several chars (eg. " " and "\n" and " ,")

re.split(" |\n| ,", s)

', 1)[0] => Split till the first occurence of "=>"

s.split('

Split sentences after line breaks

s.splitlines()

Delete all whitespaces at the beginning and the end

s = s.strip()

Remove all duplicate spaces in a string

s = " ".join(s.split())

Replacement of two strings

s = s.replace("e","X")

Replace several chars in a string

s = re.sub("[e0fi]", "X", s)

Replace several chars in a string (get back a tupple: 1st changed string, 2nd: count of changed chars)

s = re.subn("[e0fi]", "X", s)

True if the whole string are digits

s.isdigit()

True if the whole string are no digits

s.isalpha()

TRUE if string2 is in strings1

s2 in s

Change string to list with all single chars

l = list(s)

Convert char to ASCII value

ord(char)

Convert ASCII value to char (eg. 65=A, 97=a)

chr(ascii)

Outputs hashvalue oif the string (needs import hashlib)

hashlib.md5(s).encode('utf-8')).hexdigest()

Execute statement in a string => hello

exec(print("hello"))

Gives the value of an expression => 4 (dangerous - can used for sql-injections)

eval("2+2")

Reverse a string

my_string[::-1]

Print string with special characters

repr(s)


check if any string is in another string or all strings are in another string

s = "A string is more than its parts!"
l = ["more", "blabla", "nothing"]
if any(x in s for x in l):                    # check if any elems from checks in string s
if not any(x in s for x in l):                # check if any elems from checks are NOT in string s
if all(x in s for x in l):                    # check if all elems from checks are in string s

REGEX


import regex module

import re

Information about regex handling

https://medium.com/factory-mind/regex-tutorial-a-simple-cheatsheet-by-examples-649dc1c3f285

test regex strings

https://regex101.com/

Find str with 4xdigits + "-"char + 2xdigits

pattern = re.compile("^[0-9]{4}-[0-9]{2}$")

Check if pattern matches - <> None when matches

pattern.match(s)

Find str with 4xdigits + "-"char + 2xdigits

pattern2 = re.compile("[0-9]{4}-[0-9]{2}")

Find str with 3 to 4 xdigits + "-"char + 2xdigits

pattern2 = re.compile("[0-9]{3,4}-[0-9]{2}")

Check with fullmatch (^ and $ not necessary) - <> None when matches

pattern.fullmatch(s)

Char "." has to cherck with "\n" (is a wildcard letter in regex)

pattern3 = re.compile ("[0-9]{1}\.[0-9]{3}")

Replace all digits in string with blank

re.sub("\d","",s)

Insert blank before every capitalized word eg. "CostOfRevenue" => "Cost Of Revenue"

re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', "txt")

count all words which start and ends with a char from cL

len(re.findall(f'[{cL}"]{w}[{cL}"]', l))

count all words which start with w and ends with a char from cL

len(re.findall(f'^{w}[{charList}"]', l))

count all words which start with a char from cL and ends w

len(re.findall(f'[{charList}"]{w}$', l))

count all words which have w in the middle of anything else

len(re.findall(f'(.*?){w}(.*?)', l))

Read string an try to match with groups
Define pattern => as many digits (endless) + " " + one lowercase cahr + ":" + as many digits (endless)

pattern = "([0-9]+) ([a-z]:([a-z]+)"

Check if elem fits the pattern => returns <re.Match object; span=(0, 19), match="3 n: nnnlncrnnnnn">

match = re.search(pattern, elem)

Assign the value of the first group => 3

value = match.group(1)

Show the full string => "3 n: nnnlncrnnnnn"

all = match.group(0)

LIST

ordered, mutable, allows duplicate elements
Define several empty lists (NOT use l1=l=l3=[] => this would be the SAME list)

l1,l2,l3,l4 = ([] for i in range(4))

check for True if list has an element

if l

check for False if list is empty (same would be if l == []

if not l

Define list with content

l = [4,5,6]

Define list - content is list from 0 to 19

l = list(range(20))

Define list with 8 empyt strings

l = ["" for x in range(8)]

Uppercase the whole list

l = [x.upper(for x in l)]

Define a nested list with 5x5

l = [["" for x in range(5)] for x in range(5)]

Add both lists together (to a new list)

l1 + l2

Add 1 element to the list at the back

l.append(1)

Add several elements to the list at the back

l.extend([6,5,4])

Insert an element at the index-position 3

l.insert(3, "xyz")

Returns first index position of element "xyz"

l.index("xzy")

Delete last element from list

l.pop()

Delete first element from list

l.pop(0)

Sort list ascending

l.sort()

Sort list descending

l.sort(reverse=True)

Sort list and store it in a different independent list

l_sort = sorted(l)

Sort list descending and store it in a different independent list

l_sort = sorted(l, reverse=True)

Sort the same way as the operating system (need: from natsort import os_sorted)

l_sort = os_sorted(l)

Check if a list is sorted (by ascending)

if sorted(l) == l

Check if a list is sorted by descending

if sorted(l, reverse=True) == l

Sort list by len of elements

l.sort(key=len)

Define nested list

l = [[1,4,3], [2,2,4], [3,1,5]]

Sort nested list for the 2nd element ascending

l.sort(key=lambda x: x[1])

Sort nested list for the 2nd element descending

l.sort(key=lambda x: x[1], reverse=True)

Sort nested list 1st for the 1st elem, then for the 2nd elem

l.sort(key=lambda x: (x[0], x[1]))

Sort nested list 1st for the 1st elem, then for the 2nd elem descending

l.sort(key=lambda x: (x[0], x[1]), reverse=True)

Reverse the complete list

l.reverse()

Reverse complete list and store it in different independent list

l_reverse = list(reversed(l))

Delete element at index position 2

del l[2]

Delete first element with this value from the list

l.remove("abc")

Delete complete content of the list (same as l = [])

l.clear()

Find smallest element in the list

min(l)

Find greatest element in the list

max(l)

Find longest string in the list

max(mylist, key=len)

Find lenght of longest string in the list

len(max(mylist, key=len))

Sum of all elements in the list

sum(l)

Pair Sorting from 2 lists (1 with 1, 2 with 2, 3 mit 3, aso.)

2xls_sum  = [sum(pair) for pair in zip(l1, l2)]

Count of elements in list

len(l)

Count of occurence of the element "a" in the list

l.count("a")

No seperate copy of the list (updates in both lists)

l2 = l1

Separate individual copy of the list (no updates in both lists)

l2 = l.copy()

Separate individual copy of the list - 2nd variant

l2 = l[:]

Separate individual copy of the list - 3rd variant

l2 = list(l)

Seperate copying a nested list - import copy necessary

l2 = copy.deepcopy(l)

First element

l[0]

Last element

l[-1]

Last 3 elements

l[-3:]

Elements from index position 2 to 3

l[2:4]

Elements from 0 to 1 (exclusive index position 2)

l[:2]

Elements from index position 2 to the end of the list

l[2:]

Every second element [start:end:step]

l[::2]

Check if element is in list

9 in l

Check if any elements from the list are in the string

if any(x in st for x in ["ab","cd","de"]):

Check if all elements from the list are in the string

if all(x in st for x in ["ab","cd","de"]):

Check if all elements from the list are NOT in the string

if all(x not in str for x in ["ab","cd"]):

Iterate through list content

for i in l:

Iterate through list with index

for i in range(len(l)):

Iterate through list with index and content

for idx, cont in enumerate(l):

Iterate through 2 lists pair-wise (stops when the shorter list is reached)

for x,y in zip(l1,l2):

Create string with elements joined together with ", " eg. ["a","b","c"] => a, b, c

', '.join(l)

Define list

a = [1,2,3,4,5]

List is mapped with the lambda function => [2,4,6,8,10]

list(map(lambda x: x*2, [1,2,3,4,5]))

Also possible with list comprehension => [2,4,6,8,10]

l=[x*2 for x in l]

Also possible with list comprehension => [2,4]

l=[x for x in l if x%2==0]

List comprehension with if and else

l=[x*2 if x.isdigit() else x for x in l]

List is filtered with lambda for even numbers => [2,4]

list(filter(lambda x: x%2==0, [1,2,3,4,5]))

Build paris as tuple => (1,4),(2,5),(3,6)

zip([1,2,3],[4,5,6])

Remove duplicates from a list

list(set(l))

Remove duplicates from a list and keep existing order

list(dict.fromkeys(l))

Change elements in list to int

list(map(int,l))

Transpose nested list (change rows and columns)

[[x[i] for x in l] for i in range(len(l[0]))]


rotate list for n places

def rotate(l, n):
    return l[-n:] + l[:-n]


Iterate over 2 lists at the same time using zip

animal = ['Cat', 'Dog', 'Fish', 'Goat']
age = [1, 2, 2, 6]
z = zip(animal, age)
for animal, age in z: ...


Adjust nested list to the same length

maxLen = max(map(len, l))
for row in l:
  while len(row) < maxLen:
    row.append(None)

TUPEL

ordered, immutable, allows duplicate elements working can be more efficient than with lists - especially with big data - regarding space and runtime
Create a tuple

t = (4,5,6)

Create a tuple from a list

t = tuple([3,4,4]

Selection / slicing of elements - same as with lists

t[]

Change tuple to list

l = list(t)

Change list to tuples

t = tuple(l)

Assigning vars to tuple-elements (a=0, b=1, c=2)

a,b,c = (0,1,2)

Assigning to tuple-values (f=0, m=[1,2,3], l=4)

f,*m,l = (0,1,2,3,4)

DICTIONARY

key-value pairs, unordered (till version 3.7), mutable
Define empty dict

d = {}

Define dict with content

d = {"one": 1, "two": 2, "three": 3}

2nd way to defince a dict with content

d = dict(one=1,two=2,three=3)

Define dict with content (with dupels)

d = dict([("one",1),("two",2),("three",3)])

Define dict with content (with pairs in nested list)

d = dict([["one",1],["two",2],["three",3]])

Define dict with 2 different lists (1x keys and 1x values)

d = dict(zip(["one","two","three"], [1,2,3]))

Access value with key element (error when the key is not in the dict)

d["two"]

BETTER: Acesss value with get for key element (no error when the key is not in the dict - returns second parameter instead)

d.get("two","N/A")

Access value - same as get - but also initialize the key when it is not in the dict - with the second parameter)

d.setdefault("two","N/A")

Find key for specific value (v) in dict

key = list(d.keys())[list(d.values()).index(v)]

Read keys from dict to list

list(d.keys())

Read values from dict to list

list(d.values())

Count of entries in dict

len(d)

Check if key is in dict (true / false)

"three" in d

New entry for dict (key = "four", value = 4)

d["four"] = 4

Delete specific key in dict

del d["one"]

Rename dict-keyname

mydict[new_key] = mydict.pop(old_key)

Combine 2 dicts (if key is in both dicts - the second value will be taken)

combined_dict = {**d1, **d2}

Check if key is in dict

if "xyz" in d:

Copying a dict (all changes will be made in BOTH dicts)

d2 = d

Copying a dict (NOT WORKING with nested lists in the dict - use deepcopy!)

d2 = d.copy()

Copying a dict (NOT WORKING with nested lists in the dict - use deepcopy!)

d2 = dict(d)

Copying a dict (dicts will be handled seperate) - import copy necessary

d2 = copy.deepcopy(d)

Dict d get updated with d2 (all existing keys are overwritten - and new added)

d.update(d2)

Iterate through dict keys

for key in d.keys():

Iterate through sorted dict keys ascending

for key in sorted(d.keys()):

Iterate through sorted dict keys descending

for key in sorted(d.keys(),reverse=True):

Iterate through dict values

for val in d.values():

Iterate through keys and values of the dict

for key, val in d.items():

built dict with dict-comprehension

d = {x[0]: x[1:] for x in l}

Sort dict descending according to values (=item[1])

d = {k: v for k, v in sorted(d.items(), key=lambda item: item[1], reverse=True)}

Sort dict ascending according to keys (=item[0])

d = {k: v for k, v in sorted(d.items(), key=lambda item: item[0])}

Dict sorted: 1st value-desc (x[0]) - 2nd key-ascnd (-x[0])

d = {k: v for k, v in sorted(d.items(), key=lambda x: (-x[1],x[0]))}

Sort dict and output list, sorted according to the 3rd element of the list in the dict

l = sorted(d.items(), key=lambda e: e[1][2])

Sort dict and output list, sorted according to the 3rd element of the list in the dict - descending

l = sorted(d.items(), key=lambda e: e[1][2], reverse=True)

Merge 2 dictionaries

d3 = {key: d1.get(key,0)+d2.get(key,0) for key in set(d1)|set(d2)}


save dict to pkl-file

import pickle
with open("fn.pkl", "wb") as f:
  pickle.dump(dict, f)


read dict from pkl-file

import pickle
with open('fn.pkl', 'rb') as f:
  d = pickle.load(f)

SETS

unordered, mutable, no duplicates
Define empty set - {} would define a dict and not a set

s=set()

Define set with content

s = {1,1,2,2,3,4}

Define empty set - returns: {"o","l","H","e"}

s = set("Hello")

Define second set

s2 = {1,7,8}

Add element in set1

s.add(5)

Add several elements to set1

s.update([10,11,12])

Intersection of 2 sets (same as: s.intersection(s2))

s & s2

Untion of 2 sets (same as: s.union(s2))

s | s2

Difference of 2 sets (same as: s.difference(s2))

s - s2

True if s is subset from s2 (same as: s.issubset(s2))

s <= s2

Check if element is in set

3 in s

Clear the set

s.clear()

Delete lowest elmeent in set

s.pop()

Delete element 5 from set - but key error possible

s.remove(5)

Delete element 5 from set - NO key error possible

s.discard(5)

Iterate through set content

for i in s:

Copying a set (all changes will be made in BOTH sets)

s2 = s1

Copying a set (set will be handled seperate)

s2 = s.copy()

Copying a set 2nd method (set will be handled seperate)

s2 = set(s)

Define a frozenset - no changes are possible in the set - union, intersection aso. will work

s = frozenset(1,2,3)

CONTROL STRUCTURES and ITERATIONS

If / elif / else

if x > 10: pass
elif x > 10: pass
else: pass


Use 2 conditions in one line (instead of (a>10 and a<20)

if (10 < a < 20)...

Connecting different control structures with logical and &

if a<10 & b>10 & c==4...


5 iterations from 0 to 4

for i in range(5):

5 iterations from 0 to 4 (start, end, step)

for i in range(0, 5, 1)

Iterations descending from 4 to 0

for i in range(4, -1, -1)


While loop with break condition

while x < 4:

Endless while loop - has to be exited with break

while True:

Break loop completely

break

Break actual loop run - continue with next loop run

continue

FUNCTIONS, DECORATORS

description of the function in form of a docstring

def printNumAbbr(value):
'''
eg. Make abbreviaton for numeric value in thousands (K), millions (M), billions (B) or trillions (T)
:param value: numeric value which should be abbreviated
:return: string value with maximum possible abbreviation
'''


Define function - with default value 0 if no input is given

def add(x=0,y=0):
    erg = x+y                   	# Calculation in function
    return erg                  	# Return value from the function


Optional argument in the funtction (first element is must - second optional)

def pet (animal,n1=None,n2="x")
pet("Cat")                      	# Calling function with n1=None and n2 = "x"
pet("Cat","name")					# Calling function with n1=name and n2 = "X"
pet("Cat",n2="xyz") 				# Calling function with n1=None and n2 = "xyz" (third parameter has to be named when calling)


Function with infinite arguments

def varargs(*args): print(args)
varargs(1,2,3)                  	# Outputs (1,2,3)

def keyword_args(**kwargs):
    print(kwargs)
keyword_args("a"=3, "b"=4)      	# Outputs {"a":3, "b":4}


decorator template

def my_decorator(func):				# Define decorator
  @functools.wraps(func)
  def wrapper(*args,**kwargs):		# Decorator with * arguments
	  #Do...					 	# Do something before the functions
	  result=func(*args,**kwargs)	# Run the function
	  #Do...					 	# Do something after the functions
	  return result					# Return the results from the function
  return wrapper


decorator

extend behaviour of a function with a decorators
def start_end_decorator(func):		# Define the decorator with function "func" as input-parameter
	def wrapper():					# Inside the decorator a wrapper function has to be defined
		print("Start")				# Decorated code which is executed before the core code from the function
		func()						# Calling the function itself
		print("End")				# Decorated code which is executed after the core code from the function
	return wrapper					# Results from the decorator have to be given back
@start_end_decorator				# Defines this decorator for the following function "print_name"
def print_name():					# Normal content of the function
	print("xyz")					# Core functionality of the function
print_name()						# Now when the function is executed - outputs not only "xyz" - also "Start" before and "End" after


decorator with function arguments

def start_end_decorator(func):		# Define the decorator with function "func" as input-parameter
	def wrapper(*args,**kwargs):	# Inside the decorator a wrapper function has to be defined
		print("Start")				# Decorated code which is executed before the core code from the function
		result = (*args,**kwargs)	# Calling the function itself
		print("End ")				# Decorated code which is executed after the core code from the function
		return result
	return wrapper					# Results from the decorator have to be given back
@start_end_decorator				# Defines this decorator for the following function "add5"
def add5(x):						# Normal content of the function with one argument
	return x + 5					# Core functionality of the function
result = add5(10)					# Outputs "Start" => "End" => 15
print(result)

GENERATORS, YIELD

very memory efficient
Defines the generator with 3 yield statements

def mygenerator():
	yield 3
	yield 2
	yield 1


Creates the generator and stores in g (=generator type)

g = mygenerator()

Outputs 3,2,1

for i in g: print(i)

Outputs 3

print(next(g))

Outputs 2

print(next(g))

Sum-Function can also use a generator => result is 6

sum(g)

Sorted-Function can use generator => returns list with sorted elements [1,2,3]

sorted(g)

EXCEPTIONS

try / except

try:
	a = 5 / 0
except Exception as e:
	print("Exception: ",e)						# Prints exception "division by zero"


assert / raise

x = -5
assert (x>=0), "x is not positive"				# Checks / Assert some condition => prints "x is not positive"
if x<0: raise Exception("x should positive")	# Raises an exception => prints "Exception: x should be positive"

LAMBDA, MAP, FILTER


functions with one argument - add10(5) => 15

add10 = lambda x: x+10

functions with two arguments for multiplying - mult(2,7) => 14

mult = lambda x,y: x*y

p = [(1,2),(15,1),(5,-1),(10,4)]
output is sorted with first element then second element => [(1,2),(5,-1),(10,4),(15,1)]

sorted(p)

output is sorted by the second element => [(5,-1),(15,1),(1,2),(10,4)]

sorted(p,key=lambda x:x[1])

output is sorted by the sum of both => [(1,2),(5,-1),(10,4),(15,1)]

sorted(p,key=lambda x:x[0]+x[1])

a = [1,2,3,4,5]
list is mapped with the lambda function => [2,4,6,8,10]

list(map(lambda x: x*2,a))

also possible with list comprehension => [2,4,6,8,10]

c=[x*2 for x in a]

list is filtered with lambda for even numbers => [2,4]

list(filter(lambda x: x%2==0,a))

also possible with list comprehension => [2,4]

c=[x for x in a if x%2==0]

CLASSES, DATACLASSES

Define a class

class Human(object):
  species = "Homo Sapiens"    # Fix variable / class variable for all instances of the class
  def __init__(self,name):    # Constructor - automatically applied when an instance is created
      self.name = name		  # Name is assigned to the instance of the classe (self.)
	  self.tresor = []		  # Tresor is assigned as list to the instance of the class (self.)
  def say(self, msg):         # Methode of the class
      return "{name}: {message}".format(name=self.name, message=msg)
	  elf.tresor.append(msg)  # Tesor of the instance gets a new value in the list
  @classmethod                # Class methode - is used by all instances
  def get_species(cls):
      return cls.species
  @staticmethod               # Static methode - is called without class or method
  def grunt():
      return "*grunt*"


Create instance of the class

i = Human(name="Ian")

Call the methode of the class (output: "Ian: Hi")

print(i.say("Hi"))

Create additonal instance of the class

j = Human(name="Joel")

Call the methode of the class (output: "Joel: Hallo")

print(i.say("Hallo"))

Output "Homo Sapiens"

i.get_species()

Same output "Homo Sapiens"

j.get_species()

Change of the class variable - applies for all instances

Human.species = "Was Neues"

Aufruf der statischen Methode => Ausgabe: "grunt"

Human.grunt()


Dataclasses

from dataclasses import dataclass		# import dataclass necessary
@dataclass								# define a dataclass
class Coordinate:						# class is defined an need NO __init__, __repr__, __eq__
    x: int								# __init__ no necessary - values get assigned automatically
    y: int = 10							# __repr__ print for the string-represantion is automatic => Coordinate(x=4, y=5)
a = Coordinate(4, 5)
@dataclass (frozen=True)				# defines a unmutable instance
asdict(dc)								# converts the dataclass to a dict
astuple(dc)								# converts the dataclass to a tuple

TXT FILES


Read textfile - and print it

with open ("fn.txt","r") as f: print(f.read()

Open File for Read and Write

with open ("fn.txt","r+") as f: print(f.read()


Read textfile - stored in list per line

with open(fn, encoding="utf8", errors="ignore") as f:
  lines = f.readlines()


Writing in textfile

with open("fn.txt","w") as obj: obj.write("Ein neuer Text")

Append text in the next line

with open("fn.txt","a") as obj: obj.write("\nNoch ein Text")

Check if file allready exists

if os.path.exists(fn) == False:

Create and initialize file when not exisiting

with open (fn,"a") as f: f.write("init")

Read content to string

f.read()

Set the current postion in the file to beginning

f.seek(0)

Write s to the f-file opened

f.write(s)


Try/Except - checks if file can be saved

while True:
try:                         							# otherwise outputs a error message
  writer.save ()
  break
except Exception as e:
  print(f"Error happened: {e}")
	traceback.print_exc()                               # Outputs the detailed error message
    input ("File Open not possible - pls close and press <Enter>")

JSON FORMAT


Import json-module

import json

Assign JSON-filename

fn = "numbers.json"

Reading informations in json-format

with open(fn) as data: info = json.loads(data)

Convert / Encode a dict to a json-file (with indent for better reading)

json_format = json.dumps(d,indent=2)

Convert / Encode a dict to a json-file (with sorting the keys)

json_format = json.dumps(d,sort_keys=True)

Convert / Decode a json-file to a dict

person = json.loads(json_format)

Reading information in json-format

with open(fn.json,"r") as data: d=json.load(file)

Pretty-Print formatted json-file

print(json.dumps(erg, indent=4, sort_keys=True))


UTF8-errors
when there are utf-8 errors pls try to set the system variables

import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')


read JSON from file
ignoring errors and handles multiple json-objects

with open(fnJSON, errors="ignore") as f:
  data = [json.loads(line) for line in f]


read JSON from website
ignoring errors and handles multiple json-objects

r = requests.get(tmpJSONWord)
jsonStrList = r.text.split("\n")
jsonList = [json.loads(j, errors="ignore") for j in jsonStrList if j != '']


write JSON to txt-file

with open('dataDetail.json', 'w', encoding='utf-8') as f:
  json.dump(ergDetail, f, ensure_ascii=False, indent=4)


Example for json-file

{
	"firstName" : "Chuck",
	"lastName" : "Doe",
	"hobbies": ["running","swimming","singing"],
	"age": 28,
	"hasChildren": true
	"children" = [
		{
			"firstName" : "Alex"
			"age" : 5
		},
		{
			"firstName" : "Bob"
			"age" : 7
		}
	]
}


Read name value => "Chuck"

info["firstName"]

Iterate trough the json-file (eg. many children)

for item in children:

XML FORMAT


Import xml-module

import xml.etree.ElementTree as ET


Example for xml-file

<persons>
	<person>
		<name>Chuck</name>
		<phone type="int1"> +1 734 555"</phone>
		<email hide="yes"/>
	</person>
	...
</persons>


Read xml-file into tree

tree = ET.fromstring(data)

Read name text => "Chuck"

tree.find("name").text

Read hide value from email => yes

tree.find("email".get("hide"))


Iterate trough xml-file (eg. many persons)

lst = persons.findall("persons/person")
for item in lst:

URLLIB

https://pymotw.com/2/urlparse/ https://docs.python.org/3/library/urllib.request.html#module-urllib.request
Import Module for URL parsing

import urllib.parse

Import Module for request

import urllib.request

Encode url

link = urllib.parse.quote_plus(link)

Uncode url

link = urllib.parse.unquote_plus(link)

Parse HTML-link1

u = urlparse("http://google.com/search")

Returns "http"

u.scheme

Returns "/search"

u.path

Returns "google.com"

u.netloc


Download picture from a url

  opener = urllib.request.URLopener()
  opener.addheader('User-Agent', "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36")
  filename, headers = opener.retrieve(url, 'Test.jpg')


Download xlsx/csv from a url

  from urllib.request import urlopen, urlretrieve
  urlretrieve(hrefLink, filename)

MODULE - CSV

import csv
Read csv-file to nested-list

fn = os.path.join(path, "inp.csv")
with open(fn, "r", encoding="utf-8") as f:
  csvRead = csv.reader(f, delimiter=";")
  inpData = [x for x in csvRead]

# when this error happens: _csv.Error: line contains NULL
with open(workFN, "r", encoding="utf-16") as f:
  ...

# when this error happens: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96
with open(workFN, "r", encoding="mac_roman") as f:
  ...


Create an empty csv-file

  with open(fnName, "w") as f:
    pass


Write csv-file from nested list

FN = os.path.join(path, "output.csv")
with open(FN, 'w', newline='', encoding="utf-8") as f:
  writer = csv.writer(f, delimiter=";")
  writer.writerows(l)


Read csv-file to a dictionary for every line

with open ("products.csv", "r", encoding="utf-8") as f:
  dictReader = csv.DictReader(f)
  for item in dictReader:
    for k,v in item.items():
      print(k,v)


Read csv-data from a HTML-link

import urllib.request
import codecs
url = link                                                      # direct link to a csv html file
ftpstream = urllib.request.urlopen(url)
csvfile = csv.reader(codecs.iterdecode(ftpstream, 'utf-8'))     # output is nested list

MOUDLE - ZIPFILE - handling ZIP files


Import zip module

import zipfile

Read Zip-File

zf = zipfile.ZipFile('example.zip', 'r')

List of string with the names of the files in the Zip-File

zf.namelist()

Open file vom Zip-File (eg. CSV-File which is in the Zip-File)

zf.open (file_name)

MODULE - GSPREAD - API for Google Sheets

see also Python-Diverses / GSpread
API for GoogleSheets / Tutorial explaining handling
https://techwithtim.net/tutorials/google-sheets-python-api-tutorial/

# Create a project on https://console.cloud.google.com/
# select name + click create + select project
# 3 lines upper/left - API/Services - Library
# search for google drive API + enable it
# search for google sheets api + enable it
# 3 lines upper/left - API/Services - Credentials
# create credentials for Service Accounts + define name + press Continue
# choose role "editor" + press Continue + press Done
# press on created service account + press keys + Add Key + Create New Key
# select json + and store json-file with name creds.json
# copy client_email entry from the json-file
# in google sheets - got to share in the document
# paste the email from above + press send


Hosting and Scheduling on Google Cloud Platform groupcollide
https://jrizmal.medium.com/deploying-a-python-web-scraper-on-google-cloud-platform-12fe24e57bb0

# select project  (in titlebar right from Google Cloud Platform)
# create a function (search for cloud functions)
# set trigger (Cloud Pub/Sub) and create/select topic
# files needed for hosting: main.py (inside def hello_pubsub), requirements.txt, creds.json
# create schedule (search for cloud scheduler)


See further setup in tutorial

import gspread                                                      # import module
import oauth2client                                                 # import module
from oauth2client.service_account import ServiceAccountCredentials  # import module for account credentials
from pprint import pprint                                           # import module for better output formatting
scope = ["https://spreadsheets.google.com/feeds",
         'https://www.googleapis.com/auth/spreadsheets',
         "https://www.googleapis.com/auth/drive.file",
         "https://www.googleapis.com/auth/drive"]
creds = ServiceAccountCredentials.from_json_keyfile_name
    ("creds.json", scope)


Credentials Mgmt for GoogleSheets

gc = gspread.authorize(creds)

Open the Google Sheet

wb1 = gc.open("NameOfSheet)

Open the first worksheet in the Google Sheet

ws1 = wb1.get_worksheet(0)

Get all worksheets as worksheet-objects

listWS = wb1.worksheets()

Get all worksheets by n

listWS = [x.title for x in wb1.worksheets()]

Open the spreadhseet

sheet = client.open("testpython").sheet1

Get a list of all records

dataList = ws1.get_all_values()

Get specific area

data = ws1.get('B5:K8')

Get formula of a cell

ws1.acell('B1', value_render_option='FORMULA').value

Update cell range with nested list

ws1.update('A1:Z10', [[1, 2], [3, 4]])

Insert new row at row number 2

ws1.insert_row(["new1","new2","new3"],2)

Append new row at the end

ws1.append_row(["new1","new2","new3"])

Sorting the sheet (Range + col1 asc + col2 desc)

ws1.sort((1, 'asc'), (2, 'des'), range='A2:G20')

Get row 3 (starts counting from 1 - not 0)

row = sheet.row_values(3)

Get col E (starts counting from 1 - not 0)

col = sheet.col_values(5)

Get values from col B rows 4 and 5

col = sheet.col_values(2)[3:5]

Get the value from row 1 and col B

cell = sheet.cell(1,2).value

Update cell from row 2 and col B

ws1.update_cell(2,2, "CHANGED")

Define cell area / cell row

cell_list = worksheet.range('A1:G1')

Define new values for area / row

new_values = [1,2,3,4,5]


Gspread Formatting
https://pypi.org/project/gspread-formatting/

# Set format for area:
	from gspread_formatting import *
	fmt = cellFormat(
		backgroundColor=color(1, 0.9, 0.9),
		textFormat=textFormat(bold=True, foregroundColor=color(1, 0, 1)),
		horizontalAlignment='CENTER'
		)
	format_cell_range(worksheet, 'A1:J1', fmt)
# Conditional formatting
	from gspread_formatting import *
	worksheet = some_spreadsheet.worksheet('My Worksheet')
	rule = ConditionalFormatRule(
		ranges=[GridRange.from_a1_range('A1:A2000', worksheet)],
		booleanRule=BooleanRule(
			condition=BooleanCondition('NUMBER_GREATER', ['100']),
			format=CellFormat(textFormat=textFormat(bold=True), backgroundColor=Color(1,0,0))
		)
	)
	rules = get_conditional_format_rules(worksheet)
	rules.append(rule)
	rules.save()

	# or, to replace any existing rules with just your single rule:
	rules.clear()
	rules.append(rule)
	rules.save()

MODULE - OPENPYXL - working with excel worksheets


https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.worksheet.html#openpyxl.worksheet.worksheet.Worksheet.PAPERSIZE_A3

Documentation:

import openpyxl as on_text
Read xlsx

wb = ox.load_workbook("test.xlsx")

Read worksheetnames to list

l = wb.sheetnames

Select specific worksheet in workbook

ws = wb["sheet1"]

Select first worksheet

ws = wb[wb.sheetnames[0]]

Select active worksheet from workbook

ws = wb.active

Value from specific cell

val = ws["A1"].value


Import module for loading a workbook

from openpyxl import load_workbook

Import module for creating a workbook

from openpyxl import Workbook

Load xlsx

wb = load_workbook(("Test.xlsx"))

Load xlsm

wb = load_workbook(filename="Test.xlsm",

read_only=False, keep_vba=True)
Create new workbook

wb = openpyxl.Workbook()

Create new worksheet (at the end)

wb.create_sheet("ws")

Create new worksheet (at the beginning)

wb.create_sheet("ws",0)

Create new worksheet (at the second last position)

wb.create_sheet("ws",-2)


Change worksheet name

ws.title = "xyz"


Value from specific cell (other method)

ws.cell(row=1, column=1).value

Assign value to specific cell

ws["A1"] = 97

Assign formula to specific cell (only english names for function and arguments must be sepearted by "," and not ";")

ws["A1"] = "=SUBTOTAL(1,I6:I10000)"

Save workbook to xlsx

wb.save("Test2.xlsx")

Sort worksheets in workbook

wb._sheets.sort(key=lambda ws: ws.title)

Copy worksheet in workbook

wb.copy_worksheet(ws)

Copy part of the worksheet

ws1.range('A1:Z20').copy(wb2.sheets[0].range('A1:Z20'))


Loop trough specific area and print the cell values

for row in ws["A1":"C3"]:
    for cell in row: print(cell.value)


Loop trough specific column and print the cell values

for cell in ws["C"]: print(cell.value)

Insert column before col 3

ws.insert_cols(3)

Insert 2 columns before col 3

ws.insert_cols(3,2)

Insert row before row 7

ws.insert_rows(7)

Insert 5 rows before row 10

ws.insert_rows(10,5)

Delete column C

ws.delete_cols(3)

Delete column F to H

ws.delete_cols(6,3)

Delete row 5

ws.delete_rows(5)

Delete row 5 to 7

ws.delete_rows(5,3)

Move the cells from D4:F10 up one row and right two columns

ws.move_range("D4:F10", rows=-1, cols=2)

Append the values at the bottom of the sheet

ws.append([1,"A",2,"C"])

Returns dimension of the worksheet eg. "A1:M24"

ws.dimensions

Returns max column / row count of the worksheet

ws.max_column / ws.max_row

Returns min column / row count of the worksheet (which contains data)

ws.min_column / ws.min_row

Rows 1:3 to be printed at the top of every page

ws.print_title_rows(1:3)

Delete worksheet in workbook

del wb["sheet4"]

Define filters in columns A5 to Q5

ws.auto_filter.ref = "A5:Q5"

Set filter to values "X","Y","Z" in column 2

ws.auto_filter.add_filter_column(2,["X","Y","Z"])

Sort elements in filter for columsn B2 to B15

ws.auto_filter.add_sort_condition("B2:B15")

Close workbook

wb.close()


Iterate trough worksheet row by row

for row in ws.iter_rows():
    for cell in row: print(cell.value)


Iterate trough worksheet row by row (from A1 to C6 - 6 lines)

for row in ws.iter_rows
	(min_row=1,min_col=1,max_row=6,max_col=3):


Iterate trough worksheet col by col

for col in sheet.iter_cols():
    for cell in col: print(cell.value)


Iterate trough worksheet col by col (from A1 to C6 - 3 lines)

for col in ws.iter_cols
	(min_row=1,min_col=1,max_row=6,max_col=3):


Read whole worksheet to a nested list

data_list = []
for row in ws.iter_rows ():
    zeile = []
    for cell in row:
        if cell.value is None:
            zeile.append ('')
        else:
            zeile.append (cell.value)
    data_list.append (zeile)


Saving a nested list in XLSX

import pandas as pd
from openpyxl import load_workbook
cont = [[row1_cell1, row1_cell2], [row2.cell1,row3_cell3]]      # Nested list for saving
book = load_workbook ("fn.xlsx")                                # load existing XLSX - skip when overwriting
writer = pd.ExcelWriter ("fn.xlsx", engine='openpyxl',          # Define writer from pandas
                         options={'strings_to_numbers': True})
pd.DataFrame (cont).to_excel (writer, sheet_name="WS1",			# Prepare Data for XLSX and worksheet1
                              header=False, index=False)
pd.DataFrame (cont).to_excel (writer, sheet_name="WS2",			# Prepare Data for XLSX and worksheet2
                              header=False, index=False)
writer.save()                                                   # Save XLSX with the new worksheets
writer.close()													# Close writer


Checking if xlsx is open while trying to save

while True:
    try:
        writer.save ()
        writer.close ()
        break
    except Exception as e:
        print ("Error: ", e)
		traceback.print_exc()
        input ("Datei kann nicht geöffnet werden - bitte schließen und <Enter> drücken!")


Read column to a list

mylist = []
for col in ws['A']:
     mylist.append(col.value)


Automatic adjustment of the columns accoring to best fit

column_widths = []
ws = writer.sheets[stock]
for row in content:                                    # Determination of the longest value per column
    for i, cell in enumerate (row):
        if len (column_widths) > i:
            if len (str (cell)) > column_widths[i]:
                column_widths[i] = len (str (cell))
        else:
            column_widths += [len (str (cell))]
    for i, column_width in enumerate (column_widths):  # Col 0 and 1 with fixed length - rest according to longest value in col
        if i == 0:
            ws.column_dimensions[get_column_letter (i + 1)].width = 35
        elif i == 1:
            ws.column_dimensions[get_column_letter (i + 1)].width = 32
        else:
            ws.column_dimensions[get_column_letter (i + 1)].width = column_width + 2


Formating the xlsx when a cell is formated 2 times - it gets overwritten)
import openpyxl.styles

from openpyxl.styles import Font, PatternFill, Border, Side, Alignment

import formatting, styles

from openpyxl import formatting, styles

bold font

bold = Font (bold=True)

yellow background (use color picker)

bg_yellow = PatternFill (fill_type="solid", start_color='fbfce1', end_color='fbfce1')

grey background (use color picker)

bg_grey = PatternFill (fill_type="solid", start_color='babab6', end_color='babab6')

blue background (use color picker)

bg_green = PatternFill (fill_type="solid", start_color='8af542', end_color='8af542')


define border on very side

frame_all = Border (left=Side (style='thin'), right=Side (style='thin'),
                    top=Side (style='thin'), bottom=Side (style='thin'))


border only on top and bottom

frame_upanddown = Border (top=Side (style='thin'), bottom=Side (style='thin'))

font size 14

size14 = Font (bold=True, size="14")

define left alignment

left_allign = Alignment (horizontal="left")

define right alignment

right_allign = Alignment (horizontal="right")


define right alignment for area

for row in ws["D1":"G34"]:
    for cell in row: cell.alignment = right_allign


define full border for several areas

areas = ["A7:G19","A27:G31"]


format area

for area in areas:
    for row in ws[area]:
        for cell in row:
			cell.border = frame_all
			cell.number_format = "0"															=> define number format without decimals
			cell.number_format = "0.000E+00"													=> define number format with 3 decimals


set background and size for several specific cells

for i in ["A6","D6","E6","A26","D26","E26"]:
    ws[i].fill = bg_green
    ws[i].font = size12


freeze worksheet at cell B2 for right and down scrolling

freeze = ws["B2"]
ws.freeze_panes = freeze


fit xlsx to one page for printing - 1st part

ws.sheet_properties.pageSetUpPr.fitToPage = True

fit xlsx to one page for printing - 2nd part

ws.page_setup.fitToHeight = False

set page to landscape horizontal

ws.set_printer_settings(paper_size=1, orientation = 'landscape')

set page to landscape horizontal

ws.set_printer_settings(paper_size=1, orientation = 'portrait')

Define red color background

red_color = 'ffc7ce'

Define red color font

red_color_font = '9c0103'

Define red_font with size / bold / red color

red_font = styles.Font (size=14, bold=True, color=red_color_font)

Define red_fill with color red / fill type solid

red_fill = styles.PatternFill (start_color=red_color, end_color=red_color, fill_type='solid')


Define conditional formating for area, <0, => fill with red background

ws.conditional_formatting.add ('A1:Z100',
	formatting.rule.CellIsRule (operator='lessThan',formula=['0'],fill=red_fill))


Define conditional formating for area, <0, => use red font

ws.conditional_formatting.add ('A1:G25',
	formatting.rule.CellIsRule (operator='lessThan',formula=['0'],fill=red_fill,font=red_font))


Conditional formating for more than one area

for area in ["B1:B100","C2:G100"]:
	ws.conditional_formatting.add (area, ...)


Conditional formating according to a formula

ws.conditional_formatting.add('E1:E10',
	FormulaRule(formula=['ISBLANK(E1)'], stopIfTrue=True, fill=redFill))


Conditional formating according to a formula

ws.conditional_formatting.add('D2:D10',
	FormulaRule(formula=['E1=0'], font=myFont, border=myBorder, fill=redFill))


Color Scale formating

# Define rule
ruleBottomGreen = ColorScaleRule(
                        start_type='percentile', start_value=10, start_color="63BE7B", # green
                        mid_type='percentile', mid_value=50, mid_color= "FFFB84", # yellow
                        end_type='percentile', end_value=90, end_color="F8696B") # red
# Apply rule
ws2.conditional_formatting.add(f"K4:K12", ruleBottomGreen)

MODULE - XLWINGS - working on the fly with open excel worksheets

Update XLSX in reealtime
Import module

import xlwings as xw

Use xlwings with hidden excel-worksheets

app = xw.App(visible=False)

Create a new workbook

wb = xw.Book()

Read XLSX

wb = xw.Book ("name.xlsx")

Read XLSX with password protection

wb = xw.Book ("name.xlsx", password("xyz"))

Read specific worksheet

ws = wb.sheets["name_sheet"]

Read first worksheet

ws = wb.sheets[0]

Add worksheet in workbook

wb.sheets.add("NewWS")

Reads the formula of a cell (not the value)

f = ws["A1"].formula

Assign formula to a cell

ws["A1"].formula = string

Iterate trough cells an set them to "" / None

for i in ws.range("A3:A7"): i.value = ""

Sort worksheet in the first col (function see below)

xl_col_sort(ws,1)

Reads the color of a cell and returns a rgb-tuple eg. (146, 208, 80)

ws["C8"].color

Set value for cell

ws["C8"].value

Read hyperlink of cell

ws.range ("A2").hyperlink

Set values for specific cell

ws.range("A1:C3").value

Remove background from a cell

ws["C8"].color = None

White: Assigns background color for a range of cells

ws.range("A1:C3").color = (255,255,255)

Green: Assigns background color

ws.range("A1:C3").color = (149,237,173)

Red: Assigns background color

ws.range("A1:C3").color = (237,149,149)

Yellow: Assigns background color

ws.range("A1:C3").color = (240,232,146)

Orange: Assigns background color

ws.range("A1:C3").color = (230,172,80)

Blue: Assigns background color

ws.range("A1:C3").color = (143,157,235)

Assign numeric format with 2 decimals

ws.range("A1:C3").number_format = "0,00"

Assign percentage format with 2 decimals

ws.range("A1:C3").number_format = "0%"

Set cell to bold

ws.range("B1").api.Font.Bold = True

Windows: Black: Set font color to black (see codes: https://access-excel.tips/excel-vba-color-code-list/)

ws.range("B2").api.Font.ColorIndex = 1

Windows: Red: Set font color to red (see codes: https://access-excel.tips/excel-vba-color-code-list/)

ws.range("B2").api.Font.ColorIndex = 3

Windows: Set font size to 16

ws.range("A1").api.Font.Size = 16

Windwos: Set specific font

ws.range("A1").api.Font.Name = "Arial"

Mac: Set font color to dark grey

ws.range("B2").api.font_object.color.set((105,105,105))

Autofit the columns

ws2.autofit(axis="columns")

Autofit the columns and rows

ws2.autofit()

Set wrap text to False

ws2.range(f"A2:Z10000").api.WrapText = False

Insert a hyperlink and define which text should be shown in the cell

ws["D5"].add_hyperlink(link, text_to_display = "Chart")

add an image at a specific cell

wsInp.pictures.add(imgFN, left=wsInp.range("D20").left, top=wsInp.range("D20").top, height=100)

Get all the pictures of a worksheet in a list

ws.pictures

Delete the first picture in the worksheet

ws.pictures[0].delete()

Delete the first picture with the name "xyz"

ws.pictures["xyz"].delete()

Disable auto filter in worksheet

ws.api.AutoFilterMode = False

Clears the content and formatting of the whole sheet

ws.clear()

Clears the content but leaves formatting

ws.clear_contents()

Read worksheetnames to list

l = [x.name for x in wb.sheets]

Delete worksheet in workbook

wb.sheets["OldWS"].delete()

Hide worksheet in workbook

wb.sheets["WS"].api.Visible = False

Copy worksheet

ws.copy(name='copied')

Save workbook

wb.save()

Close workbook

wb.close()

Quit App when opened at the beginning

wb.quit()

Bring workbook to the front of the screen

wb.activate(steal_focus=True)

Load elements in list from range which are not None

[x for x in ws.range("A3:A2000").value if x != None]

copy range from one sheet to another

ws.range("A1:A10").value = ws2.range("A1:A10").options(ndim=2).value

Move worksheet at the beginning

ws1.api.Move(Before=wsWork.api)

Move worksheet at the end

ws1.api.Move(After=wsWork.api)


Get address and subaddress from hyperlinks

for l in ws1['H18'].api._inner.Hyperlinks:
  wAddr = l.address         # brings results for http-links
  wSubAddr = l.SubAddress   # brings results for internal file-links


Hide worksheet in windows

app = xw.App()
wb = app.books.open(FN)
app.visible = False
ws = wb.sheets[0]


Make border / frame for cells
Definiton for linestyles see: https://docs.microsoft.com/en-US/dotnet/api/microsoft.office.interop.excel.xllinestyle?view=excel-pia
Definiton for border-sites see: https://docs.microsoft.com/en-us/office/vba/api/excel.xlbordersindex

for e in ws.range("B3:D3"):
  for site in [7,8,9,10]:
    e.api.Borders(site).LineStyle = 1


Update row / change whole row

l = ws.range("A2:A6").value         # Read specific cells to list (eg. [2,4,6,7,8])
l = [0,1,2,3,4]											# Change list
ws.range("A2:A100").value = l       # Writeback the updates to cells


Update cell area / change 3 elements

v = ws.range("A2:F9").value
v[3][2] = 9999999
v[5][1] = 8888888
v[7][5] = 5555555
ws.range("A2:F9").value = v


Sorting workssheet in given col

def xl_col_sort(sheet,col_num):
    sheet.range((2,col_num)).api.Sort(Key1=sheet.range((2,col_num)).api, Order1=1)
    return


Delete several rows / columns

from xlwings.constants import DeleteShiftDirection
ws.range('2:4').api.Delete(DeleteShiftDirection.xlShiftUp)      # delete rows 2 to 4
ws.range('A:C').api.Delete(DeleteShiftDirection.xlShiftUp)      # delete cols A to C

MODULE - PYTHON-DOCX - working with word docx


Import the module

import python-docx

Import the library

from docx import Document

Read file

doc = Document('file.docx')

Save file

doc.save('file.docx')

Read all tables

t = doc.tables

Access specific element in table

t[0].rows[0].cells[0].paragraphs[0].text

Assign style to specific paragraph

t[0].rows[0].cells[1].paragraphs[0].style

Read all paragraphs

p = doc.paragrapsh


Set the Normal style to specific font and size

(assigning it to specific paragraph see above)
from docx.shared import Pt
style = document.styles['Normal']
font = style.font
font.name = 'Arial'
font.size = Pt(10)


Replace text and keep the previous style

(iteration over runs necessary)
for para in doc.paragraphs:
  for run in para.runs:
    run.text = "New Text"

MODULE - WIN32COM - creating worksheets as pdf from xlsx

Create Worksheets as PDF from XLSX
Import Win32 Module

import win32com.client

Import OS Module

import os

Define Input-XLSX

inp = os.getcwd() + "\\" +"excel.xlsx"

Define Output-PDF

out = os.getcwd() + "\\" +"ws.pdf"

Initialize Excel Application

o = win32com.client.Dispatch("Excel.Application")

Do everything hidden

o.Visible = False

Open XLSX

wb = o.Workbooks.Open(inp)

Create PDF from the active worksheet in the xlsx

wb.ActiveSheet.ExportAsFixedFormat (0, out)

Close XLSX

wb.Close(True)

Number of Worksheets

wb.Sheets.Count

Select 3 different worksheets by number (for exporting afterwards)

wb.WorkSheets ([3,4,8]).Select()

Assign worksheet 4

ws = wb.Worksheets[4]

No Zooming

ws.PageSetup.Zoom = False

Fit to 1 height

ws.PageSetup.FitToPagesTall = 1

Fit to 1 width

ws.PageSetup.FitToPagesWide = 1

Select specific print area

ws.PageSetup.PrintArea = "A1:G50"

MODULE - PyPDF2 - working with pdfs

Working with PDFs
Read PDF informations / metadata

from PyPDF2 import PdfFileReader        # Import Module for PDF Reading
with open (fn, 'rb') as f:
    pdf = PdfFileReader (f)				# Read PDF informations
    info = pdf.getDocumentInfo ()		# Read PDF infos^
    number_of_pages = pdf.getNumPages ()    # Read number of pages
	print(info)							# Outputs all informations
	print(info.title)					# Outputs title of the pdf
	print(info.author)					# Outputs author of the pdf


Read Text from Pdf

page = pdf.getPage (7)		# Select page 8
text = page.extractText ()	# Extract text
print (text)				# print (text


Split PDFs

from PyPDF2 import PdfFileWriter		    # Import Module for PDF Writing
pdf_writer = PdfFileWriter()				# Create new instance of PDF writer
pdf_writer.addPage(page)                    # Add Page to writer
with open(output_filename, 'wb') as out:	# Write splitted PDF
	pdf_writer.write(out)


Merge PDFs

from PyPDF2 import PdfFileMerger		    # Import Module for PDF Merging
pdf_merger = PdfFileMerger()				# Create new instance of PDF merger
pdf_merger.append(path)                     # Append Page for merging
with open(output_path, 'wb') as fileobj:	# Write merged PDF
	pdf_merger.write(fileobj)


Rotate clockwise

page.rotateClockwise(90)

Rotate counter clockwise

page.rotateCounterClockwise(90)

Overlaying / Watermarking two pages

page.mergePage(watermark_page)

MODULE - PYODBC - odbc-access (MS Access)
import the module

import pyodbc


Define connection and cursor for access-db

FN = "dataOdd.accdb"
path = os.path.abspath(os.path.dirname(sys.argv[0]))
fn = path + "/" + FN
msAccess = r"{Microsoft Access Driver (*.mdb, *.accdb)}"
connPath = f'Driver={msAccess};DBQ={fn};'
conn = pyodbc.connect(connPath)
cursor = conn.cursor()


Read data

cursor.execute('select * from dataOdds')
for row in cursor.fetchall():
    print (row)


Insert data

sql = "INSERT INTO dataOdds (" \
      "[Date]," \
      "Name," \
      "Links," \
      "Fixture) " \
      "VALUES (?,?,?,?)"
cont = [
        datetime.today().date(),
        "datetime",
        "ichbineinlink.html",
        "01:  -  Race Test Race"
       ]
cursor.execute(sql, cont)
conn.commit()


Update data

sql = "UPDATE dataOdds " \
      "SET Links = ? " \
      "WHERE Date = ? " \
      "AND Name = ? " \
      "AND Fixture = ?"
cont = [
        "ABC",
        "2022-03-24",
        "Albury",
        "02:35  -  Race 1 Trackpix maiden showcase plate"
       ]
cursor.execute(sql, cont)
conn.commit()


Delete data

### delete data
sql = "DELETE FROM dataOdds " \
      "WHERE Date = ? " \
      "AND Name = ? " \
      "AND Fixture = ?"
cont = [
        "2022-04-22",
        "ABCDEFGH",
        "01:  -  Race Test Race"
       ]
cursor.execute(sql, cont)
conn.commit()

MYSQL - MARIADB - HEIDISQL


Data where the MariaDB-database is stored

C:\Program Files\MariaDB 10.3\data

Data where the MySQL-database is stored

C:\ProgramData\MySQL\MYSQL Server 8.0\data

Shows the database-version in the column header

HeidiSQL - Datei / Verbindungsmanager


Transfer whole table from local db to hosted db

right click on database / table to transfer and select "export database as sql"
for "output" select eg. "Server: A2Hosting" (connection to the hosted server)
for "database" select eg. "xyztec_stockdb" (name from the db on the hosted server)
for "data" select "replace existing data"
when format of table should be changed also select checkbox "Drop" at table (not database!)
press "export" (wait till the rows-column shows 100%)


Backup MySQL / MariaDB with HeidiSQL
see: https://manjaro.site/backup-mysql-database-using-heidisql/

select DB - <Tools>, <Export Database as SQL>
select Drop-Checkboxes and Create-Checkboxes
Data: Delete + insert (truncate existing data)
Output: single .sql file
Filename: choose destination file and folder

MYSQL - MARIADB - SQL


Both modules have to be installed

pip install mysql, mysql.connector

Import module for mysql access

import mysql.connector

Define connection

mydb = mysql.connector.connect (host="localhost",user="root",passwd="pwd")

Set Cursor on mySQL

mycursor = mydb.cursor()

Creates database "db"

mycursor.execute("CREATE DATABASE db")


Show databases and informations

mycursor.execute("SHOW DATABASES")
    for db in mycursor: print(db)


Create new table with 2 cols

mycursor.execute("CREATE TABLE students (name VARCHAR(255),age INTEGER(10))")

Show tables and informations

mycursor.execute("SHOW TABLES") for tb in mycursor: print(tb)

Insert row in table with the 2 values using sql injection

sqlFormula = "INSERT INTO students (name, age) VALUES (%s, %s)"

Define a row with values to insert

student1 = ("Rachel", 22)

Execute the sql statement to add one row

mycursor.execute(sqlFormula,student1)

Commit the change (otherwise no update will be saved to db)

mydb.commit()

Define more rows with values

students = [("Tom", 22),("Mark", 29),("Clara", 52)]

Execute the sql statement to add more row (from list / tuple)

mycursor.executemany(sqlFormula,students)

Select col "age" from table "students"

mycursor.execute("SELECT age FROM students")

Read all rows according the above select statement

myresult = mycursor.fetchall()

Read one / first row according the above select statement

myresult = mycursor.fetchone()

Output the read row / table informations

for row in myresult: print(row)

Define select (greater than)

sql = "SELECT * FROM students WHERE age > 25"

Define select (with like clausel)

sql = "SELECT * FROM students WHERE name LIKE 'M%'"

Execute SQL

mycursor.execute(sql)

Define select (with sql injection)

sql = "SELECT * FROM students WHERE name = %s"

Execute SQL with parameters

mycursor.execute(sql, ("Mike", ))

Update row where name = "xyz"

sql = "UPDATE students SET age = 82 WHERE name='xyz'"

Limit the output to 5 rows

sql = "SELECT * FROM students LIMIT 5"

Limit the output toi 5 rows and starting with element 2

sql = "SELECT * FROM students LIMIT 5 OFFSET 2

Output content - ascending order

sql = "SELECT * FROM students ORDER BY name"

Output content - descending order

sql = "SELECT * FROM students ORDER BY name DESC"

Delete row with name "xyz"

sql = "DELETE FROM students WHERE name = 'xyz'"

Drop whole table (if exists helps if there no table anymore)

sql = "DROP TABLE IF EXISTS students"

Reset auto-increment in a table

ALTER TABLE mytable AUTO_INCREMENT=0

Delete the data inside a table

TRUNCATE TABLE tablename

Delete the data AND the whole table-structure itself

DROP TABLE tablename


SELECT ticker name for specific ticker

sql = "SELECT tickerName FROM stock_main WHERE ticker=%s"   # Or Select * to get all values from the table
var = [(stockTicker)]
c.execute(sql,var)
data = c.fetchall()


SELECT maximum calcDate for specific ticker

sql = "SELECT max(calcDate) FROM scores WHERE ticker = %s"
var = [(stock)]
c.execute (sql, var)
lastCalcDate = c.fetchone()


INSERT new row in a table

sql = "INSERT INTO scores (" \
    "ticker," \
    "calcDate," \
    "stockName," \
    "profitGrowth)" \
    " VALUES (%s,%s,%s,%s)"  # as many %s necessary as insert attributes
cont = [inputList]          # list with 4 elements
c.execute (sql, cont)
mydb.commit ()


UPDATE row in a table

sql = "UPDATE stock_statistics " \
        "SET " \
        "marketCap=%s," \
        "EV=%s," \
        "Ebitda=%s" \
        " WHERE ticker=%s AND ultimo=%s"
cont = [(valMarketCap,
        valEV,
        valEbitda,
        stockTicker,
        ultimo)]
c.executemany (sql, cont)
mydb.commit ()


DELETE row in a table

sql = "DELETE FROM scores" \
    " WHERE ticker = %s and calcDate = %s"
cont = [stock,dateElem]
c.execute (sql, cont)
mydb.commit ()


DELETE everything in a table

sql = "DELETE FROM scores"
c.execute (sql)
mydb.commit ()

SQL ALCHEMY


Import MODULE

from sqlalchemy import create_engine

necessary to install - otherwise problems eg. with api-accesses to the db

pip install pymysql

Define access to the MySQL-DB (username, pw, dbname)

engine = create_engine("mysql+pymysql: //user:pw@localhost/dbname?host=localhost?port=3306")

Establish connection

conn = engine.connect()

Read all existing tables from the databasen - returns list

engine.table_names()

Read data from db - returns list with tupels as rows

conn.execute("SELECT * FROM actor").fetchall()

Close connection at the end

conn.close()

Close sql engine at the end

engine.dispose()


Define object / table

from sqlalchemy import Table, Column, Integer, String, Float, MetaData
meta = MetaData()
students = Table(
   'students',
   meta,
   Column('id', Integer, primary_key = True),
   Column('name', String(45)),
   Column('lastname', String(45)),
)


Create the table(s) according to the metadata-definition

meta.create_all(engine)

name of the table => students

user_table.c.name

all column-names / keys as list => ['id', 'name', 'lastname']

user_table.c.keys()

shows the primary key of the object

user_table.primary_key


Create INSERT / UPDATE / DELETE / SELECT statements

students.insert()                                           # Creates INSERT Statement according due the whole students structure
	=> shows: INSERT INTO students (id, name, lastname, floatvar) VALUES (:id, :name, :lastname, :floatvar)
students.insert().values(name = 'Karan')                    # Creates INSERT Statement for specific attribute
	=> shows: INSERT INTO students (name) VALUES (:name)
students.update()                                           # Creates UPDATE Statement
	=> shows: UPDATE students SET id=:id, name=:name, lastname=:lastname, floatvar=:floatvar
students.delete()                                           # Creates DELETE Statement
	=> shows: DELETE FROM students
students.select()                                           # Creates SELECT Statement
	=> shows: SELECT students.id, students.name, students.lastname, students.floatvar  FROM students


Execute SQL command - INSERT

conn = engine.connect()                                     # Insert one row
ins = students.insert().values (name = 'Brauneder', lastname = 'Karl')
result = conn.execute(ins)
result.inserted_primary_key                                 # Shows the inserted primary key
conn.execute(students.insert(), [                           # Insert more rows
   {'name':'Krankl', 'lastname' : 'Hans'},
   {'name':'Herzog','lastname' : 'Andreas'},
])


Execute SQL command - SELECT

s = students.select()                                       # Creates select statement for SELECT *
result = conn.execute(s)                                    # Excenute select statement
for row in result: print(row)                               # Outputs the result rows for select
s = students.select().where(students.c.id > 10)             # Creates select statement with WHERE clausel


Execute SQL with Textual SQL

from sqlalchemy.sql import text
t = text("SELECT name FROM students")
result = conn.execute(t)
for row in result: print(row)
t = text("select students.name, students.lastname from students where students.name between :x and :y")
result = conn.execute(t, x = 'A', y = 'L')                  # Select statement with where-parameters


Using Aliases

from sqlalchemy.sql import alias, select
st = students.alias("a")
s = select([st]).where(st.c.id > 2)
conn.execute(s).fetchall()


Execute UPDATE statement

conn = engine.connect()
stmt=students.update().where(students.c.lastname=='Khanna').values(lastname='Kapoor')
conn.execute(stmt)
s = students.select()
conn.execute(s).fetchall()


Execute DELETE statementconn = engine.connect()

stmt = students.delete().where(students.c.lastname == 'Khanna')
conn.execute(stmt)
s = students.select()
conn.execute(s).fetchall()

SQLITE3 SQL


Import module

import sqlite3

Create / Connet to a database

conn = sqlite3.connect("address_book.db")

Create a cursor (for working with the db)

c = conn.cursor()


Create a table in the database (if it not exists)

attr1 text, attr2 num


Insert rows into database

c.execute("INSERT INTO tbl VALUES (:attr1,:attr2,:attr3)",
		{
			"attr1": field1.get(),
			"attr2": field2.get(),
			"attr3": field3.get()
		})


Select rows from database

c.execute("SELECT * FROM adresses")

Delete row from database - get key from delete_box widget (eg. tkinter-module)

c.execute("DELETE from addresses WHERE oid = " + delete_box.get())

Fetch all rows

records = c.fetchall()

Fetch only one row

c.fetchone()

Fetch the first 50 rows

records = c.fetchmany(50)

Commit Changes

conn.commit()

Close Connection

conn.close()

MODULES - DATES, DATETIME, CALENDAR, TIMEIT, TIME, SYS, CTYPES

Standardmodules from python no pip install necessary
Overview % parsing commands https://www.programiz.com/python-programming/datetime/strptime https://rdrr.io/r/base/strptime.html
Import module datetime and timedelta

from datetime import datetime, timedelta

Import module dat

from datetime import date

Conversion string to datetime in format yyyy-mm-dd

datetime.strptime("2021-06-01", "%Y-%m-%d")

Conversion for date format eg. "Apr 21, 2020"

datetime.strptime("Apr 21, 2020", "%b %d. %Y")

Conversion for utc format eg. "14:42:46 UTC 2021-12-15

datetime.strptime(s, "%H:%M:%S UTC %Y-%m-%d")

Conversion for time format hh:mm:ss

datetime.strptime("05:20:25","%H:%M:%S")

Conversion datetime to string in format yyyy-mm-dd

datetime.strftime(dt, "%Y-%m-%d")

Convert dateformat from string in one line (1st from - 2nd to)

dt.strftime((dt.strptime(tmp,"%m/%d/%Y")),"%Y-%m-%d")

Check if variable is of type datetime.date or datetime.datetime

isinstance(x, date)

Check if variable is of type datetime.datetime

isinstance(x, datetime)

Define date as "2020-07-24"

date = date(2020,7,24)

Define date+time as "2020-03-05 19:27:23"

dt = datetime(2020,3,5,19,27,23)

Actual date+time in format 2020-06-09 20:11:13

now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

Actual date in format 2020-06-09

now = datetime.now().strftime("%Y-%m-%d")

Actual date in datetime format

tday = datetime.today()

Actual date as date format

tday = datetime.today().date()

Actual date as string format

tday = str(datetime.today().date())

Convert datetime to date

date = dt.date()

Year / month / day / hour / minute from the datetime

dt.year, dt.month, dt.day, dt.hour, dt.minute

Returns weekday (monday=1, sunday=7)

tday.isoweekday()

Return the calendar week-number

dt.isocalendar().week

Calculates last friday (+2 for FR, +1 SA, 0 SU, -1 MO, -2 TU, -3 WE, -4 TH)

last_friday = day - timedelta(days=tday.isoweekday() + 2)

Creates timedelta for 7 days

tdelta = timedelta(days=7)

Creates timedelta for 1 minute

tdelta = timedelta(seconds=60)

7 days are added to the actual day

tday + tdelta

Previous 1-day-back-date for date

newdate = date - timedelta(days=1)

Timedelta for two dates

date2-date1

Difference in days

daysDiff = abs(dt2-dt1).days

Difference in minutes

minDiff = (date2-date1).total_seconds() / 60

Generate UNIX/ISO-format from datetime

datetime.fromisoformat('2020-07-10 02:00:00').timestamp()

Generate UNIX/ISO-format from actual date

iso_dt = datetime.fromisoformat(str(datetime.now())).timestamp()

Generate Datetime from UNIXISO-format

date = datetime.fromtimestamp(ts)

Shows size of the list / tuple in bytes

sys.getsizeof(l) or sys.getsizeof(t)

Convert float to hh:mm:ss datetime.timedelta

t = timedelta(days=0.786331018518518)

Convert datetime.timedelta to float

f = t.total_seconds() / 86400


Using german names / deutsche namen for month- and day-names

import locale
locale.setlocale(category=locale.LC_ALL,locale="German")


Check last day of month & first weekday of month

import calendar
calendar.monthrange(year,month)                         # return tuple - 1st: weekday first day - 2nd: ultimo day


Check if date and timezone has summmer time / daylight time

import pytz
def is_dst(dt=None, timezone="UTC"):
    if dt is None:
        dt = datetime.utcnow()
    timezone = pytz.timezone(timezone)
    timezone_aware_date = timezone.localize(dt, is_dst=None)
    return timezone_aware_date.tzinfo._dst.seconds != 0
is_dst()													# False => UTC has no daylight time
is_dst(datetime(2019, 1, 1), timezone="US/Eastern")			# False => at this date US/Eastern has no daylight time
is_dst(datetime(2019, 4, 1), timezone="US/Eastern")			# True => at this date US/Eastern has daylight time
is_dst(datetime(2019, 1, 1), timezone="CET")				# False => at this date CET has no daylight time
is_dst(datetime(2019, 4, 1), timezone="CET")				# True => at this date CET has daylight time


Timing a specific activity

import timeit
start = timeit.default_timer()                          # Start timer
stop = timeit.default_timer()                           # Stop timer
round(stop-start,2)										# Stopped time in seconds and rounded to 2 decimals
round((stop-start)/60,2)                                # Stopped time in minutes and rounded to 2 decimals
timeit.timeit(stmt="[0,1,2]",number=1000000)     		# Checking the time for creating the list for 1mio times


Countdown while waiting

import time, import sys
time.sleep(3)                                           # Delay for 3 seconds
for i in range (30, 0, -1):                             # Delay for 30 seconds - countdown in one row
    sys.stdout.write (str (i) + ' ')                    # Countdown output
    sys.stdout.flush ()
    time.sleep (1)


Check which os system running

from sys import platform
if platform == "linux"                                  # Program is excecuting from Linux
if platform == "win32":                                 # Program ix excecuting from Windows
if platform == "darwin":                                # Program is excecuting from Mac


Read monitor resolution

import ctypes
user32 = ctypes.windll.user32
screensize = user32.GetSystemMetrics(0),                # 1st parameter is width, 2nd parameter is height
    user32.GetSystemMetrics(1)


Wait for keystroke or x seconds

TIMEOUT = 10                # time / seconds for waiting
startTime = time.time()
inp = None
print(f"Program finished - pls press <enter> to close the window or wait {round(TIMEOUT / 60, 2)} minutes...")
while True:
    if msvcrt.kbhit():                          # continue when pressing a key
        inp = msvcrt.getch()
        break
    elif time.time() - startTime > TIMEOUT:     # continue when the time is running out
        break

MODULE - CURRENCYCONVERTER - currency conversion


Import Module

from currency_converter import CurrencyConverter

Import Module for rate-checks for specific date

from datetime import date

Read Function CurrencyConverter

c = CurrencyConverter()

Convert 100 USD to AUD with for actual date

c.convert(100, 'USD', 'AUD')

Convert 100 USD to AUD for date 2013-03-21

c.convert(100, 'USD', 'AUD', date=date(2013, 3, 21))

Check begin and end-date for all rates for BGN

first_date, last_date = c.bounds['BGN']

Read all possible currencies in a set

c.currencies

MODULE - PYCOUNTRY - Countries, Currency, Language


Description about the module

https://pypi.org/project/pycountry/

Problems with some added python modules (eg. pycountry) with pyinstaller => see pyinstaller description
Install moduel

import pycountry

List of all countries with several informations

list(pycountry.countries)

Searching for country-informatns for "DE"

c = pycountry.countries.get(alpha_2='DE')

Outputs => DE

c.alpha_2

Outputs => DEU

c.alpha_3

Outputs number of the coutnry => 276

c.numeric

Outputs full name of the country => Germany

c.name

Outputs the official name => "Federal Republic of Germany!

c.official_name

List of all currencies

list(pycountry.currencies)

List of all languages

list(pycountry.languages)

List of all historic countries which not more exists

list(pycountry.historic_countries)

List of all subdivided countries

list(pycountry.subdivisiones)

MODULES - SMTPLIB, MIMEText - sending emails

sending emails if you have 2-Factor-Authentification you must generate a App-Password for GMail
Import smtplib

import smtplib

Import MIMEText

from email.mime.text import MIMEText

SMTP-Server and port number from the mail provider (e.g. GMail)

s = smtplib.SMTP('smtp.gmail.com', 587)

Response 250 means connection is ok

print(s.ehlo())

Response 250 means connection is ok

print(s.starttls())

Not the real pw - only the app-generated pw from gmail

print(s.login('[email protected]', 'xyz123'))

Message of the email

msg = MIMEText("This is the text of the mail")

One recipient

sender = ['[email protected]']

One or more recipient

recipients = ['[email protected]']

Or more recipients

Subject

msg['Subject'] = "Subject of Mail"

From

msg['From'] = sender

One or more recipients

msg['To'] = ", ".join(recipients)

Sending email

s.sendmail(sender, recipients, msg.as_string())

Closing connection

s.quit()

MODULE - PATHLIB - interacting with the operating system


Import module

from pathlib import Path

Current working directory

Path.cwd()

Creates new folder (no error message if folder allready exists with exist_ok=True)

Path("folder").mkdir(parents=True, exist_ok=True)

define some folder

path = Path("some_folder")

add some subfolders (eg. some_folder/sub_folder/sub_sub_folder)

path = path / "sub_folder" / "sub_sub_folder"

make path absolute

path = path.resolve()


get mac-adress from the actual pc

from uuid import getnode as get_mac
mac = get_mac()

MODULE - OS, SHUTIL, FILECMP, DIRCMP, Send2Trash - interacting with the operating system


Import module

import os

Get the actual path

path = os.path.abspath(os.path.dirname(sys.argv[0]))

Put all filenames to a list from the actual path

os.listdir(path)

Shows alls attributs and methods of the library

print(dir(os))

Change the directory

os.chdir("C:/temp")

Current working directory

os.getcwd()

Get actual user name

os.getlogin()

Outputs the entire content of the current working dir

for f in os.listdir(): print(f)

Outputs the elements seperated in name + extension

for f in os.listdir(): fn, ext = os.path.splitext(f)

Change the filenmae

os.rename(filename, new_name)

Cut and paste file from path1 to path2

os.rename(src, dst)

Copy file from source to destination

shutil.copyfile("src","dst")

Creates new folder (also possible with makedirs => is prefered)

os.mkdir("folder")

Creates new folder and subfolders

os.makedirs("folder/subfolder")

Delete file

os.remove (file)

Delete folder (prefered cause deleting is dangerous)

os.rmdir ("folder")

Delete folder and subfolders

os.removedirs ("folder/subfolder")

Outputs the statistics of the file (size, modification time,...)

os.stat("file")

Outputs dirpath - all the dirs - and all the files

for dirpath,dir,files in os.walk(os.getcwd())

Get home dir of the actual user

os.environ.get("HOME")

Outputs only the filename => file.txt

os.path.basename("temp/file.txt")

Outputs only the dir => temp/

os.path.dirname("temp/file.txt")

Outputs filename and dir as a tupel => ("temp/","file.txt")

os.path.split("temp/file.txt")

Checks if filename exists in the filesystem

os.path.exists("temp/file.txt")

Checks if folder exists / element is a directory (in the actual dir)

os.path.isdir("xyz")

Checks if files exists / element is a file (in the actual dir)

os.path.isfile("xyz")

Join path together on every os-system (win,mac,linux)

os.path.join(path, "URLS", link)

Put all sub-folders to a list from the actual working dir

[x[0] for x in os.walk(path)][1:]

Change file to hidden

os.popen('attrib +h ' + fn).read().close()

Send file to trash (needs:)

send2trash(filename)


Put all files with path in a list from actual directory with all subdirs and Filtering

listExtensions=["jpg","jpeg","png"]
listImages = []
for dirpath,dir,files in os.walk(os.getcwd()):
  for f in files:
    if any(x in f.lower() for x in listExtensions):
      wFN = os.path.join(dirpath, f)
      listImages.append(wFN)


Find home-directory for the actual user

from os.path import expanduser
home = expanduser("~")


Check if 2 folders are different

from filecmp import dircmp
ergDiff = dircmp(path, toPath)          # compare 2 directories
if ergDiff.left_only: ...               # see all changed files on the left side

MODULE - LOGGING

general Logging Layers: debug => info => warning => error => critical
importing module

import logging

Create & Config - w overwrites every time

logging.basicConfig(filename="fn.log",format='%(asctime)s %(message)s', filemode='w')

Level every log > INFO possible (but not DEBUG)

logging.basicConfig(filename="fn.log",level=logging.INFO)

getLogger-Name / Level / Time / Message

...format="%(name)"s-%(levelname)s:%(asctime)s-%(message)s

Creating an object with specific name

logger=logging.getLogger(logname)

Setting the threshold of logger to DEBUG

logger.setLevel(logging.DEBUG)

Debug message

logger.debug("Harmless debug Message")

Info message

logger.info("Just an information")

Warning message

logger.warning("Its a Warning")

Error message

logger.error("Did you try to divide by zero")

Critical error message

logger.critical("Internet is down")


Example with 2 logging levels

logger=logging.getLogger(__name__)							 # Creating an object with the name of the py-file
stream_h = logging.StreamHandler()							 # Defining a handler for streamdata in the console
file_h = logging FileHandler("file.log"						 # Defining a handler for logging to a file
stream_h.setLevel(logging.WARNING)							 # Set Stream-Logging to Warning-Level
file_h.setLevel(logging.ERROR)								 # Set File-Logging to Error-Level
	=> Error-Messages will go to both stream and file - Warning-Messages only to stream and not to file
formatter = logging.formatter								 # Defining the format for outputting
	"%(name)s - %(levelname)s - %(message)s")
stream_h.setFormatter(formatter)							 # Setting format for stream-logging
file_h.setFormatter(formatter)								 # Setting format for file-logging
logger.addHandler(stream_h)									 # Adding Stream-handler to logger
logger.addHandler(file_h)									 # Adding File-handler to logger


logging using RotatingFileHandler
https://stackoverflow.com/questions/24505145/how-to-limit-log-file-size-in-python # Explanation about handling

from logging.handlers import RotatingFileHandler             # importing file handler
log_formatter = logging.Formatter('%(asctime)s               # Define format for output in logfile
    %(levelname)s %(funcName)s(%(lineno)d) %(message)s')
my_handler = RotatingFileHandler(logFile, mode='a',          # Define handler with maxBytes and backupCount
    maxBytes=5*1024*1024, backupCount=2, encoding=None,      # When the log grows more then maxBytes...
    delay=0)                                                 # ...itgenerates a new logfile-version
my_handler.setFormatter(log_formatter)                       # Setting the format
my_handler.setLevel(logging.INFO)                            # Setting the threshold of logger to INFO
logger = logging.getLogger('root')                           # Creating an object with name
logger.setLevel(logging.INFO)                                # Setting the threshold of logger to INFO
logger.addHandler(my_handler)                                # Define handler with definition above

MODULE - UNITTEST

many methods: https://docs.python.org/3/library/unittest.html
Importing module

import unittest

Import the function which will be tested

from prg import func

Define the class for testing the function

class TestFunc(unittest.TestCase):

Define the first testcase - must begin with test*

def testcase1(self):

call the function

erg = func("x",3)

check if result is ok

self.assertEqual("3x")

starting the unittest

if __name__ == '__main__': unittest.main()

Check if equal

self.assertEqual(a,b)

Check if not equal

self.assertNotEquale(a,b)

Check if true

self.assertTrue(x)

Check if false

self.assertFalse(x)

Check if i in list

self.assertIn(i,list)

Check if i not in list

self.assertNotIn(i,list)

MODULE - DOTENV - hide environment informations

Installation: pip install python-dotenv
create .env file and store sensible data in the form TOKEN=xyz123
and hide the *.env files in the .gitignore

pip install python-dotenv
from dotenv import load_dotenv, find_dotenv
import os
load_dotenv(find_dotenv())
GMAIL_TOKEN = os.environ.get("GMAIL_TOKEN")

MODULE - COLLECTIONS - counter, defaultdic

Counter, namedtuple, OrderedDict, defaultdict, deque


COUNTER

from collections import Counter
colors = ['blue', 'blue', 'red', 'red', "red"]
counter = Counter(colors)									# Creates dict with counts => {'red': 3, 'blue': 2} in ordered form
counter.most_common(1) 		                           		# Outputs the element with the most counts as tuple
counter["red"})            		                            # Outputs the occurence of the value "red" => 3


NAMEDTUPLE

from collections import namedtuple
Point = namedtuple("Point","x,y")							# Creates class with tumple
pt = Point(1,-4)											# Creates Point(x=1,y=-4)
print(pt.x, pt.y)											# Shows 1 and -4


ORDEREDDICT

not necessary anymore in the newer python versions
is allready implemented in the regular dict-type)


DEFAULTDIC

d = defaultdict(int)	# Create defaultdict and define default-type
d["xyz"]				# No key error when when key not in the dict => default-type value 0 is outputed


DEQUE

from collections import deque      # import collections deque module
d = deque()				# Create deque (optimized for working with elements at the end or beginning of the list)
d.append(1)				# Add element at the end
d.appendleft(2)			# Add element at the beginning
d.extend([1,2]			# Add list of elements at the end
d.extendleft([1,2])		# Add list of elements at the beginning (in reversed order so 2,1)
d.pop()					# Delete element at the end
d.popleft()				# Delete element at the beginning
d.clear()				# Clear entire list
d.rotate(2)				# Shift all elements 2 places to the right
d.rotateleft(-1)			# Shift all elements 1 place to the left

MODULE - ITERTOOLS - products, combinations

  • product, permutations, combinations, accumulate, groupby, infinite iterators
    - from itertools import product, permutations, combinations, combinations_with_replacement, accumulate, groupby
  • from itertools import count, cycle, repeat
    necessary for the func-operator
import operator

all combis for carthesian product => [(1,3),(1,4),(2,3),(2,4)]

product([1,2],[3,4])

all with 3 elements => (0,0,0),(0,0,1),(0,1,0),(0,1,1), aso.

product([0, 1], repeat=3)

all combis of this 2 strings => [("a","1"),("a","2"),("b","1"),("b","2")]

product('ab', '12')

all different orderings => [(1,2,3),(1,3,2),(2,1,3),(2,3,1),(3,1,2),(3,2,1)]

permutations ([1,2,3])

all different orderings with 2 elements => [(1,2),(1,3),(2,1),(2,3),(3,1),(3,2)]

permutations([1,2,3],2)

all possible combis for a defined length => [(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]

combinations([1,2,3,4],2)

all combis also with itself => [(1,1),(1,2),(1,3),(2,2),(2,3),(3,3)]

combinations_with_replacement([1,2,3],2)

accumulates every number with the next => [1,3,6,10]

accumulate([1,2,3,4])

using operator .mul for multiplication every number the next => [1,2,6,24]

accumulate([1,2,3,4], func=operator.mul)

compares every entry to the maximum => [1,2,5,5,5]

accumulate([1,2,5,3,4], func=max)

gives interabel - groups by the func => True[1,2], False[3]

groupby([1,2,3],key=func_smaller3)

same thing - with a lambda operation => True[1,2], False[3]

grougpy([1,2,3],key=lambda x: x<3)

infinite loop starting at 10 (has to be ended with break sometimes)

for i in count(10)

infinite cycle through the list (has to be ended with break sometimes)

for i in cycle([1,2,3])

infinite loop with 1

for i in repeat(1)

loop with 1 for 4 times

for i in repeat(1,4)

fastest way to change an column in a dataframe (over 400x faster then iterrows)

df = df['col'].apply(lambda x: x.strip())

MODULE - NUMPY - basis for different other modules

  • memory efficient, working in the background used by other modules
    - using only the bits which are necessary for each column (1 is stored in python with 24bytes - in numpy with 1byte or lesser)
  • shape of array has to be consistent - otherwise it just fall back to regular python objects eg. [[12,11],[13]]
    Import numpy module with np abbreviation
import numpy as np

Using arrays, types and slicing
Define a numpy array (automatically with type int)

a = np.array([1,2,3,4])

Multiindexing in arrays possible => gives array/[0.,1.,2.])

a[[0,2,-1]]

Defines a numpy array with datatype float (and not int)

np.array([1,2,3,4]),dtype=np.float)

Defines a 2d numpy array

a = np.array([[1,2,3],[4,5,6],[7,8,9])

Returns dimension shape => (3,3)

a.shape

Returns number of dimensions => 2

a.ndim

Returns total number of elements => 9

a.size

Gives / slices the second row => [4,5,6]

a[1]

Gives / slices the second row and first element => 4

a[1,0]

Slices the first 2 rows => array([[1,2,3],[4,5,6]])

a[0:2]

Slices all row with first 2 elements => array([[1,2],[4,5],[7,8]])

a[:,:2]

Slices first 2 rows and first 2 elements => array([[1,2],[4,5]])

a[:2,:2]

Slices first 2 rows and last element => array([[3],[6]])

a[:2,2:]

Changes content of second row => array([[1,2,3],[9,9,9],[7,8,9])

a[1] = np.array([9,9,9])

Changes content of third row (for all elements) => array([[1,2,3],[4,5,6],[8,8,8])

a[2] = 8

Builds the sum of all elements => 45

a.sum()

=> 5.0

a.mean()

Return standard deviation => 2.58198889774344

a.std()

Builds sum per columns => array([12,15,18])

a.sum(axis=0)

Builds sum per rows => array([6,15,24])

a.sum(axis=1)

Initialize numpy array with 5 zeros => array([0.,0.,0.,0.,0.])

np.zeroes(5)

Initialize numpy array with 2x3 ones with int32 => array([[0,0,0],[0,0,0]])

np.ones((2,3),dtype="int32")

Initialize np array with 2x2 with the value 99 => array([[99,99],[99,99]])

np.full((2,2,),99)

Initialize a square matrix with len 3 => array([[1.,0.,0.],[0.,1.,0.],[0.,0.,1.]])

np.identity(3)

Separate individual copy of the numpy array (no updates in both arrays)

arr_b = arr_a.copy()

Define 2d numpy array with 2,4 dimension

c = np.array([[1,2,3,4],[5,6,7,8]])

Reshape the numpy array to 4,2 dimension => [[1,2],[3,4],[5,6],[7,8]]

c.reshape((4,2))

Reshape the numpy array to 3d with 2,2,2 dimension => [[[1,2],[3,4]],[5,6],[7,8]]]

c.reshape((2,2,2))

Convert numpy datetime64 to str (can afterwards be converted with strptime to datetime

dt = str(dt64).split("T")[0]

Vector handling
Build array with this range => array([0,1,2,3])

a = np.array(4)

Changes +10 to each element => array([10,11,12,13]) (creates new array - no changing a)

a + 10

Changes *10 to each element => array([0,10,20,30]) (creates new array - no changing a)

a * 10

Changes the existing array a => array([100,101,102,103])

a += 100

Defines second array babab6

b = np.array([10,10,10,10])

Sums the 2 arrays to new array => array([10,11,12,13)

a + b

Multiplicates the 2 arrays to new array => array([0,10,20,30)

a * b

Make sinus of all entries

np.sin(b)

Make cosinus of all entries

np.cos(b)

Boolean Array Operations
Build array with this range => array([0,1,2,3])

a = np.array(4)

Selecting items with boolean expressions - returns first and last element => array([0,3])

a[True,False,False,True]

Use operation on the array => array([False,False,True,True])

a >= 2

Using operation for slicing => array([2,3])

a[a >= 2]

Gives back all elements which are greater than the mean => array([2,3])

a[a > a.mean()]

Gives back all elements which are NOT greater than the mean => array([0,1])

a[~(a > a.mean())]

Gives back all elements which are equal to 0 OR 1 => array([0,1])

a[(a==0) | (a==1)]

Gives back all elements which are <=2 and have no rest divided by 2 => array([0,2])

a[(a <= 2) & (a % 2 == 0)]

Creating random arrays
Random 1d float array with 3 entries => eg. (0.8377, 0.8720, 0,7784)

np.random.rand(3)

Random 2d array 3 entries => [[0.27,0.91,0.54],[0.68,0.87,0.36],[0.05,0.64,0.16]]

np.random.rand(3,3)

Random 1d int array 3 entries => [0,8,8]

np.random.randint(0,10,3)

Random 2d int array 3x4 entries => [[2,6,4,5],[7,6,7,3],[7,0,4,0]]

np.random.randint(0,10,(3,4))

Random 2d int array 3x3 entries with values from 0 to 99

np.random.randint(100, size(3,3))

Random 2d int array 3x3 entries with values from -4 to 7

np.random.randint(-4,8,size(3,3))

Shuffeling a 2d list but only the first axes

np.random.shuffle(lst)

Can reproduce the same results

np.random.seed(1)

MODULE - PANDAS - analyzing and working with data


examples

https://gist.github.com/why-not/4582705

Import Module

import pandas as pd

Also importing numpy-module necessary

import numpy as np

Necessary for supressing performance warnings see below

import warnings

Show more information - count of the shown columns

pd.set_option('display.max_columns',11)

Show more information - show all columns

pd.set_option('display.max_columns',None)

Show more information - None is using the maximum width of the IDE

pd.set_option('display.width', None)

Supress Performance Warnings

warnings.simplefilter(action='ignore', category=pd.errors.PerformanceWarning)

SERIES
Define a series

s = pd.Series([35,63,81,61,127,65,319])

Shows default-index in 1 row / values in 2 row - also the dtype: int64 at the bottom

print(s)

Name the series (shows the name also when series i printed at the bottom

s.name = "G7 population in millions"

Shows the dtype of the series => dtype("int64")

s.dtype

Values of the series => array([35,63,81,61,127,65,319])

s.values

Slicing element => 35

s[0]

Shows the index of the series => RangeIndex(start=0, stop=7, step=1)

s.index

Shows first index-value

s.index[0]

Change the index of the series from default 0-6 to country-names

s.index=["CAN","FRA","GER","ITA","JPN","UK","USA]

Accessing now with the new index possible => 81

s["GER"]

Access with index still possible using iloc => 35

s.iloc[0]

Access with index still possible using iloc => 319

s.iloc[-1]

Multiple selecting possible => GER 81 and JPN 127

s.["GER","JPN"]

Multiple selecting with index using iloc possible => CAN 35 and USA 319

s.iloc[0,-1]

Multiple selecting with ":" - upper element is included => CAN 35, FRA 63, GER 81

s.["CAN":"GER"]

Add 10 for the every element in the series => ([45,73,91,71,137,75,329])

s + 10

Using boolean operation to output every element => GER 81, JPN 127, USA 319

s[s > 70]

Using boolean operation to output countries > mean => JPN 127, USA 319

s[s > s.mean()]

~ for not, | for or, & for and

logical operator which can be used

Changing the value of a series element => CAN 40

s["CAN"] = 40

Changing the value of the last element => USA 500

s.iloc[-1] = 500

Changing the value with a specific operation => CAN,FRA,ITA,UK are changed to 99

s[s < 70] = 99

Define a series with NaN values

s = pd.Series([1,np.nan,7]

Check for NaN values in the series => False,True,False

d.isnull()

Check for NOT NaN values in the series => True,False,True

s.notnull()

Sum of not NaN values in the series => 2

s.notnull().sum()

Sum of NaN values in the series => 1

s.isnull().sum()

Outputs all entries which are not NaN

s.[s.notnull()]

Delete all entries with NaN

s.dropna()

Replace the NaN-values with 0

s.fillna(0)

Replace the NaN-values with the meanvalue of the serie

s.fillna(s.mean())

With forward-fill the NaN-values are replaced with the value before (if the first i gets NaN)

s.fillna(method="ffill")

With backward-fill the NaN-values are replaced with the value after (if the last i gets NaN)

s.fillna(method="bfill")

Returns the duplicates (only the first entry for the same is no duplicate - rest gets True)

s.duplicated()

Returns the duplicates (only the last entry for the same value is no duplicate - rest gets True)

s.duplicated(keep="last")

Returns the duplicates (when there are more than one value - all get flagged as duplicate with True)

s.duplicated(keep=False)

Duplicates get dropped (only the first entry remains)

s.drop_duplicates()

Duplicates get dropped (only the last entry remains)

s.drop_duplicates(keep="last")

Duplicates get dropped (no entry remains if there are duplicate entries for the value)

s.drop_duplicates(keep=False)

Sorting by value ascending

s.sort_values()

Sorting by value descending

s.sort_values(ascending=False)

Iterate through pandas series

for index, value in s.items():

Get element from series - if not found is the result "N/A"

erg = s.get("Entry","N/A)

Delete timezone from timestamp

.index.tz_localize(None)

Convert pandas timestamp to datetime

dt = dt.to_pydatetime()

Convert datetime to pandas timestamp

dt = pd.Timestamp(dt)


DATAFRAMES
are immutable, column is a series, dataframe = combination of multiple series
reading data
possible reader: read_csv,read_excel,read_json,read_html,read_sql
possible writer: to_csv,to_excel,to_json,to_html,to_sql

Important
Convert pandas timestamp to datetime

dt = dt.to_pydatetime()

Convert a dataframe to a nested list (only values, without index)

l = df.values.tolist()

Convert a dataframe to a nested list including index

l = df.reset_index().values.tolist()

Read first worksheet to dataframe

df = pd.read_excel(workFN, sheet_name=0)

get specific row with title in index

filt = (tmpDF.index == "Total Assets")

get specific row with title in index

tmpDF = tmpDF[filt]

get the first entry of a series

tmpDF[tmpDF.columns[0]][0]

add new column in dataframe at position 0 (column is name, value is content)

df.insert(loc=0, column="A", value="XYZ")

CSV
Shows all parameters for the reader

pf.read_csv?

Read csv into dataframe - first line will be col-names of the dataframe

df = pd.read_csv("fn.csv")

Read csv into dataframe - no headerline (col names will be default numeric)

df = pd.read_csv("fn.csv",header=None)

") => Read csv with sepeartor "=>" instead of the default ";"

df = pd.read_csv("fn.csv",sep="

Many operations can be directly done when reading the csv-file

df = pd.read_csv("fn.csv",

=> no header in the csv available

header=None

=> name the columns

names=["Timestamp","Price"],

=> define which column will be the index

index_col=0,

=> read dates as date - and not as string

parse_dates=True,

=> treat this chars as NaN-values

na_values["?","-",""],

=> assign specific datatype to the column "Price"

dtype={"Price":"float"},

=> decimal-point is "," (and not ".")

decimal=",",

=> read also blank lines (as default they are ignored)

skip_blank_lines=False),

=> use only the three columns when reading the csv (by position)

usecols=[0,1,2],

=> use only the three columns when reading the csv (selected by name)

usecols=["first","last","age"])

Write dataframe to csv

df.to_csv("fn2.csv")

XLSX
Read xlsx (first worksheet) into dataframe

df = pd.read_excel("file.xlsx")

Read xlsx (first worksheet) into dataframe - without headers

df = pd.read_excel("file.xlsx",header=None)

Read xlsx and define the first column as index of the dataframe

df = pd.read_excel("file.xlsx",index_col=[0])

Read xlsx (worksheet "Prod") into dataframe

df = pd.read_excel("file.xlsx",sheet_name="Prod")

Read xlsx with the ExcelFile method

ef = pd.ExcelFile("file.xlsx")

Shows alle worksheet-names as a list => eg. ["Prod","Desc"]

ef.sheet_names

Read/Parse the worksheet "Prod" in a dataframe

df = ef.parse("Prod")


Write several dataframes to excel

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('basisReadData.xlsx', engine='xlsxwriter')

# Write each dataframe to a different worksheet.
quandlMain.to_excel(writer, sheet_name="quandlMain"),
quandlMain2.to_excel(writer, sheet_name="quandlMain_From2018_MRT")

# Write also in this form
with pd.ExcelWriter("example.xlsx") as writer:
    df.to_excel(writer, sheet_name="coinPrices")

# Close the Pandas Excel writer and output the Excel file.
writer.save()

SQL
Establish connection to the sql-db

conn = sqlite3.connect("data.db")

Read data from sql-db into dataframe

df = pd.read_sql("SELECT * FROM empl;",conn)


Many operations can be directly done when reading the sql-db

df = pd.read_sql("SELECT * FROM empl;",conn,
	index_col="EmplID",										    # define which column will be the index
	parse_dates["BirthDate","HireDate"])			# read this columns as date


It is easier to read a complete table using sqlalechmy

from sqlalchemy import create_engine

Create SQL Alchemy engine

engine = create_engine("data.db")

Establish connection with SQL Alchemy

connection = engine.connect()

Read whole table with SQL-Alechmy (many things are auto-done - e.g. Indexing,Parsing)

df = pd.read_sql_table("empl",con=connection)

Writes dataframe to the sql-db (breaks when table allready exists)

df.to_sql("data.db",conn)

Writes dataframe to the sql-db (with droping the table before inserting)

df.to_sql("Dadata.db",conn,if_exists="replace")

Writes dataframe to the sql-db (with inserting new values to existing table)

df.to_sql("data.db",conn,if_exists="append")

HTML
Read html into dataframe

df = pd.read_html(html_string)

Assign html-varialbe

html_url"https://www.xyz.com"

Read html-link to dataframe

df = pd.read_html(html_url)

DICT, LIST
Creates a dataframe from a list of dicts => [{...},{...},{...}]

df = pd.DateFrame(l)

Create a dataframe for a dict

df = pd.DataFrame.from_dict(d, orient="index")


EXAMPLE

https://www.freecodecamp.org/learn/data-analysis-with-python/data-analysis-with-python-course/pandas-condtitional-selection-and-modifying-dataframes
    Popl	HDI		Continent
CAN	35		0,913	America
FRA	64		0,888	Europe
GER	81		0,916	Europe
ITA 61		0,873	Europe
JPN 127		0,891	Asia
UK  65		0,907	Europe
USA 319		0,915	America


Initialize a dataframe with random ints with 5 rows and 2 columns

df = pd.DataFrame(np.random.randint(0,5,size=(5, 2)), columns=list('AB'))

Create an empty dataframe

df = pd.DataFrame()

Creates a dataframe from a list of dicts => [{...},{...},{...}]

df = pd.DateFrame(l)

Create a dataframe for a dict

df = pd.DataFrame.from_dict(d, orient="index")

Check if datafram is empty

if df.empty:

Show the shape of the dataframe (rows, columns) => (7,3)

df.shape

Info for columns (index,name,dtype) also count of NaN-values per column

df.info()

Shows datatype of every column

df.dtypes()

Several statistic for the dataframe (median,max,min,std,...)

df.describe()

Output of the first 5 rows per default

df.head()

Output of the first 22 rows

df.head(22)

Output of the last 5 rows per default

df.tail()

Output of the last 22 rows

df.tail(22)

Output of all column names

df.columns

Outputs the index

df.index

Set index to column "name"

df.index = df["name"]

Outputs the size of the dataframe

df.size

Resets the index to numeric (previous index exists as column)

df.reset_index(drop=False)

Resets the index to numeric (previous index is deleted)

df.reset_index(drop=True)

Define "Popl" as new index - with inplace it is permanently saved

df.set_index("Popl", inplace=True)

Selecting Rows by index - showing the row with the index = "CAN"

df.loc["CAN"]

Selecting Rows by position - showing the last row

df.iloc[-1]

Selecting Columns by title - showing the whole column "Population" for all rows

df["Popl"]

df["Continent"].value_ # fill motor city input print(f"Create filtering for Motor City Input AM...") filt = (dfInpAM["Lab"] == "US-Motor City") filt2 = (dfInpAM["Status"] == "In Production") dfMCI = dfInpAM[filt & filt2]
Output of alle values with their count

dfMCI["ReceivedTime"]= dfMCI["ReceivedTime"].astype(str)    counts()

Output of alle values with their count in percent

df["Continent"].value_counts(normalize=True)

Showing cols Popl,Continent for all rows

df[["Popl","Continent"]]

Showing cols Popl,Continent for all rows - values divided by 100

df[["Popl","Continent"]] / 100

Show the rows from index CAN to ITA

df.loc["CAN":"ITA"]

Show the rows from index CAN to ITA - only with the column1

df.loc["CAN":"ITA","Popl"]

Show the rows from index CAN to ITA - only with the column1 + column3

df.loc["CAN":"ITA",["Popl","Continent"]]

Show the columns col1 to col4 for all rows

df.loc[:,"col1":"col4"]

Show the columns col1 to col4 for 3 rows from index 0 to 2 (2 included)

df.loc[0:2,"col1":"col4"]

Show the columns cols 1 and 3 for the rows from index 0 to 2 (2 included)

df.loc[0:2,["col1","col3"]]

Show the 1st and last row => CAN,USA

df.iloc[0,-1]

Show 2nd and 3rd row for all cols => FRA,GER / Popl,HDI,Continent

df.iloc[1:3]

Show 2nd and 3rd row for 2nd col => FRA,GER / HDI

df.iloc[1:3,1]

Show 2nd and 3rd row for 1st and 3rd column => FRA,GER / Popl,Continent

df.iloc[1:3,[0,2]}

Show 2nd and 3rd row for the 2nd and 3rd col => FRA,GER / HDI,Continent

df.iloc[1:3,1:3]

Returns GER,JPN,USA for all cols

df.loc[(df["Popl"] > 70	)]

Returns GER,JPN,USA for col "Popl"

df.loc[df["Popl"] > 70, "Popl"]

Returns GER,JPN,USA for cols "Popl" and "Continent"

df.loc[df["Popl"] > 70, ["Popl","Continent"]]

Output of 2 columns for the column slice

df[["Open","Close"]]["2010":"2020"]

Output specific rows and slice of columns

df[df.index.isin(["2020","2019"])][hist.columns[0:4]]

Output specific rows and columns

df[df.index.isin(["2020","2019"])][["Open","Close"]]

Drops the row CAN

df.drop("CAN")

Drops the rows CAN,JPN

df.drop(["CAN","JPN"])

Drops the rows CAN,JPN (2nd method)

df.drop(["CAN","JPN"],axis=0)

Drops the rows CAN,JPN (3rd method)

df.drop(["CAN","JPN"],axis="rows")

Drops the cols Popl,Continent

df.drop(columns=["Popl","Continent"])

Drops the cols

df.drop(df.columns[1,2,5,9],axis=1)

Drops the cols Popl,Continent (2nd method)

df.drop(["Popl","Continent"],axis=1)

Drops the cols Popl,Continent (3rd method)

df.drop(["Popl","Continent"],axis="columns")

Drops the whole dataframe

df.drop(df.index, inplace=True)

Define series with same col-names as in the dataframe - with some correction values

crisis = pd.Series([-10,-0,3],index=["Popl","HDI"])

Comines crisis-series to the df => all values in "Popl" -10 and "HDI" -0,3 in the df

df[["Popl","HDI"]] + crisis


Define new series for new column in dataframe

langs = pd.Series(
	["FR","DE","IT"],           # values of new column
	index=["FRA","GER","ITA"],  # existing index in dataframe
	name= "lang")               # name of the new column


Creates new column in the dataframe according to the series "langs" - other rows will get "NaN"

df["lang"] = langs

Values of all rows in the column "lang" will be changed to "GB"

df["lang"] = "GB"

Renames column names

df.rename(columns={"Popl":"POPL","Continent":"CONT"})

Renames index names

df.rename(index={"UK":"UnKi","USA":"US"})

Creates new column by dividing col "Popl" with col "HDI"

df["Popl_HDI"] = df["Popl"] / df["HDI"]

Different operations for the column in the dataframe

df["Popl"].median().max().min().sum().mean().qunatile()

Changes Timestamp-String to Timestamp-Type (eg. read from csv-file before)

df["Timestamp"]=pd.to_datetime(df["Timestamp"])

Plotting the dataframe using the mathlib-module

df.plot()

Plot the dataframe

df.plot(figsize=(12,6))

Plot the dataframe for the specific timeframe

df.loc["2017-12-01":"2019-12-01"].plot(figsize=(12,6)))

Check for NaN values in the Dataframe => Matrix with True / False

df.isnull()

Check for NOT NaN values in the Series => Matrix with True / False

df.notnull()

Sum of NaN values per column

df.isnull().sum()

Deleting all rows with at least one NaN value in it

df.dropna()

Deleting all columns with at least one NaN value in it

df.dropna(axis=1)

Deleting all rows where all columns are NaN

df.dropna(how="all")

Deleting all rows where at least 3 columns are NOT NaN

df.dropna(thresh=3)

Replace NaN values with text

df.fillna("missing")

Replace NaN values with 0

df.fillna(0)

With forward-fill NaN-values are replaced with the value before (axis=0 means vertical / per column)

df.fillna(method="ffill",axis=0)

With forward-fill NaN-values are replaced with the value before (axis=0 means horizontal / per row)

df.fillna(method="ffill",axis=1)

Shows the existing values for the column => eg: "M","F","D","?" (M/F is correct and D/? is wrong)

df["Sex"].unique()

Shows the count of each existing value in the column => eg. "M" 175, "F" 132, "D" 5, "?", 3 => so 8 wrong entries

df["Sex"].value_counts()

Change all "D" values in the column to "F"

df["Sex"].replace("D","F")

Change all "D" to "F" and "N" to "M" in the column

df["Sex"].replace({"D":"F","N":"M"})

Find all the invalid values in the age-column => eg. greater than 100 is invalid entry

df[df["Age"] > 100]

Fixing the invalid values for age by dividing it to 100

df.loc[df["Age"]>100,"Age"]=df.loc[df["Age"]>100,"Age"]/100

Defines new filter

filt = (df["col1"] == "col1_val1")

Defines new filter with several values

filt2 = (df["col1"].isin(["val1","val2","val3"]))

Defines new filter NOT with several values

filt3 = (~df["col1"].isin(["val1","val2","val3"]))

Defines new filter with a > operator

filt4 = (df["col1"] > 123)

Outputs the result for the above filter

df[filt]

Outputs the result for the above filter with only col2

df[filt]["col2"]

Use 2 filters

df[filt1 & filt2]

Filter the elements for the top 2.5% of the dataset (quantile)

df= df[(df['value'] >= df['value'].quantile (0.025))]

Filter the elements for the bottom 2.5% of the dataset (quantile)

df= df[(df['value'] < df['value'].quantile (0.975))]

Check duplicates for all rows (only first entry is not flagged as True)

df.duplicated()

Check duplicates for all rows (only last entry is not flagged as True)

df.duplicated(keep="last")

Check duplicates for all rows (every entry is flagged as True)

df.duplicated(keep=False)

Check duplicates only for subset-col (only first entry is not flagged as True)

df.duplicated(subset=["Name"])

Check duplicates only for subset-col (only last entry is not flagged as True)

df.duplicated(subset=["Name"],keep="last")

Check duplicates only for subset-col (every entry is flagged as True)

df.duplicated(subset=["Name"],keep=False)

Split all the strings in the column at the blank-char "_" => eg. results [1987, M, US, 1]

df["Data"].str.split("_")

Split all the strings in the column at the blank-char and build seperate new columsn in the dataframe

df["Data"].str.split("_",expand=True)

Check column if there exists an "U" in the string-values (True / False output as series) - also regex possible

df["Data"].str.contains("U")

Eliminate the whitespaces at the beginning and end for the string-values

df["Data"].str.strip()

Replace / Deletes blanks " " in the string-values

de["Data"].str.replace(" ","")

Function lower is used for the whole dataframe

df.apply(lambda x: x.lower())

Function lower is used for the first 3 columns

df.apply(lambda x: x.lower(), axis="columns")[0:3]

Sorting the new index ascending

df.sort_index(inplace=True)

Sorting the columns in dataframe

df.reindex(sorted(df.columns), axis=1)

Sorting by column ascending

df.sort_values(by="co1")

Sorting by column descending

df.sort_values(by="co2",ascending=False)

Sorting by 2 columns descending - first col1 and then col2

df.sort_values(by=["col1","col2"],ascending=False)

Sorting a series for one specific col descending

df["col1"].sort_value(ascending=False)

Change the column titles to uppercase

df.columns =) [x.upper() for x in df.columns]

Function lower is used for the whole column

df["c1"] = df["c1"].apply(lambda x: x.lower())

Change the names of the values - all others become NaN

df["c1"].map({"val1":"val1new","val2":"val2new"})

Change the names of the values - all others are not touched

df["c1"].replace({"val1":"val1new","val2":"val2new"})

New column at the end

df["cnew1"] = df["c1"] + df["c2"]

New column at the end

df["cnew1"] = df["c1"] + df["c2"]

Deleting 2 columns

df.drop(columns=["c1","c2"], inplace=True)

Add new row (all other cols are defined as NaN for the row)

df = df.append({"c1":"val1"},ignore_index=True)

Outputs the 5 highest values in the column - only col1

df["col1"].nlargest(5)

Outputs the 5 highest values in the column - all columns

df.nlargest(5,"col1")

Outputs the 5 lowest values in the column - all columns

df.nsmallest(5,"col1")

Grouping according to col1

col1_grp = df.groupby("col1")

Access to one value from the group

col1_grp.get_group("col1_value")

Grouping col1 and output median for col2

col1_grp["col2"].median()

Grouping col1 and output median for col2 only for 3 group-values

col1_grp["col2"].median().loc[["A","B","C"]]

Filtering all Values with 0

non_zero = df[df["col1"]!=0]

Add new column with Y/N depending if the value in col "Sales" is > 50

df["newcol"]=["Y" if x > 50 else "N" for x in df["Sales"]]

Change existing column with Y/N depending if the value in col is > 50

df["col"]=["Y" if x > 50 else "N" for x in df["col"]]

Add/change column with 0/1 depending of calc of the variables x/y from 2 cols

df["c"]=[1 if (x/y>9) else 0 for x,y in zip(df["c1"],df["c2])]

Delete duplicate index entries in the dataframe

df = df[~df.index.duplicated(keep='first')]

Convert a dataframe to a dict (https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary)

df.set_index('timestamp').T.to_dict('list')

Select every 5th row

df.iloc[::5, :]

Reset index form a dataframe with a new column from 0 to x

df = df.reset_index()

Get the first row of the dataframe

df = df.iloc[:,0]

Iterate trough dataframe with index and content / row

for index, row in df.iterrows():

Show all rows of the dataframe when printed

pd.set_option('display.max_rows', None)

Change datetime64 column to datetime

df["col1"] = df["col1"].dt.date

Convert datetime to pandas timestamp

dt = pd.Timestamp(dt)

Get the maximum from the 3 different columns and put it in the column "max"

df["max"]=df[["high-low","high-pc","low-pc"]].max(axis=1)

Changing a column with this lambda function (over 400x faster then using iterrows)

temp = df['name'].apply(lambda x: x.strip())

Add empty line to dataframe

df.append(pd.Series(np.nan), ignore_index=True)

Delete rows where col1 = nan

df = df[df["col1"].notna()]

Delete rows where col1 < 10

df = df[df["col1"] < 10]

Merge 2 dataframes where the column order-id is ident

dfFinal = df1.merge(df2, on='order-id')

Convert a dataframe to a nested list

l = df.values.tolist()


When there is some warning like "A value is trying to be set on a copy of a slice from a DataFrame"

change the access from eg.: df["upperband"][current] = df["upperband"][previous]
to: df.loc[current, "upperband"] = df.loc[previous, "upperband"]
(then there will be no more warning)

MODULE - MATPLOTLIB - working with charts


Import module as plotter plt

import matplotlib.pyplot as plt

Plot a line

plt.plot(x, x **2 )

Plot another line (in the other direction)

plt.plot(x, -1 * (x ** 2))

Define x values

x_values = list(range(1000))

Define y values / squares

squares = [x**2 for x in x_values]

Plot diagram with x- and y-axes

plt.plot(x_values, squares)

Plot 1st row, 2nd col in the 2nd diagram (any line after that will only access the 2nd diagram)

plt.subplot(1,2,2)

Sscatter diagram with x- and y-axes and line strenght = 5

plt.scatter(x_values, squares, s=5)

Title with fontsize = 24

plt.title("Title", fontsize=24)

X-Axe title with fontsize = 18

plt.xlabel("X-Axe", fontsize=18)

Y-Axe title with fontsize = 18

plt.ylabel("Y-Axe", fontsize=18)

Params describtion with fontsize = 10

plt.tick_params(axis="both", which="major", labelsize=10)

Define scaling of axes

plt.axis([0,1100,0,1100000])

Colors from one shade to another with different params

plt.scatter(x,y,c=squares,cmap=plt.cm.Blues,edgecolor="none", s=10)

Expoit the first point larger in green

plt.scatter(x[0],y[0],c="green",edgecolor="none", s=100)

Expoit the last point larger in red

plt.scatter(x[-1],y[-1],c="red",edgecolor="none", s=100)

Hide the complete x-axis description

plt.axes().get_xaxis().set_visible(False)

Hide the complete y-axis description

plt.axes().get_yaxis().set_visible(False)

Define custom figure size

plt.figure(dpi=128, figsize=(10,6))

Show diagram

plt.show()

Save diagram as png-picture-file

plt.savefig("example.png",bbox_inches="tight")

MODULE - SELENIUM - browser automatization

chromedriver.exe in Ordner von py-file speichern
Python specific documentation for selenium

https://selenium-python.readthedocs.io/locating-elements.html

Import WebDriver for url access

from selenium import webdriver

Import Options

from selenium.webdriver.chrome.options import Options

Import Service

from selenium.webdriver.chrome.service import Service

Import WebDriverWait (explizit waiting time)

from selenium.webdriver.support.ui import WebDriverWait

Import EC (for expected conditions with waiting logic)

from selenium.webdriver.support import expected_conditions as EC

Import By (for finding elements with By-logic)

from selenium.webdriver.common.by import By

Import Keys (for keystroke handling)

from selenium.webdriver.common.keys import Keys

Import Time-Library when there are delays needed

import time

Import Keys to send key-strokes

from selenium.webdriver.common.keys import Keys

Define chrome driver - with actual folder os.getcwd

driver = webdriver.Chrome(os.getcwd() + '/chromedriver')

Throws exception when time for read is over 10 seconds

driver.set_timeout(10)

Somtimes necessary to delay some work in progress

time.sleep(1)

Encode url

link = urllib.parse.quote_plus(link)

Open link

driver.get(link)

get full unparsed html-code

driver.page_source

Find element (input-element with id = "passwd-id")

driver.find_element(By.XPATH, "//input[@id='passwd-id']")

Click on element by xpath (investigate - copy xpath)

driver.find_element(By.XPATH,"//*[@id='button']/input").click()

Fill field "name" with content "x"

driver.find_element(By.XPATH,"//*[@id='name']").send_keys("x")

Other method to find element per xpath ()

driver.find_element(By.XPATH, "//button[text()='Some text']")

Find Element according to a specific element eg. "aria-label" with text "Post" in a "div"

driver.find_element(By.XPATH,"//div[@aria-label='Post']").click()

Clear / delete input field

driver.find_element("q").clear()

Fill text in field and press enter

driver.find_element("q").send_keys("txt" + u'\ue007')

Set text of element to xyz

driver.execute_script('arguments[0].innerHTML = "xyz";', elem)

Fill text in field (use css-selector to find element => input element with id "myID" in it)

drv.find_element_by_css_selector("input#myID").send_keys("txt")

driver.set_window_size(600,1000)
Search field for class

driver.find_elements(By.CLASS_NAME, "cl")

Search field for tag

driver.find_elements_by_tag_name("tag")

Search by (exact) link-text in an anchor-tag

driver.find_element_by_link_text('Continue')

Search by (partial) link-text in an ancor-tag

driver.find_element_by_partial_link_text('Conti')

Send single keyboard-elements <> chars (needs import keys)

driver.send_keys(Keys.Backspace,Keys.ARROW_LEFT,Keys.ENTER)

Send a PageDown-keystroke (eg. for scrolling to the very bottom of a page)

driver.send_keys (Keys.PAGE_DOWN)

Send an Return-keystroke (eg. for enter after data entries in fields)

driver.send_keys (u'\ue007')

Press enter and submit site

field.submit()

Close driver (otherwise the windows will not close)

driver.quit()

Close the actual window

driver.close()

Title of the html-site

driver.title

Back-Function in browser-history

driver.back()

Forward-Function in browser-history

driver.forward()

Refresh website

driver.refresh()

Define an individual cookie

cookie = {"name": "token", "value": "23874kljdsjhfaökldjs"}

Use Cookie

driver.add_cookie(cookie)

Outputs a specific cookie

driver.get_cookie("token")

Outputs all cookies

driver.get_cookies())

Read current, actual url

driver.current_url

Switch to a frame-id (eg. a PopUp)

driver.switch_to.frame ("frame_id")

Read active window name

act_window = driver.current_window_handle

Switch (back) to active window (eg. after working on the popup-frame or popup-window)

driver.switch_to.window(act_window)

Set window size to 600 wide and 1000 height

driver.set_window_size(600,1000)

Minmize window

driver.minimize_window()

ScrollDown in the window (maybe to the bottom of the site)

driver.execute_script("window.scrollTo(0, 10000)")

Open a new tab

driver.execute_script("window.open('');")

Swich to first tab

driver.switch_to.window(driver.window_handles[0])


Switch between iframes

waitWD.until(EC.presence_of_element_located((By.XPATH,"//iframe[@id='mainFrame']")))
iframe = driver.find_element(By.XPATH, "//iframe[@id='mainFrame']")
driver.switch_to.frame(iframe)
driver.switch_to.default_content()


Wait till a specific element is loaded / clickable

=> waiting as long as required - better then time.sleep
=> more possible conditions see https://selenium-python.readthedocs.io/waits.html#explicit-waits
wait = WebDriverWait (driver, 5)                                        # Define maximum wait time till the element is ready
wait.until(EC.presence_of_element_located((By.NAME, "agree"))).click()  # Wait till element with name = "agree" is present and click afterwards
wait.until(EC.presence_of_element_located((By.ID, "YDC-Col1")))         # Wait till element with id = "YDC-Col1" is present
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[text()='Total revenues']"))).click()    # Wait till div-element with specific text is pressent and click


Use Selenium with chrome windows hidden

os.environ['WDM_LOG_LEVEL'] = '0'     # with this seeting no output of log / messages using webdriver_manager.chrome
import undetected_chromedriver as uc
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
srv=Service(ChromeDriverManager().install())
# driver = webdriver.Chrome (service=srv, options=options)
driver = uc.Chrome (service=srv, options=options)
waitWebDriver = WebDriverWait (driver, 5)
link = "https://www.google.com"
driver.get (link)


Define Real User Agent Header

(avoid being blocked while scraping some data - keep it up to date from time to time)
see: https://www.scraperapi.com/blog/5-tips-for-web-scraping/
common user agent list: https://www.networkinghowtos.com/howto/common-user-agent-list/
latest user agent list: https://www.whatismybrowser.com/guides/the-latest-user-agent/
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36")


Use Random User Agent Header

https://stackoverflow.com/questions/49565042/way-to-change-google-chrome-user-agent-in-selenium

from fake_useragent import UserAgent
options = Options()
ua = UserAgent()
userAgent = ua.random
options.add_argument(f'user-agent={userAgent}')


Using XPath with Selenium

https://www.guru99.com/xpath-selenium.html
Can be used to find the location of any element in a html- or xml document

Example:
wait = WebDriverWait(driver, 10)
tmpElem = wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Top Grossing']")))
tmpElem.click()

Xpath=//tagname[@attribute='value']
    // : Select current node
    Tagname: Tagname of the particular node.
    @: Select attribute.
    Attribute: Attribute name of the node.
    Value: Value of the attribute.

//input[@name='uid']                      # input-elem with name = "uid"
//input[@type='text']                     # input-elem with type = "text"
//label[@id='message23']                  # label-elem with id = "message23"
//input[@value='RESET']                   # input-elem with value = "RESET"
//*[@class='barone']                      # any elem with class = "barone"
//td[text()='UserID']	                    # td-elem which has the text "User-ID"
//a[@href='http://demo.guru99.com/']      # anchor with specific link
//img[@src='//xyz.com/img/java.png']      # img-elem with specific pic-link
(//button)[2]                             # select the second button on the page
//button[last()]                          # select the last button
(//p[@class='product']/span)[2]           # select the second span in the p-element with class =

//*[@type='submit' or @name='btnReset']         # any elem which has 2 OR conditions (for type and name)
//input[@type='submit' and @name='btnLogin'     # any elem which has 2 AND conditions (for type and name)

//*[contains(@name,'btn')]              # any elem which has "btn" somewhere in the name
//*[contains(@id,'message')]            # any elem which has "message" somewhere in the id
//*[contains(text(),'here')]            # any elem which has "here" somewhere in the text
//*[contains(@href,'orf')]              # any elem which has "orf" somewhere in the href-link
//select[contains(.,'Division')]        # select-element where some child has the text "Division" in it
//label[starts-with(@id,'message')]     # elem where id starts with "message"

//*[@type='text']//following::input[1]              # select all following-elements which have input as tag and select the first elem
//*[text()='Enterprise Testing']//ancestor::div[2]  # find ancestor-elements with text and take the second one
//*[@id='java_technologies']//child::li[23]         # select all child elements with specific id and select the 23th one
//*[@type='submit']//preceding::input[1]            # select preceding (dt.: vorangegangene) input-elements and take the first one
//*[@type='submit']//following-sibling::input       # select following sibling which is input
//*[@id='rt-feature']//parent::div[5]               # select parent of the elem with id = "rt-feature" and take the 5th div
//*[@type='password']//self::input                  # select elem and use itself (only 1 element can be found)
//*[@id='rt-feature']//descendant::a[X]             # find all elements below (dt.: Nachfahren) and take the Xth one


Scroll to the very bottom of a site - eg. Twitter, Facebook, Stocktwits,...

lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
match=False
while(match==False):
    lastCount = lenOfPage
    time.sleep(1)
    lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
    if lastCount==lenOfPage:
        match=True


When there are problems clicking an element on a page

Method1:
element = driver.find_element_by_id('datePickerToggleBtn')
webdriver.ActionChains(driver).move_to_element(element ).click(element ).perform()

Method2:
element = driver.find_element_by_css('div[class*="loadingWhiteBox"]')
driver.execute_script("arguments[0].click();", element)


Take a screenshot when something fails in headless mode

try:
    ...
    do the whole selenium work...
    ...
except:
    driver.save_screenshot("a.png")


use locators in this order as suggested by Selenium contributors

ID
name
className
linkText
partialLinkText
tagName
cssSelector
XPath


Solving some captchas - clicking for 10 seconds

  element = driver.find_element(By.XPATH,"//div[@id='px-captcha']")
  action = ActionChains(driver)
  click = ActionChains(driver)
  action.click_and_hold(element)
  action.perform()
  time.sleep(10)
  action.release(element)
  action.perform()
  time.sleep(0.2)
  action.release(element)


Use only newest fake_useragent

  # install with
  # pip uninstall fake-useragent
  # pip install --force-reinstall "git+https://github.com/robertsmd/fake-useragent.git@robertsmd-feature-only-newest-uas"
  see: https://github.com/hellysmile/fake-useragent/pull/122
  from fake_useragent import UserAgent
  userAgent = UserAgent(num_newest_uas=2).random


Download PDF-File to folder

  # create this option-setting when creating the driver
  ...
  DOCPath = os.path.join(path, "WORK")
  options = Options()
  options.add_experimental_option('prefs', {
    "download.default_directory": DOCPath,
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True
    })
  ...
  srv=Service(ChromeDriverManager().install())
  driver = webdriver.Chrome (service=srv, options=options)
  driver.get(workerPDF)

MODULE - BEAUTIFUL SOAP - webscraping


Import module for beautiul soap

from bs4 import BeautifulSoup

Import moduel for fake useragent

from fake_useragent import UserAgent

Define UserAgent

ua = UserAgent()

Create random useragent

ua.random								"t

Read url as page

page = requests.get("https://www.ariva.de/dax-30")

Read url as page with proxy like eg. "https://10.10.1.11:1080"

page = requests.get(url,headers=headers,proxy=proxy)

Read html-page with parser

soup = BeautifulSoup (page.content,"html.parser")

Read xml-page with parser

soup = BeautifulSoup (page.content,"xml")

find first a-tag in body

soup.body.a

Find all td-elements

soup.find_all("td"):

Find alle th- and td-elements

soup.find_all(["th","td"])

Read all table rows for specific class

soup.find_all("tr", class_="arrow0"):

Find all div-elements with specific class

soup.find_all("p", {"class": "payamntp"})

Find element with one of this 2 classes

soup.find_all("div", {'class':['class1', 'class2']})

Read all table data cells with id and content

for col_id, col_content in enumerate(result.find_all("td")):

Read all divs with specific class

soup.find_all("div", class_="col-xs-6")

Find div with id "lj_div"

soup.find("div", {"id": "lj_div"})

Find (first) p with class "line"

soup.find("p", {"class": "line"})

Find element with specific text / string

soup.find("dt", string="In-App Purchases")

General way to find string

soup.find("div", string=re.compile('contributions'))

Check if string nameSymbol_" is part of the class-element

soup.find("small", class_=re.compile("nameSymbol_"))

Find elements for several tags

soup.find_all(["strong","a","br"])

Found the parent div-element for a specific before find element

erg2 = erg.find_parent("div")

Check if class has specific content

if row.get("class") == ["ellipsis", "nobr", "new"]:

Read all links from href element

for e in soup.find_all("a"): e.get("href")

Read all number from the field "value"

for e in soup.find_all("a"): e.get("value")

Change all linebreak
in text to blank

for lb in soup.findAll("br"): lb.replaceWith(" ")

Read url from the image

url_i = rows[0].find('img')['src']

Read image as content

image = requests.get(f'https:{url_i}').content

Formated Output with html indentations

results.prettify()

Check if specific text in soup text-output

if "Kein Zugriff" in soup.text:

Reads everyting what is in the tbody-tag

soap.tbody

return direct child elements of a an element in a list

.contents

returns all direct childrens (as a generator object)

.children

returns all descendants (nachfolger) as iterable

.descendants

return the parent-tag

.parent

return all parents (as a generator object)

.parents

return the next sibling (going sideways / have the same parent)

.next_sibling

return the previous sibling (going sideways / have the same parent)

.previous_sibling

all following siblings as iterable

.next_siblings

all previous siblings as iterable

.previous_siblings

return next element

.next_element

return next elements (as a generator object)

.next_elements

return previous element

.previous_element

return previous elements (as a generator object)

.previous_elements

find the next ul-element

.find_next("ul")

find the previous ul-element

.find_previous("ul")

find only direct children of an element

.find_all("div", recursive=False)

returns the name of the tag

.name

get all strings from a tag in an iterable

.strings

get all (stripped) strings from a tag in an iterable

.stripped_strings

get all text directly below a tag

.find_all(text=True, recursive=False)


Using the select selector

# https://facelessuser.github.io/soupsieve/selectors/
soup.select("div.class1.class2")                => divs-elem which have both classes
soup.select("div[class='table-container']")     => div-elem which have only this class
soup.select('h1, p')                            => all h1- and p-elements
soup.select('body p')                           => p-elems in body
soup.select('div > p')                          => children p-elems in div
soup.select('h1 ~ p')                           => p-elems on the same level as h1
soup.select(':any-link')                        => any links which has a href-attribute
soup.select(':checked')                         => all checked elements (radio-buttons, checkboxes, option)
soup.select('p:first-child')                    => first child in a group
soup.select('span:first-of-type')               => firt child of a given type in a group


Define Real User Agent Header

(avoid being blocked while scraping some data - keep it up to date from time to time)
see: https://www.scraperapi.com/blog/5-tips-for-web-scraping/
common user agent list: https://www.networkinghowtos.com/howto/common-user-agent-list/
latest user agent list: https://www.whatismybrowser.com/guides/the-latest-user-agent/
headers = {
    'User-Agent': '<copy from the above link>',
}
response = requests.get(url, headers=headers)


Read with Selenium when necessary due the specific homepage

from bs4 import BeautifulSoup
from selenium import webdriver
link = "https://www.gurufocus.com/stock/" + stock + "/summary"			# define the homepage to be scraped
options = Options()
options.add_argument('--headless')										# webpage will work in the back
options.add_experimental_option ('excludeSwitches', ['enable-logging'])
path = os.path.abspath (os.path.dirname (sys.argv[0]))					# works for alle platforms (Win,MacOs,Linux)
if platform == "win32":
	cd = '/chromedriver.exe'
elif platform == "linux":
	cd = '/chromedriver_linux'
elif platform == "darwin":
	cd = '/chromedriver'
driver = webdriver.Chrome (path + cd, options=options)					# Use chromedriver.exe to read website with the above options
driver.get (link)  														# Read link
time.sleep (2)  														# Wait till the full site is loaded


Wait for the specific element to load

element = driver.find_element_by_id("analyst-estimate")
driver.execute_script("arguments[0].scrollIntoView();", element)

driver.execute_script("return arguments[0].scrollIntoView(true);",
  WebDriverWait(driver, 20).until(EC.visibility_of_element_located((
    By.XPATH, "//div[@class='search-pagination']"))))


time.sleep (1)


Read page with html.parser and to the further scraping

soup = BeautifulSoup(driver.page_source, 'html.parser')

Close the chromedriver window

driver.quit ()


all searches also possible for more elements wiht find_elements_*

driver.find_element_by id												# Search for a id in html
driver.find_element_by_name												# Search for name-field (eg. in input-tag)
driver.find_element_by_xpath											# Search by xpath of the element (right mouse blick in Chrome Dev Tools)
driver.find_element_by_link_text										# Search for (full) text within the tags
driver.find_element_by_partial_link_text								# Search for (part) text within the tags
driver.find_element_by_tag_name('h1')									# Search for a tag in html
driver.find_element_by_class_name("cont")								# Search for a specific class name in html
driver.find_element_by_css_selector('p.content')						# Search for tag and selector


Read the whole text of an article / element

  tmpText = soup.find("article", {"itemprop": "description"})
  if tmpText != None:
    wText = "\n".join(list(tmpText.stripped_strings)[1:])

MODULE - REQUESTS - workings with APIs and reading websites

https://realpython.com/python-api/
import module requests

import requests

import module json

import json

Request api from the generated api-link

resp = requests.get (url)


GET-request with payload and headers

resp = requests.request("GET", url,
                        json=payload,
                        headers=headers)


Return the statuscode of the response

resp.status_code
    # 200: ok
    # 201: resource created
    # 400: bad request (wrong or missing some informations)
    # 401: Unauthorized (requires some additonal permissions)
    # 404: Not Found (requested resource does not exist)
    # 405: Method Not Allowed (wrong HTTP Method - eg. POST instead of GET)
    # 500: Internal Server Error: something broken on the server site


shows the statuscode in textform (eg. OK, Not Found)

resp.reasons

used headers for the request

resp.request.headers

header from the response

resp.headers

show the content-type of the resonse (eg. json, image/jpeg)

resp.headers["Content-Type"]

response in text-form

resp.text

response in json-form

resp.json()

Pretty-Print formatted json-file

print(json.dumps(resp,indent=4,sort_keys=True))

Read the value for the 1 element and the attribute xyz

result = api[0]["xyz"]


Using requests_html

pip install requests-html                         => Install module
from requests_html import HTMLSession             => Import requests_html
s = HTMLSession()                                 => Define a html-session
r = s.get(url, headers=HEADERS)                   => Define response
r.html.find("title", first=True).text             => title of in the html-tag
r.html.find("span#wob_tm", first=True).text       => Span-value with id = "wob_tm"
r.html.find("div.clX", first=True)                => Div-element with class = "clX"
r.html.find("div.clX span.clY",first=True).text   => Span-element with class="clY" ind Div-element with class="clX"


Using request to get the source code or text from a page

resp = requests.get("https://api.buyproxies.org/?a=showProxies&pid=175649&key=168281a74e7bb3f0dc0feb4f37ec6de5&port=12345")
proxyList = resp.text.split("\n")


Using requests to download a file

url = "https://xyz.mp3"
doc = requests.get(url)
with open('work.mp3', 'wb') as f:
  f.write(doc.content)


Using requests for reading websites

ua = UserAgent(verify_ssl=False)
userAgent = ua.random
HEADERS = {"User-Agent": userAgent}
page = requests.get (link, headers=HEADERS)
page = requests.get(site,timeout=5)               => try to read the website for max 5sec
soup = BeautifulSoup (page.content, "html.parser")


Using requests to read image from web, save and build hashValue

  headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'} # This is chrome, you can set whatever browser you like
  resp = requests.get(img, headers=headers, stream=True)
  print(resp.status_code)
  with open(workerPic, 'wb') as f:
    shutil.copyfileobj(resp.raw, f)
  del resp
  img = Image.open(resp.raw)
  workihashImg = imagehash.average_hash(img)
  print(workihashImg)

MODULE - SCRAPY - webscraping using spiders

Tutorial: https://docs.scrapy.org/en/latest/intro/tutorial.html
Video-Tutorial: https://www.youtube.com/c/zytedata/playlists (Playlist - Learn Scrapy) Video-Tutorial2: https://www.youtube.com/watch?v=s4jtkzHhLzY (Scrape a Whisky-Shop)
install the scrapy module

pip install SCRAPY

creates the project and folder-structure

scrapy startproject indeedScraper

create tilesSpider.py with scrapy-template

scrapy genspider tilesSpider magnattiles.com

scrape a webpage in command line and use the console for further investigations

scrapy shell http://quotes.toscrape.com/random

fetch a new site from an allready scraped page with "scrapy shell" before

fetch("newURL")

shows the full html-code of the page

print(response.text)

select element "small" with class="author" and shows the html-code

response.css("small.author").extract()

return only the text for the element+class for the first element

response.css("small.author::text").extract()[0]

return the text for the first element for span + class="text"

response.css("span.text::text").extract_first()

return all anchor-tag texts in a list

response.css("a.tag:lst:text").extract()

find elems a in li with class "next" and get the content from the href-element

response.css("li.next > a::attr(href)").extract_first()

extract href from all anchor-tags inside div.quote => span => a

response.css("div.quote > span > a::attr(href)").extract()

combine the url from the initial scrapy shell with the string-path

response.urljoin("/page/2/")

read text-data when the response is i a json-format

data = json.loads(response.text)


Create spider skeleton on command line (quotesName = spider-name, toscrape.com = site to scrape)

scrapy genspider quotesNAME toscrape.com

run spider from command line

scrapy runspider quotesNAME.py

run spider and store result into items.json

scrapy runspider quotesNAME.py -o items.json

shows the duplicate elements which are filtered

'dupefilter/filtered': 50,

from second video-tutorial above
open scrapy shell

scrapy shell

get the first div-element with that class

response.css("div.product-item-info").get()

get the first li-element in the ul-element with class "products"

response.css("ul.products li")

assign all div-elements with that class to the variable "products"

products = response.css("div.product-item-info")

outputs the number of alle div-elements with that class on the loaded page

len(products)

from the products-variable get the first a-element with this class

p.css("a.product-item-link").get()

return the text from the element (first a-element with this class)

p.css("a.product-item-link::text").get()

return the text from all elements (all a-elements with this class)

p.css("a.product-item-link::text").getall()

return the text from all elements (all span-elements with this class)

p.css("span.price::text").getall()

return the text from the element (first span with class "price" and replace the €-symbol)

p.css("span.price::text").get().replace("€","")

return the link from the element (first href-link with this class)

p.css("a.product-item-link").attrib["href"]

second way to the the href-value

p.css("a.product-item-link::attr(href)").get(

return value of attribut "dpsku" in the a-tag with class "button"

p.css("a.button::attr(dpsku)").get()


start scrapy with the spider "whisky"

scrapy crawl whisky

run spider "whisky" and output result to json-file (overwrite the json-file)

scrapy crawl whisky -O whisky.json

same as above (only appending the results and not overwriting)

scrapy crawl whisky -o whisky.json

outputing as csv

scrapy crawl whisky -O whisky.csv

outputing as xlsx (scrapy-xlsx must be installed + settings.py changed: https://pypi.org/project/scrapy-xlsx/)

scrapy crawl whisky -O whisky.xlsx

find element with class "action next" (spaces have to be replaced with ".")

response.css("a.action.next").get()

store the link for the next page

nextPage = response.css("a.action.next").attrib["href"

go to next page and do parsing

yield response.follow(nextPage, callback=self.parse)

automatic delay between requests

settings.py: AUTOTHROTTLE_ENABLED = True

The initial download delay

settings.py: AUTOTHROTTLE_START_DELAY = 5

The maximum download delay to be set in case of high latencies

settings.py: AUTOTHROTTLE_MAX_DELAY = 60

ignore robots.txt

settings.py: ROBOTSTXT_OBEY = False

set download-delay (to prevent site-checkings)

settings.py: DOWNLOAD_DELAY = 3


when scraping with scrapy_selenium

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}
#Selenium
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = r"C:\Users\Polzi\Documents\DEV\Python-Private\chromedriver.exe"
# '--headless' if using chrome instead of firefox
SELENIUM_DRIVER_ARGUMENTS = ['--headless']


get text (with div and a combination)

response.xpath("//div[@class='vendor']//ancestor::a/text()").get()

get href-value (from a element with class)

response.xpath("//a[@class='next page-numbers']/@href").get()

get the aria-label-value

response.xpath("//a/@aria-label).get()

get href-element (from a element and existing selector "tmp")

tmp.xpath(".//a/@href").get()


Example spider

import scrapy
class QuotesnameSpider(scrapy.Spider):
    name = 'quotesNAME'                                 # name of the spider
    custom_settings = {                                 # define custom settings - eg. encoding utf-8 for german umlaute
        "FEED_EXPORT_ENCODING": "utf-8"
    }
    allowed_domains = ['toscrape.com']                  # allowed domains (no http in front of the domain)
    start_urls = ['http://quotes.toscrape.com/random']  # urls to scrape

    def parse(self, response):
        self.log("I just visited: " + response.url)     # info-output when program is running
        yield {                                         # store scraped informations in a dictionary
            "author_name": response.css("small.author::text").extract()[0],
            "text": response.css("span.text::text").extract_first(),
            "tags": response.css("a.tag::text").extract(),
        }


Example spider (with direct api-access on an infinite page)

import scrapy
import json

class QuotesScrollSpider(scrapy.Spider):
    name = 'quotesINFINITE'                                     # name of the spider
    apiURL = "https://quotes.toscrape.com/api/quotes?page={}"   # standard URL with page variable
    start_urls = [apiURL.format(1)]                             # create the start-url with page = 1

    def parse(self, response):
        data = json.loads(response.text)                        # read the json-page
        for quote in data["quotes"]:                            # iterate trough all quotes
            yield {                                             # return the needed values
                "authorName": quote["author"]["name"],
                "text": quote["text"],
                "tags": quote["tags"]
            }
        if data["has_next"]:                                    # check if next page has to be read
            nextPage = data["page"] + 1
            yield scrapy.Request(url=self.apiURL.format(nextPage), callback=self.parse)


Import module

import ftplib

Create FTP instance

ftp = ftplib.FTP()

Define encoding

ftp.encoding = "utf-8"

Connect to ftp-address

ftp.connect(ADDR)

Login with username and password

ftp.login(USERNAME, PW)

Change working directory eg. to tmp

ftp.cwd("tmp/")

Show actual working directory

ftp.pwd()

Show files in actual working directory

ftp.dir()

Quit ftp instance

ftp.quit()


Upload file with ftp

with open(fn, "rb") as file:
  ftp.storbinary('STOR test.png', file)

MODULE - PYQT - making GUIs

see Codemy.com https://www.youtube.com/channel/UCFB0dxMudkws1q8w5NJEAmw see Python-Diverses/PyQt
Install Py Qt Designer

https://build-system.fman.io/qt-designer-download

Installation

pip install pyqt5


QtWidgets / Different Widgets

import PyQt5.QtWidgets as qtw   # Import QtWidgets
qtw.QApplication                # Defines Application
qtw.QLabel                      # Defines Label
qtw.QWidget                     # Defines Widget
qtw.QHBoxLayout                 # Arranges horizontally from left to right
qtw.QVBoxLayout                 # Arranges vertically from top to bottom
qtw.QGridLayout                 # Arranges in grid from on x and y axes
qtw.QFormLayout                 # Field for forms - 1col are titles, 2col are fields,rbs,cb
qtw.QPushButton                 # Defines PushButton
qtw.QDialog                     # Module for dialog windows
qtw.QDialogButtonBox            # Module for button boxes in dialog windows
qtw.QLineEdit                   # Defines input field with text
qtw.QMainWindow                 # Module for main windows
qtw.QStatusBar                  # Module for status bar
qtw.QToolBar                    # Module for tool bar
qtw.QMessageBox                 # Module for Message Boxes
qtw.QCommandLinkButton          # Module for Command Link Button
qtw.QCheckBox                   # Module for CheckBox
qtw.QDialog                     # Module for Dial Widget
qtw.QFileDialog                 # Module for File Dialog


Main Window

app = QApplication(sys.argv)                            # Create an instance of the application
w = QWidget()                                           # Initialize a window
w.setWindowTitle('Windows Title')                       # Title of the windows
w.setGeometry(200, 200, 380, 180)                       # Define windows (1:x-coord,2:y-coord,3:width,4:height)
w.move(50, 15)                                          # Place / move windows to coodinates 60,15
msg = QLabel('<h1>Hello World!</h1>', parent=window)    # Define label / message in format h1
msg.move(50, 15)                                        # Place / move labe to coordinates 60,15
windows.show()                                          # Show the application GUI - schedules a paint event
sys.exit(app.exce_())                                   # Start the app and close it with sys.exit


Layouts

qhLayout = QHBoxLayout()                                # Defines horzontal box layout
qvLayout = QVBoxLayout()                                # Defines vertical box layout
qgLayout = QGridLayout()                                # Defines grid layout
qfLayout = QFormLayout()                                # Defines form layout
btn1 = qtw.QPushButton(self)                            # Define Push-Button
qhLayout.addWidget(btn1)                                # Add Button on Layout
qvLayout.addLayout(qhLayout)                            # Add horizontal Box Layout in vertical Layout
lbl = qtw.QLabel('Name:', self)                         # Define Label
inp = qtw.QLineEdit(self)                               # Define Input-Field
qfLayout.addRow(lbl, inp)                               # Add row in form (Label + Input-Field)
qgLayout.addWidget(QPushButton('Button1'),1,0)          # Defines one or more buttons and set it to position 1,0
qgLayout.addWidget(QPushButton('Button1'),2,1,1,2)      # Set buttons to position 2,1 and span it 1row/2cols
w.setLayout(layoutXY)                                   # Sets layout dimensions to window

=> Button / QPushButton

btn = qtw.QPushButton(self) # Define Push-Button layout.addWidget(btn) # Add ComboBox to layout btn.setText("Calc") # Set Text of button btn.setEnabled(False) # Deactivate the button btn.clicked.connect(self.btnPressedCalc) # Set Function which should run when the button is clicked
2nd method to assign a function (short method)

btn = qtw.QPushButton(self, clicked = lambda: self.btnPressedCalc())

btn.setStyleSheet("QPushButton {color: red;}") # Set color of button to red btn.setStyleSheet("QPushButton {color: #797979;}") # Set color of button to hex-code

=> Label / QLabel

lbl = qtw.QLabel("Name:", self) # Define Label layout.addWidget(lbl) # Add Label to layout lbl.setText("xyz") # Set text of label lbl.setPixmap(self.pixMap) # Show picture in label

=> Input-Field / QLineEdit

inp = qtw.QLineEdit(self) # Define Input-Field layout.addWidget(inp) # Add input field to layout inp.clear() # Delete content of input-field inp.text() # Get actual text from the input-field inp.textChanged.connect(self.inputName) # Call function when something is changed

=> ComboBox / QComboBox

cb = qtw.QComboBox(self) # Create ComboBox l = [("AAPL",23),("BRK",47),("CYP",78)] # Define list with shown value and data behind (eg. AAPL and 23) cb.addItems(l) # Add list of items to combobox as content cb.addItem(s) # Add one item to combobox as content layout.addWidget(cb) # Add ComboBox to layout layout.addRow("Name:",cb) # Add ComboBox to form-layout cb.currentText() # Actual selected value (AAPL, BRK, CYP) cb.currentData() # The data of the selected item (23, 47, 78) cb.currentIndex() # Index-number of the selected item idx = cb.findText("xyz") # Find index of the element in the combobox cb.setCurrentIndex(idx+1) # Set element the found element by index cb.setEnabled(False) # Disable or enable combobox cb.clear() # Clear / empty combo box cb.activated.connect(self.funcClicker) # Call function when entry in combobox is selected cb = qtw.QComboBox(self, # Create editable ComboBox editable=True, insertPolicy=qtw.QComboBox.InsertAtBottom

=> List / QListWidget

lw = qtw.QListWidget(self) # Create List lw.addItems(l) # Add list of items to listwidget as content lw.addItem(s) # Add one item to listwidget as content lw.currentRow() # Get index of current row lw.takeItem(1) # Delete / take 2nd item from the list lw.count() # Number of items in the list lw.item(idx) # Take the element from the idx-position lw.item(idx).text() # Get text of the element

=> SpinBoxes / QSpinBox, QDoubleSpinBox

sp = qtw.QSpinBox(self, # create spinbox value=10, # start value maximum=100, minimum=0, singleStep=5, # steps between values prefix="#", # prefix before the value suffix=" Order" # suffix behind the value
Add SpinBox to layout

layout.addWidget(sp)

Add SpinBox to form-layout

layout.addRow("Name:",sp)

sp = qtx.QDoubleSpinBox (self) # create spinbox for floats

=> Textfield / QTextEdit

txt = qtw.QTextEdit(self) # create Textfield layout.addWidget(txt) # add textfield to layout txt.toPlainText() # convert Text to plain text txt.setPlainText("xyz") # set Textfield to xyz txt = qtw.QTextEdit(self, lineWrapMode = qtw.QTextEdit.FixedColumnWidth, lineWrapColumnOrWidth = 50, placeholderText="ABC", readOnly=False)

=> RadioButtons / QRadioButton

rb = qtw.QRadioButton(self) # define radio button rb.isChecked() # return true if radio-button is selected rb.text() # outputs text of the radio button rbHam.setChecked(True) # check radio button (eg. for default setting when opening) rb.setText("xyz") # change name of the radio button to xyz rb.toggled.connect(lambda: self.func()) # call function when radio-button is pressed

=> CheckBoxes / QCheckBox

cb = qtw.QCheckBox(self) # define checkbox cb.setChecked(True) # check checkbox cb.isChecked() # return true if check-box is checked cb.checkState() # return 0 when unchecked, return 1 when checked cb.stateChanged.connect(lambda: self.checked()) # call function when state of checkbox change cb.toggled.connect(lambda:self.checked()) # call function when checkbox is pressed

=> DialWidget / QDial

dw = qtw.QDial # define dial widget dw.value() # get the value of the dial widget dw.setMinimum(10) # set minimum to 10 dw.setMaximum(360) # set minimum to 10 dw.setRange(100,200) # set range (with min and max) dw.setValue(50) # set (default-)value dw.setNotchesVisible(True) # put scale on the widget dw.valueChanged.connect(lambda: self.func()) # call function when dial widget is turned around

=> Statusbar

statbar = qtv.QStatusBar(self) # create Statusbar statbar.showmessage("xyz") # show message in status bar

=> MessageBox

msg = QMessageBox() # Define Message Box msg.setWindowTitle("Infobox") # Set Title of Message-Window msg.setText("Your Todo List has been saved!") # Set Text of Message-Box msg.setIcon(QMessageBox.Information) # Set Info-Icon for Message-Box

=> FileDialog / QFileDialog

get Filename,

1st parameter: name of the window-filedialog eg. "Open File"

  # 3rd parameter: which files can be selected eg. "All Files" for all files
  # returning a tuple (name of the file and selected extension)

fName = qtw.QFileDialog.getOpenFileName(self, "Open File", "C:\TEMP", "All Files (*)")

  fName[0]    # to get only the choosen filename


Menu or Icon / QAction

  self.menu = qtw.QAction(MainWindow)
  self.menu.triggered.connect(self.actPressedOpen)    => Set Function which should run when the menu is clicked

class Dialog(QDialog): # Defines a class dialog which inherits from QDialog buttons = QDialogButtonBox() # Define buttons for a dialog window buttons.setStandardButtons(QDialogButtons.OK) # Defines a ok button on a dialog window class Window(QMainWindow): # Defines a class Windows which inherits from QMainWindow


PyQt5Designer

pip install PyQt5Designer                 # Install PyQt5Designer
designer.exe                              # Find designer.exe in installed venv
<Form><Preview>                           # Show preview of the designed window
pyuic5 -x filename.ui -o filename.py      # Convert Qt-Designer file to a python-file


QtGui / Change GUI like font, size

import PyQt5.QtGui as qtg                               # Import Gui for changing fontsize
nameLabel.setFont(qtg.QFont("Helvetica", 18))           # Change font of the label


Open other Window / Dialog
Click button one window to open another window

  from Window2from24 import Ui_SubWindow      # import Class / Window from other py-file
  from Dialog24 import Ui_Dialog              # import Class / Dialog from other py-file
  ...
  def openWindow(self):
    self.window = QtWidgets.QMainWindow()     # Which widget should be opened QMainWindow / QDialog
    self.ui = Ui_SubWindow()                  # Class Name of the window which sould be opened
    self.ui.setupUi(self.window)
    self.window.show()
    self.ui.label.setText("New")              # Access some elements on the opened window
  ...
  self.pb.clicked.connect(self.openDialog)    # call the function where the window is opened
  self.pb2.clicked.connect(self.openWindow)   # call the function where the dialog is opened


Load ui-file in py-file

uic.loadUi("27LoadUIFile.ui",self)            # load the ui-file
self.l = self.findChild(qtw.QLabel, "label")  # assign label from ui
self.btn = self.findChild(qtw.QPushButton, "pushButton")    # assign push button from ui


Simplest App

import PyQt5.QtWidgets as qtw
class MainWindow(qtw.QWidget):
  def __init__(self):
    super().__init__()
    self.setWindowTitle("Name XY")
if __name__ == '__main__':
  app = qtw.QApplication(sys.argv)
  mw = MainWindow()
  mw.show()
  sys.exit(app.exec_())

MODULE - TKINTER - making GUIs


Tutorial with deep explanation

https://www.youtube.com/watch?v=YXPyB4XeYLA

Import tkinter module

from tkinter import *

Create root window

root = Tk()

Name of the window in the title

root.title("Header Title")

Define icon for the windows (left upper corner)

root.iconbitmap("pic.ico")

Define size of the window

root.geometry("400x400")

Creating a label widget

myLabel = Label(root,text="Hello World!")

Showing myLabel on the screen

myLabel.pack()

Showing myLabel on grid place 0/0 (left upper corner) with 10pixel padding from above

myLabel.grid(row=0,column=0,pady=10)

Creating a status label widget with border (bd=1), sunken relief and anchor position right (east)

status = Label(root,text="St",bd=1,relief=SUNKEN,anchor=E

Showing status bar - with using maximum space from left to right (west to east) with sticky

status.grid(row=2,column=0,columnspan=3,sticky=W+E)

Creating a button widget with a name

myButton = Button(root,text="Click Me!")

Creating a deactivated, greyed button widget with a name

myButton = Button(root,text="Click Me!",state=DISABLED)

Creating a button widget with a name and size 50 * 50

myButton = Button(root,text="C",padx=50,pady=50)

Creating a button which will excute the myfunc-function when its clicked

myButton = Button(root,text="T",command=myfunc)

Creating a button which will excute the func-function with a numeric parameter

myButton = Button(root,text="T",command=lambda: func(nr))

Creating a button with background blue and foreground white

myButton = Button(root,text="col",bg="blue",fg="white")

Create a quit-button on the window

QuitButton = Button(root,text="exit",command=root.quit)

Creating a entry field - with width 50 and borderwidth 5 - and showing the field

myEntry = Entry(root, width=50, borderwidth=5).pack()

Showing myEntry as grid in row/col = 0/0 and spanned over 3 columns

myEntry.grid(row=0,column=0,columnspan=3)

Showing myEntry as grid in row/col = 0/0 and with pading in x/y-axes

e.grid(row=0,column=0,padx=10,pady=10)

Read content of the entry field

myEntry.get()

Shows a default value in the entry field

myEntry.insert(0,"Default-Text")

Insert a number into the entry field

myEntry.insert(0, number)

Delete content of the entry field

myEntry.delete(0,END)

Mainloop of the program

root.mainloop()

Close window "wind"

...command=wind.destroy


Pics in Tkinter-Windows

from PIL import ImageTk,Image								# Additional module Pillow neede (install with <pip install Pillow>
my_img = ImageTk.PhotoImage(Image.open("demo.png"))			# Reads Image wiht Pillow-module (Tkinter can not read png/jpg files)
my_label = Label(image=my_img)								# Creating a Label with the image
my_label.pack()												# Showing the pic


Define Frame

frame = LabelFrame(root,text="Name_Frame",padx=5,padx=5)	# Creating a frame with padding 5/5 (inside the frame)
frame.pack(padx=10,pady=10)									# Showing the frame with padding 10/10 (outside the frame)
button = Button(frame,text="Name_Button")					# Creating button in the frame (not root window)
button.grid(row=0, column=0)								# Showing the button in the frame as grid (with frames is mixing between pack and grid possible)


RadioButtons

r = IntVar()												# Define variable for radio button - when string use StringVar()
r.set("2")													# Choose default radio button
Radiobutton(root,text="Opt1",variable=r,value=1).pack()		# Define and show radiobutton 1
Radiobutton(root,text="Opt2",variable=r,value=2).pack()		# Define and show radiobutton 2
r.get()														# Get value from choosen radio button


MessageBox

messagebox.showinfo("Title of PopUp", "Hello World!")		# Show Messagebox
different boxes available: showinfo, showwarning, showerror, askquestion, askokcancel, askyesno
response = messagebox.askquestion ("Bla", "BlaBla!")		# Show AskQuestion Box and store response


Check about response of user in message box

if response == 1: Label(root, text= "YES").pack()
elif response == 0: Label(root, text= "NO").pack()


More Windows

def open()													# Function for opening the 2nd window
	global my_img											# Global definition necessary - otherwise no pic would be drawn
	sec = TopLevel()										# Define 2nd window
	sec.title("2nd Window")									# Title for 2nd window
    img = ImageTk.PhotoImage(Image.open("demo.png"))		# Define image
    Label (top, image=img).pack()							# Show image in 2nd window
btn=Button(root,text="Open 2nd Window",command=open).pack()	# Button for opening 2nd window per click


FileDialog

root.filename = filedialog.askopenfilename(					# Define filname dialog
    initialdir="/Users/Bla/images",							# Define initial dir for the file dialog
    title="Select file",									# Title of the file dialog
    filetypes=(("png files","*.png"),("all files","*.*")))	# Define filetypes in dialog (first name, second which files, default first entry)
my_label = Label(root, text=root.filename).pack()			# After choosing a file - show the filename + filepath


Slider

vertical = Scale(root,from_=0,to=200)						# Define vertical slider
vertical.pack()												# Showing vertical slider
horizontal = Scale(root,from_=0,to=200,orient=HORIZONTAL)	# Define horizontal slider
horizontal.pack()											# Showing horizontal slider


Checkboxes

Checkbutton(root,text="Check1",variable=var).pack(			# Define and Show a checkbox
c.select() or c.deselect()									# Select or Deselect per default the checkbox


DropDown

options=["Mo","Tu","We","Th","Fr"]							# Define options for dropdown menue
var = StringVar()											# Define variable for dropdown menue
var.set("Mo")												# Set default for dropdown menue
OptionMenu(root,var,*options).pack							# Define dropdown menue

MOUULE - KIVY - making GUIs


https://kivy.org/#home

Kivy documentation

https://www.youtube.com/watch?v=dLgquj0c5_U&list=PLCC34OHNcOtpz7PJQ7Tv7hqFBP_xDDjqg

Training from

python -m pip install kivy[base] kivy_examples

Install kivy

Use template.spec (create with: pyinstaller *.spec - creation not as onefile possible with pyinstaller)

Create EXE


Simple Window with Label

import kivy                         # Import kivy base
from kivy.app import App            # Import App
from kivy.uix.label import Label    # Import Label class

class MyApp(App):                   # Define App-Class
    def build(self):
        return Label(text="Hello", font_size = 72)  # Print label with font_size 72)

if __name__ == "__main__":          # Start App / draw windows with label
    MyApp().run()


Input fields and button with Grid-Layout

from kivy.uix.gridlayout import GridLayout      # Import Grid-Layout class
from kivy.uix.textinput import TextInput        # Import TextInput class
from kivy.uix.button import Button              # Import Button class

class MyGridLayout(GridLayout):                 # Define Grid Layout
    def __init__(self, **kwargs):                       # Initialize infinite keywords
        super(MyGridLayout,self).__init__(**kwargs)     # Call grid layout constructor
        self.cols = 2                                   # Set columns to 2 (draws all elements in 2 columns)

        self.add_widget(Label(text="Name: "))           # Add Label-Widget with text="Name" (as one-liner)
        self.name = TextInput(multiline=False)          # Add Input Box "name" with only one line (multiline = True would me more input lines)
        self.add_widget(self.name)                      # Assigment as two-liner

        self.submit = Button(text="Submit", font_size=32)   # Add Button-Widget with text="Submit" and font-size 32
        self.submit.bind(on_press=self.press)               # Bind the button to a function (when pressed the function press will be called)
        self.add_widget(self.submit)

    def press(self,instance):
        print(self.name.text)   # Print the text of the name-input field in the terminal
        self.add_widget (Label (text=f"New Label with text {self.name.text}")       # Create new Label and write the text to it


Combine 2 Grids with 2 columns and 1 column

    super(MyGridLayout,self).__init__(**kwargs)
    self.cols = 1                                       # define standard grid with 1 column

    self.secondGrid = GridLayout()
    self.secondGrid.cols = 2                            # define second grid with 2 columns

    self.secondGrid.add_widget(Label(text="Stock: "))   # define content for the secondGrid
    self.stock = TextInput(multiline=False)
    self.secondGrid.add_widget(self.stock)
    self.add_widget (self.topGrid)

    self.submit = Button(text="Levermann Score", font_size =32)     # define button with only 1 column
    self.submit.bind(on_press=self.levermannCalc)
    self.add_widget(self.submit)


Set height and width of the widgets

    self.row_force_default=True     # necessary - otherwise no adjustment will be done for the widget
    self.row_default_height=120     # sets the height to 120 pixel

    # set height for the whole widgets in topGrid
    self.topGrid = GridLayout(
        row_force_default=True,
        row_default_height=40
    )

    # set width and heigth for individual widget
    self.top_grid.add_widget(Label(text="Name: ",       # set the name of the widget
                                   size_hint_y=None,
                                   height=10,           # define height
                                   size_hint_x=None,
                                   width=200))          # define width


Kivy Design Language
Seperate program in python from the gui-design (in a kv-file)

# python-file:
from kivy.uix.widget import Widget              # import widget module
from kivy.properties import ObjectProperty      # import object property
class MyGridLayout(Widget):
    inpfield = ObjectProperty(None)             # define an object and initialize empty
    def method_function(self):                  # defube a method / function in the the gui (eg. for a button)
        print(f"something")
class MyApplication(App):                       # define the application class (kv-file must have the same name = MyApplication.kv)
    def build(self):
        return MyGridLayout()                   # name of the first line in the kv-file
if __name__ == "__main__":
    MyApplication().run()                       # run the gui

# kv-file (name must be in this case "MyApplication.kv")#
# intention is here very similar (only tabs - no spaces)
<MyGridLayout>
    inpfield : inpfield                         # initialize the input field

    GridLayout:                                 # define the main grid layout with one column
        cols: 1
        size: root.width, root.height           # take the root width/height (otherwise the objects will be placed in the very left bottom)
        row_force_default: True                 # necessary - otherwise no adjustment will be done for the widget
        row_default_height: 120                 # set the height (of the whole grid) to 120 pixel)

        GridLayout:                             # define second (nested) grid with two columns
            cols: 2
            row_force_default: True
            row_default_height: 40              # set the height (of every object in this grid to 40 pixel)

            Label:
                text: "Stock Ticker"            # name of the label
            TextInput:
                id: inpfield                    # if of the inputfield
                text: blablabla                 # default value for the inputfield
                multiline: False                # no multiline input field

        Button:                                 # define button as part of the first gridlayout (so only 1 column)
        text: "Button Text"                     # button text
        font_size: 32                           # font-size of the button text
        on_press: root.method_function()        # on button press call function "method_function" from the py-file
        size_hint_y: None
        height: 50                              # set height of the button to 50 pixel


Kivy Builder

# link kv-files to python-file
from kivy.lang import Builder       # import builder module
Builder.load_file("whatever.kv")    # link the kv-file to the python-file (so the App-classname must not be the same as the kv-filename)


Coloring buttons and input fields

# using rgb-color for the coloring
background_normal: ''                           # necessary to use this - otherwise the color will be mixed with the intial grey default
background_color: (1,0,0,1)                     # coloring the element in red
background_color: (238/255,250/255,0/255,1)     # using a rgb-color for the coloring

# using hexadecimal coloring
#:import utils kivy.utils                               # importing the kivy-utils in the very first line of the kv-file
background_color: utils.get_color_from_hex("#eefa00")   # coloring the element with this hex-code


Box Layout

BoxLayout:
    orientation: "horizontal"                   # arrange the object horizontal (also vertical possible)
    size: root.width, root.height               # use the full place of the window
    spacing: 20                                 # 20 pixels between the elements
    padding: 50                                 # 50 pixels padding around the elements

    Button:
    text: "blablabla"                           # define button in the box-model
    size_hint: (0.5,0.5)                         # make the button the half-size (x- and y-axes) - working also responsive when the windows size is changed

    size_hint: (None, None)                     # set fixed size for the button (200 pixels width and 50 pixels height)
    width: 200
    height: 50
    pos_hint: {"center_x": 0.5}                 # center the element horizontal


Inheritance, Widget Properties

# Define the global settings in sepearte section in the kv-file
<Button>                            # these settings are used for all buttons in the window
    font_size: 32
    background_normal: ''
    background_color: (0,0,1,1)
<TextInput>                         # this color is used for all input fields in the window
    background_color: (150/255,150/255,150/255,1)

# But when some individual setting for an element is needed this is possible
<MyLayout>
    BoxLayout:
        TextInput:
            background_color: (1,0,0,1)     # background color for this input-field will be different then the global settings aboth


Color Labels

# Works a little bit different that the normal coloring of objects like buttons or input fields
Label:
    background_color: (0,0,1,1)             # choose the color for the label
    canvas.before:                          # this whole code-part has to be added - otherwise there will be no label coloring
        Color:
            rgba: self.background_color
        Rectangle:
            size: self.size
            pos: self.pos

# Normal usage of other properties
    text: "Name"                # name of the label
    font_size: 45               # font size of the label
    color: (0,1,0,1)            # color of the text of the label
    bold: True                  # text will be bold written
    italic: True                # text will be italic written
    outline_color: (0,0,0)      # set the color for the border of the text
    outline_width: 2            # width of the border of the text


Color Background

# 1st method in kv-file: make a white background color
<MyLayout>
    canvas.before:
        Color:
            rgba: (1,1,1,1)
        Rectangle:
            pos: self.pos
            size: self.size

# 2nd method in py-file
from kivy.core.window import Window

class AwesomeApp(App):
    def build(self):
        Window.clearcolor = (1,1,1,1)
        return MyLayout()


Set window size

# set windows size to width 550 and height 300
from kivy.core.window import Window
class MyLayout(Widget):
    Window.size = (550, 300)


DropDown box with spinner

kv-file:
    Spinner:
        id: spinner_id
        text: "Favorite"
        values: ["Pepperoni","Cheese","Mushroom","Pineapple"]
        on_text: root.spinner_clicked(spinner_id.text)

py-file:
    def spinner_clicked(self,value):
        self.ids.click_label.text = f"You selected: {value}"

MODULE - PYGAME - making games


Pygame documentation

http://pygame.org/docs/

Import of the pygame module

import pygame as pg

Initialize and setup screen

pg.init()

Define dimension

screen_dim = (1200,800)

Show screen with the defined dimensions

screen = pg.display.set_mode(screen_dim)

Define background color

bg_color = (230,230,230)

Screen filling with defined background color

screen.fill(bg_color)

Title of the game in the window

pg.display.set_caption("My Super Game")

Rectangle infos (left,top,width,height - (0,0,1200,800)

rect screen.get_rect()

Center of the Rectangle as tuple - (600,400)

rect.center

Size of the Rectangle as tuple - (1200,800)

rect-size

X-Dim left - (0)

rect.left

X-Dim right - (1200)

rect.right

Y-Dim top - (0)

rect.top

Y-Dim bottom - (800)

rect.bottom

Middle of X-Dim - (600)

rect.centerx

Middle of Y-Dim - (400)

rect.centery

Width of the rectangle - (1200)

rect.width

Height of the rectangle - (800)

rect.height

Define new rectangle

small_rect = pg.Rect(100,100,10,150)

Draw small rectangle in screen

pg.draw.rect(screen,(100,100,100,small_rect)

Loading an image

figure = pg.image.load("ship.png")

Getting the rect object from an image

figure.get_rect()

Positioning an image in the middle of the bottom screen

figure_rect.midbottom = screen_rect.midbottom

Drawing an image to the screen

screen.blit(figure, figure_rect)

Check if an event has happend (keystroke, mouseclick)

for event in pg.event.get():

Check if key is pressed

event.type == pg.KEYDOWN:

Some action when Right-Key is pressed

event.key == pg.K_RIGHT:

Some action when Space-Key is pressed

event.key == pg.K_SPACE:

Check if mousebutton is clicked

event.type == pg.MOUSEBUTTONDOWN

Findinig the mouse position

mouse_pos = pg.mouse.get_pos()

Check if mouse-cursor is ofer an object / rectangle

button_rect.collidepoint(mouse_pos)

Pygame groups
Group class for working with similar objects

from pygame.sprite import Sprite,Group

Making and filling a group - must inherit from Sprite

def Bullet(Sprite):

Method from the new group

def draw_bullet(self):

Method from the new group

def update(self):

Define group of elements

bullet = Group()

Define new instance

new_bullet = Bullet()

Add new instance to group

bullets.add(new_bullet)

Iterate through group instances and draw

from bullet in bullets.sprites(): bullet.draw_bullet()

Calls the methode update() on each member of the group

bullets.update()

Remove instance from group

bullets.remove(bullet)

Check if single object is overlapping with groupelements

prg.sprite.spritecollideany(obj,group)

Result-Dict with all overlapping elements of both groups

pg.sprite.groupcollide(group1,group2,True,True)

Define message

msg = "Play again!"

Defines message color

msg_col = (100,100,100)

Defines background color

bg_col = (230,230,230)

Defines font from system font

font = pg.font.SysFont(None,48)

Create an image of the message

msg_img = f.render(msg,True,msg_col,bg_col)

Read rect from message-image

msg_img_rect = msg_image.get_rect()

Pos the message-image in the middle of the window

msg_img_rect.center = screen_rect.center

Display the positioned message-image

screen.blit(msg_img, msg_img_rect)

MODULE - PYTHONCOM - make new formula in excel with python-function

https://www.youtube.com/watch?v=cYwn8Pu5eRg&feature=emb_logo
Import necessary modules

import pythoncom

Import client

import win32com.client

Import server

import win32com.server.register

Put everything in a class

class PythonObjectLibrary:

UniqueID for our object to register it with Windows

_reg_clsid_ = pythoncom.CreateGuid()

Register the object as an EXE

_reg_clsctx = pythoncom.CLSCTX_LOCAL_SERVER

Register the object as a DLL

_reg_clsctx = pythoncom.IMPROC_SERVER

Name of the project library

_reg_progid_ = "Python.ObjectLibrary"

Optional / Description of the library

_reg_desc_ = "This is our Python Obj Library"

Name the functions - this sees the user in excel

_public_methods_ = ["f1","f2,"f3"]

Define the function in the class

def f1(self, x, y):

Return the result of the function (to excel later)

return x + y

Another function - this time wie a range (more cells in excel)

def f2(self, myRange):

Create an instance of the range object passed trough

rng1 = win32com.client.Dispatch(myRange)

Change the object to an numpy-array (or an list)

rng1val = np.array(list(rng1.Value))

Return the sum of the range (to excel later)

return rng1val.sum()

Part of the code in the main-section

if __name__ == "__main__":

Register the library (when the program is started => the Library is registered in Windows)

win32com.server.register.UseCommandLine(PythonObjectLibrary)


Part in Excel

(in the Visual Basic Editor under Development Tools)
# Definition of the f1-function - now the user can use the function "f1" in excel
Function f1(x As Long, y As Long)
    f1 = VBA.CreateObject("Python.ObjectLibrary").f1(x, y)
End Function
# Definition of the f2-function - now the user can use the function "f2" in excel
Function f2(x As Range)
    f2 = VBA.CreateObject("Python.ObjectLibrary").f2(x)
End Function

MODULE - FLASK, ZAPPA, AWS - making an API


create basic api in flask

https://nordicapis.com/how-to-create-an-api-from-a-dataset-using-python-and-flask/

create basic api in flask, deploy to AWS with zappa and setup on Rapidapi

https://pythonforundergradengineers.com/deploy-serverless-web-app-aws-lambda-zappa.html#what-is-zappa

create basic api in flask, deploy to AWS with zappa and

https://towardsdatascience.com/develop-and-sell-a-python-api-from-start-to-end-tutorial-9a038e433966


Import the module

import flask

Creates a Flask application

app = flask.Flask(__name__)

Runs the program in debug mode (no reset necessary when changing a line of code)

app.config["DEBUG"] = True

"/" indicates where the result in the url will output - GET will read data (POST for writing data)

@app.route('/', methods=['GET'])

Runs the application

app.run()


Outputs a List with a dict to the server / url

@app.route('/api/v1/resources/books/all', methods=['GET'])	# Data will available at: http://127.0.0.1:5000/api/v1/resources/books/all
def api_all():
    return jsonify(books)	    # Converts and returns the list in JSON-Flask-format
app.run()
# results are available under http://127.0.0.1:5000/api/v1/resources/books/all


Filtering data

@app.route('/api/v1/resources/books', methods=['GET'])
def api_id():
    if 'id' in request.args:					# Check if an ID was provided as part of the URL
        id = int(request.args['id'])			# If ID is provided, assign it to a variable
    else: return "Error: No id field provided"	# If no ID is provided, display an error in the browser.
    for book in books:							# Loop through the data and match results that fit the requested ID
        if book['id'] == id: results.append(book)
    return jsonify(results)						# Use the jsonify function from Flask to convert our list of - Python dictionaries to the JSON format
app.run()
# results are available under http://127.0.0.1:5000/api/v1/resources/books?id=1


Check if API is called from the right access-place (like rapidapi.com)

(key is provided from the platform - eg. here from rapidapi.com)
    if "e6e45ed0-55e5-11eb-963e-ffe9fcdce080" not in request.headers:
        return "Error: Wrong API-Call - use www.rapidapi.com for calling the API!"


Using zappa and deploying to AWS (zappa is only working in a virtual environment)

pip install zappa											=> Install module
=> Create file and insert the credentials from AWS (creation see second link above)
create file credentials in "%UserProfile%\.aws
	[default]
	aws_access_key_id = XXXXXXXXXXXXXXXXXXXXXXXXXX
	aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
virtualenv xyz									# Create an virtual environment
xyz\Scripts\activate							# Activate virtual environment
go back to folder xyz and install all necessary modules with pip install
pip list										# Check if everything is installed in the folder
zappa init (when virutal environement is activated!)		# Create zappa_settings.json File (press return at any prompt)
    "dev": {
		"app_function": "app.app",				# Name of the py-file
		"aws_region": "eu-central-1",			# Nearest aws-region
        "profile_name": "default",				# Same as in aws-credentials at "%UserProfile%/.aws/credentials
        "project_name": "apicheck",
        "runtime": "python3.7",					# Automatic generated
        "s3_bucket": "zappa-gsoln7zyl"			# Automatic generated
    }
pip freeze > requirements.txt					# Create requirements.txt for all installed modules in the virtual environment
zappa deploy dev (when virutal environement is activated!)	# Deploy everything to AWS (when ended with link everything worked fine) (otherwise 502 error - often a module is missing)
	eg. https://xnujiyxsb1.execute-api.eu-central-1.amazonaws.com/dev/api/v1/incstat?ticker=FB
		https://xnujiyxsb1.execute-api.eu-central-1.amazonaws.com/dev/api/v1/summary?ticker=CAT
		https://xnujiyxsb1.execute-api.eu-central-1.amazonaws.com/dev/api/v1/profile?ticker=FB
zappa update dev (when virutal environement is activated! and actual directory is correct where the py-file is!)	# Update deployment on AWS (when something changed in the pyhton-program)
zappa status									# Show actual status
zappa tail										# Show actual loggin (in error cases)
zappa undeploy dev								# Undeploy everything on AWS


AWS part

see the APIs under AWS Api Geteway => region on the upper right site has to be the same as in the zappa_settings.json file#
check API-access
	select API (under API Gateway and APIs)
	click on second ANY link
	click Test (on the left side)
	select Method Get
	input path example under {proxy} eg. api/v1/incstat?ticker=FB
	click Test Button (at the bottom)

MODULE - FLASK, heroku, GUNICORN - making an API

https://stackabuse.com/deploying-a-flask-application-to-heroku/

ongoing deployment / update

git init (when .git is not in the folder of the api)
git add .
git commit -m "upd"
heroku login -i
heroku git:remote -a financerapidapi (name of the app on heroku)
git push heroku main

MODULE - FASTPI, UVICORN - making an app

https://www.youtube.com/watch?v=OFnOEIyqSRg&list=PLvzuUVysUFOuoRna8KhschkVVUo2E2g6G see Python-Diverse - TradingApp
import module

from fastapi import FastAPI

start reloading app (refreshed automatically)

uvicorn main:app --reload

for rendering the html-page

jinja2 templates

https://fastapi.tiangolo.com/advanced/templates/
framework for formatting (like bootstrap)

semantic-ui.com

https://semantic-ui.com/introduction/advanced-usage
for html-templates which can be used in several other html-files

layouts.html

https://www.tradingview.com/widget/advanced-chart/
for interactive candlebars on the webpage

javascript-code


simplest route

app = FastAPI()
@app.get("/")
def index():
  return {"message": "hello world!"}

MODULE - ICECREAM - print for debugging


Imort icecream

from icecream import ic

Show value of the variable in the form: ic| var1: 20

ic(var1)

Show when command is reached: ic| icecream_example.py:5 in func() at 01:44:20.394 (line / file / time)

ic()

Set Context Output => also shows line / file / time

ic.configureOutput(includeContext=True)

MODULE - TA-LIB - technical analysis of financial market

Documentation: https://mrjbq7.github.io/ta-lib/


Installation

Download whl file from here: https://www.lfd.uci.edu/~gohlke/pythonlibs/#ta-lib
(with corresponding python version (eg. cp39) and os (eg. adm64)
then: pip install *.whl (eg. pip install TA_Lib-0.4.19-cp39-cp39-win_amd64.whl)
then: pip install TA-Lib

MODULE - YFINANCE - date from Yahoo Finance

Docu and examples: https://aroussi.com/post/python-yahoo-finance
Installation (yfinance as yf not working - problems with pyinstaller)

import yfinance

Read data for ticker yahoo finance ticker symbol

dataYF = yf.Ticker(ticker)

Summary info for stock / cryptopcoin

dataYF.info

Price data as dataframe from/to-date

yf.download(ticker,"2020-01-01","2021-03-31))

Intraday price data as dataframe for 1 month and 60 minute interval

dataYF.history(period="1mo",interval="60m")

Intraday price data as dataframe for 1 day and 2 minute interval

dataYF.history(period="1d",interval="2m")

Financial data

dataYF.financials

Dividends (date and amount)

dataYF.dividends

Splits (date and split-factor)

dataYF.splits

Percent major holders

dataYF.major_holders

Biggest Insitutional holders

dataYF.institutional_holders

Balance Sheet Data

dataYF.balance_sheet

Cashflow Data

dataYF.cashflow

Earnings Data (Revenue, Income)

dataYF.earnings

Sustainability Informations (palm oile, military, adult, etc.)

dataYF.sustainability

Recommendations from different agencies

dataYF.recommendations

Calendar data

dataYF.calendar

ISIN-Number

dataYF.isin

MODULE - PLOTLY - data visualization


installation

pip install plotly

examples of graphing libraries

https://plotly.com/python/

see example for candlestick-chart

Python-Diverses / Plotly / example.py

MODULE - DJANGO - framework for webdevelopment


Video Training

https://www.youtube.com/watch?v=F5mRW0jo-U4&t=5043s

Example

C:\Users\Polzi\Documents\DEV\Python-Diverses\Django

installation

pip install django

create a blank django project

django-admin startproject pjName

run django server (start site on http://127.0.0.1:8000/)

python manage.py runserver

settings.py in django-folder

  BASE_DIR => directory of the django-project
  SECRET_KEY => individual key from django-project
  DEBUG => should be set to False in production
  INSTALLED_APPS => componente / pieces of the django-project
  DATABASES => database of the django-project

python manage.py startapp products

  # create a new app / component
  models.py => create models and link to the database
  add "products" to INSTALLED_APPS in settings.py


create superuser (eg. for login to /admin app/component)

python manage.py createsuperuser

prepeare migrations (when anything changed in the apps)

python manage.py makemigrations

migrate all changes

python manage.py migrate

now products can created in the /admin-page

add this: from .models import Product in admin.py of the products app / component

open django-sheel (eg. create products from the shell )

python manage.py shell

overview builtins

https://docs.djangoproject.com/en/4.0/ref/templates/builtins/

include html in an html-page (navbar.html)

{% include "navbar.html" %}

extend html-page (base.html)

{% extends "base.html"  %}

replace this block content

{% block content %} xyz {% endblock %}

add 2 to the value

{{ x|add:"2" }}

if statement

{{ if x in lst }}

elif statement

{{ elif x == 3 }}

else statement

{{ else }}

close if statement

{{ end if }}

capitalize first letter

{{ x|capfirst }}

uppercase whole string

{{ x|upper }}

capitalize and upper the whole string

{{ x|capfirst|upper }}


add a new module / component

- python manage.py startapp products    # create "products" folder
- create class in models.py of the products-folder
- import and register at admin-site in admin.py of the products-folder
- add products to INSTALLED_APPs in settings.py of the project-folder


add a new website

# create "pages" folder
  - python manage.py startapp pages
# eg. function with HttpResponse to output html-code
# for that function import is necessary with "from django.http import HttpResponse"
  - add view in views.py of the pages-folder
- add pages to INSTALLED_APPS in settings.py of the project-folder
# import view with "from pages.views import homeView"
# set path with eg. "path('', homeView, name="home")"
  - add pages/view in urls.py fo the project-folder


request and show page/view

# in urls.py the requests are stored
  eg. path('contact/', contactView)
# and in the views.py is the html-code stored
  eg. def contactView(*args, **kwargs):
        return HttpResponse("<h1>Contacts!</h1>")


using templates

# render the page in views.py
  eg. def contactView(request, *args, **kwargs):
        return render(request, "contact.html", {})
# create a folder eg. "templates" and put all html-pages in it
  eg. about.html, contact.html, home.html
# change DIRS in TEMPLATES in settings.py
  eg. to 'DIRS': [os.path.join(BASE_DIR, "templates")],


extends: template Inheritance

# create base-html
  eg. <h1>This is a navbar</h1>
      {% block content %}
        replace me
      {% endblock %}
# inside the about-html
  eg. {% extends "base.html"  %}
      {% block content %}
        <h1>About</h1>
        <p>This is a template</p>
      {% endblock %}
# "replace me" will get replace by the info from about-html from the block content


include: something in html

{% include "navbar.html" %}


render data in html

# render html-page with dictionary
  define myContext as Dictionary
  myContext = {
    "myText": "This is about me",
    "myNumber": 123,
    "myList": [123, 4242, 12313]
  }
  eg. return render(request, "about.html", myContext)
# access this dict in the html-page
  eg. {{ myText }}, {{ myNumber }}


for loop when rendering

# get myList from Dictionary in views.py
  {% for mySubItem in myList %}
    <li>{{ forloop.counter }} - {{ mySubItem }}</li>
  {% endfor %}

DOLT - Databases like Repos

https://www.dolthub.com/blog/2020-02-03-dolt-and-dolthub-getting-started/ Install on windows with msi-file from here: https://github.com/dolthub/dolt/releases
Config Mail

dolt config --global --add user.email [email protected]

Config Name

dolt config --global --add user.name "YOUR NAME"

Login and the key

dolt login

Clone DB-Repos

dolt clone dolthub/baseball-databank

Show tables in db

dolt ls

Read column description for table

dolt sql -q "describe Salaries"

Run some query on the table

dolt sql -q "SELECT * FROM Salaries"

Write query to table

dolt sql -r csv -q "SELECT * FROM Salaries" > salary-hr.csv

PYTHONANYWHERE


install python modules in specific version / environment

pip3.7 install --user --upgrade selenium

HEROKU - run python scripts in the cloud

Instruction: https://dev.to/josylad/how-to-deploy-a-python-script-or-bot-to-heroku-in-5-minutes-9dp
New => Create new app => give name with only lowercases

on Heroku: create App

Setting => Add buildpack => choose python

on Heroku: set python build path

Create Procfile with 2 lines => web: python appname.py, worker: python appname.py

in App-folder: create Procfile

Create or Copy the requirements.txt => eg. pip freeze> requirements.txt

in App-folder: create requirements.txt

when there is not git-structure yet

in App-folder: git init

like xlwings, openpyxl, pywin32

delete not possible modules	in cloud

login to heroku (login page opens in the browser)

heroku login

connect to the heroku app

heroku git:remote -a HEROKU_APP_NAME

add files (like in git)

git add .

commit files (like in git

git commit -am "msg"

push the files to heroku for build (try master instead of main when not working)

git push heroku main

in Resources deselect both (otherwise the program will run more times per day)

on Heroku: deselect web + worker

Heroku Scheduler for scheduling the app, Papertrail for logs

on Heroku: add Add-ons

Schedule the app run times

on Heroku / Scheduler

Show which dynos are running currently

heroku ps -a appName

run the app / script which is hosted on heroku (only possible when no dyno is running)

heroku run -a <appName> python <prg.py>

see all running dynos

heroku ps -a <appName>

stop the running dyno with name "scheduler.6436"

heroku ps:stop scheduler.6436 -a <appN>


using TA-Lib on heroku

remove TA-Lib from requirtements.text
Described here: https://stackoverflow.com/questions/43453953/how-to-install-python-library-in-heroku
GitDoku: https://github.com/numrut/heroku-buildpack-python-talib
use this commands instead:
heroku buildpacks:add --index 1 heroku/python
heroku buildpacks:add --index 2 numrut/ta-lib
schedule the running time with addon Heroku Scheduler