- Likelihood ≈ probability (informally; strictly, the likelihood is the probability of the observed data given the model parameters)
- supervised learning = we already have labeled data (input/output pairs) to train the model with
- train_test_split method (scikit-learn)
- samples, sampling
- LLM = Large Language Model, e.g. GPT-3.5 by OpenAI (there are many other LLMs, e.g. PaLM 2 by Google/Bard, including open-source ones: LLaMA, BLOOM, BERT, ...)
- LangChain framework (for Python) = sits between the client and the LLM/vector DB, like a proxy in front of the LLM. There are others too, e.g. LlamaIndex.
- RAG = Retrieval Augmented Generation, an approach that "merges" the LLM's general-purpose knowledge with our own specific knowledge base (our texts, .csv, .pdf, etc.)
- Embeddings = the result of converting text into numbers the machine can work with. They are vectors.
- FAISS = OpenAI does not provide space to STORE all the generated embeddings/vectors --> so we can use FAISS (by Facebook/Meta) to store them. Think of FAISS as a vector DB.
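For intuition on how "similarity between words" works with embeddings, a minimal sketch using cosine similarity in numpy (the vectors here are made up; real embeddings have hundreds of dimensions):

import numpy as np

# toy embedding vectors (made-up numbers)
cat = np.array([0.8, 0.1, 0.3])
dog = np.array([0.7, 0.2, 0.35])
car = np.array([-0.5, 0.9, 0.0])

def cosine_similarity(a, b):
    # close to 1.0 = similar meaning, close to 0 = unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(cat, dog))  # high: related words
print(cosine_similarity(cat, car))  # lower: unrelated words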
- data collection (pd.read_csv('...csv'))
- data exploration (df.shape, df.info(), df.describe(), .plot() charts, ...)
- data preparation:
  - check/resolve missing data (dropna(), fillna(), ...)
  - normalize data
  - split train/test data and input/output (x/y), i.e. split the data 'horizontally'/'vertically'
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.20, random_state = 1234)
Feature = Input = Independent Variables = x
Target = Output = Dependent Variable (only one) = y ==> what the model has to PREDICT
These 3 steps take ~80% of the time (sketch below)
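A minimal sketch of these 3 steps with pandas/scikit-learn (the file name and column names are made up for illustration):

import pandas as pd
from sklearn.model_selection import train_test_split

# data collection (hypothetical file)
df = pd.read_csv('houses.csv')

# data exploration
print(df.shape)
print(df.info())
print(df.describe())

# data preparation: resolve missing data
df = df.fillna(df.mean(numeric_only=True))

# split input/output ('vertically'), hypothetical columns
x = df[['sqm', 'rooms']]
y = df['price']

# split train/test ('horizontally')
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=1234)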
- modeling (train model via train data)
LinearRegression().fit(x_train, y_train)
- evaluation (accuracy of the model, comparing predictions on the test data with the real values)
model.score(x_test, y_test)
y_pred = model.predict(x_test)
These 2 steps are iterated to find the model with the best accuracy (sketch below)
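Continuing the sketch above (the model choice is just an example):

from sklearn.linear_model import LinearRegression

# modeling: train the model on the training data
model = LinearRegression().fit(x_train, y_train)

# evaluation: for regression, .score() returns R^2 on unseen test data
print(model.score(x_test, y_test))

# predictions on the test set, to compare against y_test
y_pred = model.predict(x_test)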
- actionable insight
- Build initial knowledge base (pdf, csv, db, etc...)
- Convert everything into embeddings (from texts to machine-comprehensible "numbered vectors", e.g. [ -0.02026, -0.00698, -0.02565, -0.02634, ... ]). These vectors are used to calculate similarity between words/texts.
- Store them in a vector database (FAISS, in our example), also called a vectorstore (see the sketch below).
- todo..
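A minimal RAG ingestion sketch with LangChain + FAISS (assumes the classic langchain import paths, which have moved in newer versions, the openai and faiss-cpu packages installed, and an OPENAI_API_KEY in the environment; the texts are made up):

# pip install langchain openai faiss-cpu
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# 1. initial knowledge base (toy example: a few text snippets)
texts = [
    "Our return policy allows refunds within 30 days.",
    "Support is available Mon-Fri, 9:00-18:00.",
]

# 2.-3. convert the texts to embeddings and store them in the FAISS vectorstore
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())

# retrieval: find the stored chunks most similar to the question
docs = vectorstore.similarity_search("When can I get a refund?", k=1)
print(docs[0].page_content)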
Other notes:
- to launch jupyter
$ jupyter notebook
GUNICORN = a WSGI server that replaces Flask's built-in development server for the PROD env (the Flask app itself stays the same)
$ pip install gunicorn==20.1.0
on the server:
$ gunicorn -w 2 -b 0.0.0.0:5000 app:app
[INFO] Starting gunicorn 20.1.0
[INFO] Listening at: http://0.0.0.0:5000 (1)
[INFO] Using worker: sync
[INFO] Booting worker with pid: 3
[INFO] Booting worker with pid: 4
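Note: in 'app:app', the first 'app' is the Python module (app.py) and the second is the Flask instance inside it. A minimal hypothetical app.py:

# app.py -- minimal Flask app, served by gunicorn as 'app:app'
from flask import Flask

app = Flask(__name__)  # the second 'app' in 'app:app'

@app.route('/')
def index():
    return 'Hello from Flask behind Gunicorn!'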
At the Nginx level: configure it as a reverse proxy, e.g. in example.conf:
# ROUTING TO DOCKER CONTAINER
server {
    listen 80;
    listen 443 ssl;
    server_name python.example.it;

    location / {
        include proxy_params;
        proxy_pass http://localhost:5000;
    }
}
Test conf && restart nginx process:
$ sudo nginx -t
$ sudo systemctl restart nginx
Build and Run docker image:
$ docker build -t test-python-app:1.0 .
$ docker images # (--> to see generated image)
$ docker run -d -p 5000:5000 --rm --name "test-python-app" test-python-app:1.0
(-d runs it in detached mode; in dev, add a -v/--volume bind mount of the app code so changes don't require a rebuild)
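The docker build step above assumes a Dockerfile in the current directory; a minimal sketch (base image, workdir and CMD are assumptions, not the project's actual file):

# Dockerfile -- minimal sketch
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 5000
CMD ["gunicorn", "-w", "2", "-b", "0.0.0.0:5000", "app:app"]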
$ git pull
# IF the app code is mounted via a volume --> no need to re-build the container
# ELSE --> re-build the container
$ docker build -t name-to-assign:TAG .
# restart container (inside: restart web server)
$ docker run -d -p...:... --rm --name .... image-name:TAG
see ./deploy.sh script (wip...)
Configure all via docker-compose up
...todo...
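In the meantime, a minimal docker-compose.yml sketch matching the docker run above (the service name and volume path are assumptions):

# docker-compose.yml -- minimal sketch
version: "3.8"
services:
  web:
    image: test-python-app:1.0
    build: .
    ports:
      - "5000:5000"
    # dev only: mount the app code so changes don't require a rebuild
    # volumes:
    #   - .:/app

$ docker-compose up -d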
Some other commands:
pip3 freeze > requirements.txt