Polars refactor (#14)
* Replaces pandas with polars
* Refactored and merged some funcs, added tests
* Refactored requirements.txt
* Fixes error in viz sql
* Replaces format() with fstrings in most instances
* Updates README
zenlan committed May 29, 2024
1 parent 43223d2 commit acb0e3c
Showing 60 changed files with 410,860 additions and 381,987 deletions.
7 changes: 2 additions & 5 deletions .env.example
@@ -1,8 +1,5 @@
-ORDS_DB_HOST=localhost
-ORDS_DB_DATABASE=ords_test
-ORDS_DB_COLLATION=utf8mb4_unicode_ci
-ORDS_DB_USER=admin
-ORDS_DB_PWD=admin
+ORDS_DB_CONN="{'host': 'localhost','database': 'ords','user': 'admin','pwd': 'admin','collation': 'utf8mb4_unicode_ci'}"
+ORDS_DB_TEST="{'host': 'localhost','database': 'ords_test','user': 'admin','pwd': 'admin','collation': 'utf8mb4_unicode_ci'}"
 ORDS_DATA=OpenRepairData_v0.3_aggregate_202309
 ORDS_CATS=OpenRepairData_v0.3_Product_Categories
 DEEPL_KEY=
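
The new `ORDS_DB_CONN` and `ORDS_DB_TEST` values each pack a full set of connection settings into one string written as a Python dict literal. A minimal sketch of turning such a string back into a dict with `ast.literal_eval`, which is the call `funcs/cfg.py` in this commit relies on. The helper name `parse_conn` is hypothetical, and the sketch assumes the `.env` loader has already stripped the outer double quotes.

```python
import ast

# Same shape as the ORDS_DB_CONN value in .env.example (placeholder credentials).
EXAMPLE = "{'host': 'localhost','database': 'ords','user': 'admin','pwd': 'admin','collation': 'utf8mb4_unicode_ci'}"


def parse_conn(raw):
    """Hypothetical helper: return the settings as a dict, or None if unusable."""
    if raw is None:
        return None
    try:
        # literal_eval only accepts Python literals, so no code is executed.
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None
    # Reject valid literals that are not dicts, e.g. "42".
    return value if isinstance(value, dict) else None


conn = parse_conn(EXAMPLE)
print(conn["host"], conn["database"])
```

Returning `None` for a missing or malformed value is a choice made for this sketch; the committed `get_dbvars` prints the exception and returns `False` instead.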
2 changes: 2 additions & 0 deletions .gitignore
@@ -12,6 +12,8 @@ dat/ords/*.csv
dat/backup*.*
out/
tmp/
tmp.*
*.tmp
solr/solr-*
pyvenv.cfg
*.code-workspace
38 changes: 7 additions & 31 deletions README.md
@@ -63,25 +63,23 @@ It is compiled and published by the [Open Repair Alliance (ORA)](https://openrep
* Python with venv module.
* Optional MySQL 8.x and libmysqlclient-dev.

### Working copy

```git clone git@github.com:openrepair/tools.git ./ords-tools```

### Virtual environment

```python3 -m venv ords-tools```

```cd ords-tools```

```source bin/activate```
```git init```

To install the requirements.
```git remote add origin git@github.com:openrepair/tools.git```

```pip install -r requirements.txt```
```git pull origin main```

To upgrade to the newest requirements.
```git branch --set-upstream-to=origin/main main```

```pip install -r requirements.txt --upgrade```
```source bin/activate```

```pip install -r requirements.txt```

### Data

@@ -93,8 +91,6 @@ Copy ```.env.example``` to ```.env``` and edit as necessary.

```.env``` is in .gitignore, do not add it to this repo.

[.env file documentation](https://saurabh-kumar.com/python-dotenv/#file-format)

## Links

### Repair data
@@ -107,18 +103,8 @@ Copy ```.env.example``` to ```.env``` and edit as necessary.

### Python

[Python and Virtual Environments](https://csguide.cs.princeton.edu/software/virtualenv#scm)

[Using Python environments in VS Code](https://code.visualstudio.com/docs/python/environments)

[Python](https://docs.python.org/)

[W3 Schools Python](https://www.w3schools.com/python/)

[Numpy](https://numpy.org/)

[Pandas](https://pandas.pydata.org/)

[Scikit-learn](https://scikit-learn.org/)

[Natural Language Processing Demystified](https://www.nlpdemystified.org/)
@@ -129,14 +115,4 @@ Copy ```.env.example``` to ```.env``` and edit as necessary.

[MySQL 8.0 Reference Manual](https://dev.mysql.com/doc/refman/8.0/en/)

[OpenRefine](https://openrefine.org/)

[R](https://www.r-project.org/)

[Apache OpenNLP](https://opennlp.apache.org/)

[Solr](https://solr.apache.org/)

[Data-Driven Documents (D3)](https://d3js.org/)

[Orange](https://orangedatamining.com/)
273,672 changes: 141,667 additions & 132,005 deletions dat/ords_poetry_lines.csv

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions funcs/README.md
@@ -1,3 +1,13 @@
# About

A scrappy set of wrapper functions that alleviate some of the most tedious code repetition.

## Tests

Run all tests.

`$ python3 -m unittest discover tests/`

Run one test, e.g.

`python3 -m unittest tests/testTextFuncs.py`
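
For orientation, a minimal sketch of the kind of module those commands would run. Everything in it is an assumption made for illustration: `normalise_whitespace` is a stand-in, not a function from this repository, and the test names are invented. A file like this saved under `tests/` with a name starting with `test` matches the default `unittest discover` pattern (`test*.py`).

```python
import unittest


def normalise_whitespace(text):
    """Stand-in for a function under test; not part of this repository."""
    return " ".join(text.split())


class NormaliseWhitespaceTest(unittest.TestCase):
    def test_collapses_runs_of_whitespace(self):
        self.assertEqual(
            normalise_whitespace("  broken \t screen\n"), "broken screen"
        )

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalise_whitespace(""), "")
```

There is no `unittest.main()` call because both commands above invoke the `unittest` runner directly.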
11 changes: 10 additions & 1 deletion funcs/__init__.py
@@ -1 +1,10 @@
-__all__ = ["pathfuncs", "envfuncs", "logfuncs", "dbfuncs", "datefuncs", "miscfuncs", "textfuncs", "deeplfuncs"]
+__all__ = [
+    "cfg",
+    "pathfuncs",
+    "dbfuncs",
+    "datefuncs",
+    "miscfuncs",
+    "ordsfuncs",
+    "textfuncs",
+    "deeplfuncs",
+]
54 changes: 54 additions & 0 deletions funcs/cfg.py
@@ -0,0 +1,54 @@
import os
import ast
import logging
from dotenv import load_dotenv
load_dotenv()

if not os.path.exists("dat"):
    os.mkdir("dat")
if not os.path.exists("dat/ords"):
    os.mkdir("dat/ords")
if not os.path.exists("log"):
    os.mkdir("log")
if not os.path.exists("out"):
    os.mkdir("out")

ROOT_DIR = os.path.realpath(os.path.join(os.path.dirname(__file__), ".."))
DATA_DIR = os.path.join(ROOT_DIR, "dat", "")
ORDS_DIR = os.path.join(ROOT_DIR, "dat/ords")
LOG_DIR = os.path.join(ROOT_DIR, "log", "")
OUT_DIR = os.path.join(ROOT_DIR, "out", "")

def init_logger(caller):

    filename, file_ext = os.path.splitext(os.path.basename(caller))
    path = os.path.join(LOG_DIR, filename + '.log')
    logger = logging.getLogger()
    logger.setLevel(logging.DEBUG)
    fh = logging.FileHandler(path, mode='w')
    fh.setLevel(logging.DEBUG)
    logger.addHandler(fh)
    return logger

def get_envvar(key):

    if key in os.environ:
        return os.environ[key]
    else:
        print('ERROR! {} NOT FOUND!'.format(key))
        return False

def get_dbvars(con="ORDS_DB_CONN"):

    try:
        dbstr = os.environ.get(con)
        dbdict = ast.literal_eval(dbstr)
        return dbdict
    except Exception as error:
        print("Exception: {}".format(error))
        return False

def get_version():

    return "0.0.1"
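
`init_logger` derives the log file name from the calling script's path, so each script writes its own log. A rough, self-contained sketch of that behaviour, not the committed code: it differs in taking the log directory as an argument, using a named logger rather than the root logger, and returning the path as well. The script name `ords_stats.py` is made up.

```python
import logging
import os
import tempfile


def init_logger(caller, log_dir):
    """Variant of cfg.init_logger with the log directory passed in explicitly."""
    filename, _ = os.path.splitext(os.path.basename(caller))
    path = os.path.join(log_dir, filename + ".log")
    # Assumption: a named logger, so repeated calls for different scripts
    # do not pile handlers onto the root logger.
    logger = logging.getLogger(filename)
    logger.setLevel(logging.DEBUG)
    fh = logging.FileHandler(path, mode="w")  # mode="w" truncates on each run
    fh.setLevel(logging.DEBUG)
    logger.addHandler(fh)
    return logger, path


log_dir = tempfile.mkdtemp()
# A script would normally pass __file__; a made-up path is used here.
logger, log_path = init_logger("/some/where/ords_stats.py", log_dir)
logger.info("rows loaded: %d", 3)
print(log_path)
```

A script in the repository would presumably call `cfg.init_logger(__file__)` instead and let `LOG_DIR` decide where the file goes.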
