Deep Learning based search engine - Jobindex.DK case

Abstract

After scraping 4.2 million job postings from jobindex.dk, Denmark's biggest job portal, a set of Natural Language Processing techniques and Machine Learning models was applied to the data. Specifically, TF-IDF and BERT models were applied on top of a direct keyword search (no AI).

The goal was to improve search results on a non-English (Danish) dataset, which was achieved, especially with BERT.
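As an illustration of the two approaches, the sketch below scores a couple of toy Danish job ads against a query with both TF-IDF and BERT sentence embeddings. The libraries (scikit-learn, sentence-transformers), the multilingual model name, and the example texts are assumptions for illustration, not code taken from this repository.

# Minimal sketch (not the repository's code) of scoring job ads against a query
# with TF-IDF and with BERT sentence embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

jobs = ["Python udvikler til data team", "Sygeplejerske til akutafdeling"]  # toy examples
query = "data scientist med python erfaring"

# TF-IDF: lexical overlap between query and job descriptions
tfidf = TfidfVectorizer()
doc_vectors = tfidf.fit_transform(jobs)
query_vector = tfidf.transform([query])
tfidf_scores = cosine_similarity(query_vector, doc_vectors)[0]

# BERT: semantic similarity via a multilingual sentence encoder (assumed model choice)
bert = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
doc_embeddings = bert.encode(jobs)
query_embedding = bert.encode([query])
bert_scores = cosine_similarity(query_embedding, doc_embeddings)[0]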

Structure

The search engine is structured into two parts:

  1. Python script: text preprocessing, embedding generation, distance calculation, and file export (see the sketch after this list).
  2. Jupyter notebook: determines BERT distances and presents the results.
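The following is a rough sketch of what step 1 could look like; the column name, model choice, helper names, and output paths are assumptions rather than the repository's actual code.

# Hedged sketch of the step 1 pipeline: preprocess text, generate embeddings,
# and export files for the notebook to consume.
import re
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

def preprocess(text: str) -> str:
    # Lowercase and strip non-letter characters (keeping Danish æ/ø/å)
    return re.sub(r"[^a-zæøå\s]", " ", text.lower())

def build_embeddings(csv_path: str, embeddings_path: str) -> None:
    df = pd.read_csv(csv_path)
    # "description" is an assumed column name for the job ad text
    df["clean_text"] = df["description"].astype(str).map(preprocess)

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    embeddings = model.encode(df["clean_text"].tolist(), show_progress_bar=True)

    # Persist the embeddings and the preprocessed dataset for the notebook
    np.save(embeddings_path, embeddings)
    df.to_csv(csv_path.replace(".csv", "_processed.csv"), index=False)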

Run

To generate a preprocessed dataset:

python src --process '/data/interim/jobindex_cropped_bigger.csv'
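Since the package is invoked as python src, it presumably exposes a src/__main__.py. The sketch below shows one way such an entry point could parse the --process flag; everything beyond the flag itself, including the placeholder process_dataset function, is an assumption.

# Hypothetical src/__main__.py: parses the --process flag shown above.
import argparse

def process_dataset(csv_path: str) -> None:
    # Placeholder for the preprocessing/embedding pipeline described above
    print(f"Preprocessing {csv_path} ...")

def main() -> None:
    parser = argparse.ArgumentParser(description="Jobindex search-engine preprocessing")
    parser.add_argument("--process", metavar="CSV_PATH",
                        help="path to the raw jobindex CSV to preprocess")
    args = parser.parse_args()
    if args.process:
        process_dataset(args.process)

if __name__ == "__main__":
    main()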

To get recommendations (the preprocessed dataset must be generated beforehand), run the search_engine_results.ipynb notebook; a sketch of the core query step follows.
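This is a hedged sketch of how the notebook's recommendation step could work, assuming the script above exported embeddings to disk; the file names, column names, and model are illustrative assumptions.

# Sketch of a recommendation query: load the exported embeddings, embed the
# query, and rank jobs by cosine distance (smaller distance = closer match).
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_distances

df = pd.read_csv("data/interim/jobindex_cropped_bigger_processed.csv")  # assumed path
embeddings = np.load("data/processed/job_embeddings.npy")               # assumed path

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
query_embedding = model.encode(["data scientist med python erfaring"])

distances = cosine_distances(query_embedding, embeddings)[0]
top_10 = distances.argsort()[:10]
print(df.iloc[top_10][["title", "description"]])  # assumed column names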

Scripts

To lint code, run:

./scripts/lint-code.sh

To start notebooks, run:

./scripts/start_notebooks.sh