Wilsven/healthhub-content-optimization

Content Optimization for HealthHub

This repository contains the code for the HealthHub content optimization project.

Install Dependencies

Anaconda (Recommended)

You can download the Anaconda Distribution for your operating system here. You can also find out how to get started with Anaconda Distribution here. To verify your installation, open the Command Line Interface (CLI) and run the following command:

conda list

You should see a list of packages installed in your active environment and their versions displayed. For more information, refer here.


Once set up, create a virtual environment using conda and install dependencies:

# Create a virtual environment
conda create -n <VENV_NAME> python=3.12 -y
conda activate <VENV_NAME>

# Install dependencies
pip install -r requirements.txt
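
As a quick sanity check after activating the environment, the short script below reports whether the active interpreter matches the Python version used above. It is a minimal sketch: the function name `is_supported` is hypothetical, and the accepted range (3.12 or newer, below 4.0) is an assumption based on the `python=3.12` pin and the `^3.12` constraint mentioned in the Poetry section.

```python
import sys

# Assumption: the project targets Python 3.12, as in the conda command above.
REQUIRED = (3, 12)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True if major.minor is at least `required` and below the next major."""
    current = tuple(version_info[:2])
    return required <= current < (required[0] + 1, 0)


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if is_supported() else "not supported by this project"
    print(f"Python {version}: {status}")
```

Run it with the environment activated. If it reports an unsupported version, recreate the environment with the Python version shown above.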

Poetry

Refer to the documentation here (recommended) on how to install Poetry based on your operating system.

Important

For Mac users: if the poetry command is not found, add export PATH="$HOME/.local/bin:$PATH" to the .zshrc file in your home folder and run source ~/.zshrc.


First, create a virtual environment by running the following command:

poetry shell

Tip

If you see the following error: The currently activated Python version 3.11.7 is not supported by the project (^3.12). Trying to find and use a compatible version., run:

poetry env use 3.12.3  # Python version used in the project

To install the defined dependencies for your project, just run the install command. The install command reads the pyproject.toml file from the current project, resolves the dependencies, and installs them.

poetry install

Warning

If you face an error installing gensim with poetry, run this command:

poetry run python -m pip install gensim --disable-pip-version-check --no-deps --no-cache-dir --no-binary gensim

If there is a poetry.lock file in the current directory, it will use the exact versions from there instead of resolving them. This ensures that everyone using the library will get the same versions of the dependencies.

If there is no poetry.lock file, Poetry will create one after dependency resolution.
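
The behaviour described above can be summarised with a small check you can run before poetry install. This is only an illustrative sketch: `install_mode` is a hypothetical helper that looks for the two files by name; it does not call Poetry or reproduce its resolver.

```python
from pathlib import Path


def install_mode(project_dir="."):
    """Describe what `poetry install` is expected to do in `project_dir`."""
    project = Path(project_dir)
    if not (project / "pyproject.toml").is_file():
        return "no pyproject.toml found"
    if (project / "poetry.lock").is_file():
        # Lock file present: the exact versions recorded there are installed.
        return "install pinned versions from poetry.lock"
    # No lock file: dependencies are resolved and a lock file is created.
    return "resolve dependencies, then write poetry.lock"


if __name__ == "__main__":
    print(install_mode())
```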

Tip

It is best practice to commit the poetry.lock to version control for more reproducible builds. For more information, refer here.

File Structure

The exploratory/experimental code for content optimization is stored in the notebooks/ folder.

Usage

To run the notebooks, you can use the runner.ipynb or runner_statistical.ipynb:

# runner.ipynb

import papermill as pm
from logger import logger

pm.inspect_notebook("<INPUT_NOTEBOOK>")  # inspects and outputs the notebook's parameters

pm.execute_notebook(
    input_path="<INPUT_NOTEBOOK>",  # input notebook path
    output_path="<OUTPUT_NOTEBOOK>",  # output notebook path
    parameters={...},  # parameters to be passed to the notebook in a dictionary
)
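
If you run several notebooks, it can help to derive the output path from the input path instead of typing both. The sketch below shows one way to do that. The helper names `build_run_kwargs` and `run_notebook`, the `_output` suffix, and the example notebook name and parameter in the comment are all hypothetical and not part of this repository; only `pm.execute_notebook` and its `input_path`, `output_path`, and `parameters` arguments come from the runner above.

```python
from pathlib import Path


def build_run_kwargs(input_notebook, output_dir, parameters):
    """Build the keyword arguments passed to papermill.execute_notebook."""
    input_path = Path(input_notebook)
    # Assumption: outputs are named "<input stem>_output.ipynb" inside output_dir.
    output_path = Path(output_dir) / f"{input_path.stem}_output{input_path.suffix}"
    return {
        "input_path": str(input_path),
        "output_path": str(output_path),
        "parameters": dict(parameters),
    }


def run_notebook(input_notebook, output_dir, parameters):
    """Execute one notebook with papermill and return the executed notebook."""
    import papermill as pm  # imported here so the helper above works without papermill

    kwargs = build_run_kwargs(input_notebook, output_dir, parameters)
    # Create the output folder if it does not exist yet.
    Path(kwargs["output_path"]).parent.mkdir(parents=True, exist_ok=True)
    return pm.execute_notebook(**kwargs)


# Example call (hypothetical notebook name and parameter):
# run_notebook("notebooks/example.ipynb", "outputs", {"sample_size": 100})
```

The parameter names you pass must match those defined in the target notebook; `pm.inspect_notebook` lists them.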

Pushing to GitHub

Warning

Do not push directly to the main branch. Always create a new branch and make your changes there.

Every time you complete a feature or change on a branch and want to push it to GitHub to make a pull request, you need to ensure you lint your code.

You can run the command pre-commit run --all-files to lint your code. For more information, refer to the pre-commit docs. To see which linters are used, refer to the .pre-commit-config.yaml file.

Alternatively, there is a Makefile that can also lint your code base when you run the simpler command make lint.

Ensure that all checks pass before you push to GitHub (every hook should report that it has passed). If any check fails, fix the reported issues and run the linters again; otherwise your pull request may be rejected and closed.
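
If you prefer to script this check, for example before an automated push, the sketch below wraps the same pre-commit command and reports success through its exit code. The helper names `lint_command` and `run_lint` are hypothetical. The sketch relies only on pre-commit exiting with a non-zero status when a hook fails or modifies files, and it assumes pre-commit is installed and on your PATH.

```python
import subprocess


def lint_command(all_files=True):
    """Return the pre-commit command as an argument list."""
    cmd = ["pre-commit", "run"]
    if all_files:
        cmd.append("--all-files")
    return cmd


def run_lint(runner=subprocess.run):
    """Run the linters and return True only if pre-commit exits with status 0."""
    # check=False: a failing hook is reported via the return value, not an exception.
    result = runner(lint_command(), check=False)
    return result.returncode == 0


# Example usage:
# if not run_lint():
#     raise SystemExit("Linting failed; fix the reported issues before pushing.")
```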

The lint.yml is a GitHub workflow that kicks off several GitHub Actions when a pull request is made. These GitHub Actions check that your code has been properly linted before it is passed on for review. Once all actions have passed and the PR is approved, your changes will be merged into the main branch.

Note

The pre-commit hooks will run even if you forget to call them explicitly. Nonetheless, it is recommended to run them explicitly so you can make any necessary changes in advance.