Make data directory parent configurable #8

Open
wants to merge 1 commit into
base: main
from


bgs4free

Added environment variable to set a custom parent directory for wiki-dataset and txtai-wikipedia.
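
The diff itself is not shown in this thread, so here is only a rough sketch of what the environment-variable approach could look like. The variable name `WIKI_DATA_PARENT_DIR` and the helper `data_dirs_from_env` are hypothetical, not taken from the PR; only the two folder names come from the description above.

```python
import os

# Hypothetical variable name; the PR's actual name is not shown in this thread.
ENV_VAR = "WIKI_DATA_PARENT_DIR"


def data_dirs_from_env(environ=os.environ):
    """Return (wiki_dataset_dir, txtai_dir) under the configured parent directory."""
    # Fall back to the current directory when the variable is unset.
    parent = environ.get(ENV_VAR, ".")
    return (
        os.path.join(parent, "wiki-dataset"),
        os.path.join(parent, "txtai-wikipedia"),
    )
```

Passing the environment in as a parameter keeps the lookup easy to check without touching the real process environment.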

@SomeOddCodeGuy
Owner

I really like the concept of what you're doing here; honestly I should have thought to make this configurable before.

One question I do have: what do you think about using a command-line argument when you start the API, instead of OS environment variables? Since this is more of a one-off app, I wasn't sure if folks might feel more comfortable doing that than adding env variables for it.

The Python code might look something like this:

import argparse
import os

parser = argparse.ArgumentParser(description="Offline Wikipedia Text API")
parser.add_argument(
    "--database_dir",
    default=".",
    help="Base directory containing the wiki-dataset and txtai-wikipedia folders."
)
args = parser.parse_args()

DATABASE_DIR = args.database_dir
WIKI_DATASET_DIR = os.path.join(DATABASE_DIR, "wiki-dataset", "train")
TXT_AI_DIR = os.path.join(DATABASE_DIR, "txtai-wikipedia")

With this, you'd call

python start_api.py --database_dir "C:\temp\wikidatabases"
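
To sanity-check how the paths resolve with and without the flag, the same parser can be wrapped in a function that takes an explicit argument list. This is just a sketch for illustration; `build_parser` and `resolve_dirs` are hypothetical helper names, not anything in the repo today.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(description="Offline Wikipedia Text API")
    parser.add_argument(
        "--database_dir",
        default=".",
        help="Base directory containing the wiki-dataset and txtai-wikipedia folders.",
    )
    return parser


def resolve_dirs(argv=None):
    """Parse argv (sys.argv[1:] when None) and return (wiki_dataset_dir, txtai_dir)."""
    args = build_parser().parse_args(argv)
    return (
        os.path.join(args.database_dir, "wiki-dataset", "train"),
        os.path.join(args.database_dir, "txtai-wikipedia"),
    )
```

Calling `resolve_dirs([])` falls back to the current directory, which should match how the app behaves when no flag is given.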

For the bat file, we need to capture the argument and pass it along to the Python script. We already do something similar in Wilmer; our .bat file might look like this:

@echo off
setlocal

:: Step 0: Parse any arguments
set "DATABASE_DIR="

:parse_args
if "%~1"=="" goto :done
if /i "%~1"=="--database_dir" (
    set "DATABASE_DIR=%~2"
    shift
)
shift
goto :parse_args

:done

...

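:: If the user added --database_dir, forward it to start_api.py;
:: otherwise let start_api.py fall back to its own default
if defined DATABASE_DIR (
    python start_api.py --database_dir "%DATABASE_DIR%"
) else (
    python start_api.py
)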

Then we'd call the bat file with

start_api.bat --database_dir "C:\temp\wikidatabases"

For the .sh file, it might look like:

#!/bin/bash

# Step 0: Parse any arguments we care about
DATABASE_DIR=""
OTHER_ARGS=()

while [[ $# -gt 0 ]]; do
  case $1 in
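    --database_dir)
      # Guard against a missing value; otherwise "shift 2" fails and the loop never ends
      if [[ $# -lt 2 ]]; then
        echo "Error: --database_dir requires a value" >&2
        exit 1
      fi
      DATABASE_DIR="$2"
      shift 2
      ;;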
    *)
      # For any unrecognized args, store them to pass through
      OTHER_ARGS+=("$1")
      shift
      ;;
  esac
done

...

echo "API Starting..."

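# Forward --database_dir only if the user supplied it
if [[ -n "$DATABASE_DIR" ]]; then
  python3 start_api.py --database_dir "$DATABASE_DIR" "${OTHER_ARGS[@]}"
else
  python3 start_api.py "${OTHER_ARGS[@]}"
fi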
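
One thing to watch with the pass-through: `parse_args()` exits with an error on any flag it does not recognize, so forwarding extra arguments only works if start_api.py tolerates them. A minimal sketch, assuming the script should ignore unknown flags rather than fail, using argparse's `parse_known_args`. The helper name `parse_cli` and the `--port` flag in the usage note are purely illustrative, not options the project is known to have.

```python
import argparse


def parse_cli(argv=None):
    """Return (database_dir, leftover_args) without failing on unknown flags."""
    parser = argparse.ArgumentParser(description="Offline Wikipedia Text API")
    parser.add_argument("--database_dir", default=".")
    # parse_known_args returns (namespace, list of unrecognized argument strings)
    args, leftover = parser.parse_known_args(argv)
    return args.database_dir, leftover
```

With this, something like `parse_cli(["--database_dir", "/tmp/wiki", "--port", "5728"])` would hand back the directory plus the untouched extras instead of aborting.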

Thoughts? It expands the amount of code you'd add in the PR, but I feel like less tech-savvy folks, or folks just wanting to run this temporarily on a computer, might feel more comfortable adding --database_dir to the command line than setting an env variable.
