Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Flask Web App interface #29

Open
wants to merge 12 commits into
base: flask_app
Choose a base branch
from
Open

Conversation

fhallee
Copy link
Contributor

@fhallee fhallee commented Jan 27, 2025

This PR introduces a web app to host the CityLex database, allowing users to access the data through a user-friendly interface. The app is functional and allows users to query and download data in long TSV format.

Next steps:

  • CELEX Integration: CELEX data has not yet been incorporated. We discussed adding password protection for CELEX data, but this functionality is not yet implemented.
  • Additional output formats: We could implement short format (join on word) TSV and SQLite dump options.
  • Feature script: features.py doesn’t convert the format for all tags and needs to be updated.

Copy link
Collaborator

@kylebgorman kylebgorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @fhallee, since you'll be back working on this before terribly long, I just gave you comments here, since I probably won't be dedicating much time to this in the next few weeks.

@@ -14,6 +14,8 @@
import pandas # type: ignore
import requests

from citylex.features import tag_to_tag
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use module-level imports, thus from citylex import features?

cursor.execute(
"""
INSERT INTO morphology (
INSERT INTO features (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to make sure I understand what you did here. It seems like you're still preserving the idea of features as a table with multiple sources, but when the source is CELEX you also convert that to UD and UM tags, right? And so on for other sources.

If that's right, is there any argument for doing that at DB creation time vs. on the fly as needed? I guess I could see either option as viable, but I probably would have done it on the fly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If for sake of review you want to suppress this new feature, I'd be fine with that. We can come back to it later.

@@ -490,7 +495,7 @@ def main():
parser = argparse.ArgumentParser(description="Creates a CityLex lexicon")
parser.add_argument(
"--db_path",
default="citylex.db",
default="data/citylex.db",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to suggest at this point that we just hardcode this; put DB_PATH = "data/citylex.db" at the top of the program (but after the imports). Do we ever want more than one copy?


from flask import Flask, render_template, request, send_file

def get_args():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also suggest we hardcode the path here.

cursor = conn.cursor()

if request.method == "GET":
return render_template('index.html')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"index.html"; don't mix ' and " without good reason

us_columns.append("raw_frequency")
if "subtlexus_freq_per_million" in selected_fields:
us_columns.append("freq_per_million")
us_query = f"SELECT {', '.join(us_columns)} FROM frequency WHERE source = 'SUBTLEX-US'"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you did this once (and not again on lines 85) by getting rid of the WHERE clause, then you sort on the basis of the result iterator, determining where it goes based on the source column?


writer.writerow(columns) # Writes header

# Fetches and writes SUBTLEX-US data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from here on this is a bit over-commented, with most of them noting obvious details.

us_columns.append("freq_per_million")
us_query = f"SELECT {', '.join(us_columns)} FROM frequency WHERE source = 'SUBTLEX-US'"
cursor.execute(us_query)
us_results = cursor.fetchall()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to fetchall, that's going to pull in megabytes of data all at once when you only need the data incrementally. You can just say for row in cursor: I think...

output.seek(0) # Moves the cursor to the beginning of the file

# Sends the file as a response
return send_file(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice; didn't know you could do this quite like that.

row_dict = dict(zip(elp_columns, row))
writer.writerow([row_dict.get(col, '') for col in columns]) # Write ELP data

output.seek(0) # Moves the cursor to the beginning of the file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood re: comment, but why is this necessary?

Copy link
Collaborator

@kylebgorman kylebgorman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just testing locally now. A few things:

  • Flask and dependencies need to be added to, minimally, requirements.txt. I'd probably pin it to use the current version of Flask (whatever you're testing on) and higher, since I bet it'll be forward-compatible for a while at least.
  • Since you moved the DB to the data subdirectory, a user who just calls, say, citylex --all-free will get an inscrutable error. But it's basically saying you don't have a subdirectory called data yet. I am wondering about whether we shouldn't move it after all...better to just create it in the user's working directory by default. Or you could make it mandatory to specify its location, if you'd rather take this the other way.
  • The website looks great (in minimalist terms) actually once I got it running. I don't think a ton of design work is necessary there.
  • There is a potential issue with the license buttons. If, e.g., "GNU GPL v3" is selected and I click on it, deselecting it, it will deselect all the GPL sources. That's WAI. But I sort of expected that if the reselected that button, it would select all the GPL sources, but it does nothing. Is that WAI or a bug?
  • The SQLite dump button doesn't seem to work but maybe you already knew that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants