-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Flask Web App interface #29
base: flask_app
Are you sure you want to change the base?
Conversation
Merge branch 'master' into flask_app Needed to merge the latest changes from the master branch to integrate the "features.py" script functionality into app.py# the commit.
…e format conversion in Flask app
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @fhallee, since you'll be back working on this before terribly long, I just gave you comments here, since I probably won't be dedicating much time to this in the next few weeks.
@@ -14,6 +14,8 @@ | |||
import pandas # type: ignore | |||
import requests | |||
|
|||
from citylex.features import tag_to_tag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use module-level imports, thus from citylex import features
?
cursor.execute( | ||
""" | ||
INSERT INTO morphology ( | ||
INSERT INTO features ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to make sure I understand what you did here. It seems like you're still preserving the idea of features
as a table with multiple sources, but when the source is CELEX you also convert that to UD and UM tags, right? And so on for other sources.
If that's right, is there any argument for doing that at DB creation time vs. on the fly as needed? I guess I could see either option as viable, but I probably would have done it on the fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If for sake of review you want to suppress this new feature, I'd be fine with that. We can come back to it later.
@@ -490,7 +495,7 @@ def main(): | |||
parser = argparse.ArgumentParser(description="Creates a CityLex lexicon") | |||
parser.add_argument( | |||
"--db_path", | |||
default="citylex.db", | |||
default="data/citylex.db", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to suggest at this point that we just hardcode this; put DB_PATH = "data/citylex.db"
at the top of the program (but after the imports). Do we ever want more than one copy?
|
||
from flask import Flask, render_template, request, send_file | ||
|
||
def get_args(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also suggest we hardcode the path here.
cursor = conn.cursor() | ||
|
||
if request.method == "GET": | ||
return render_template('index.html') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"index.html"
; don't mix ' and " without good reason
us_columns.append("raw_frequency") | ||
if "subtlexus_freq_per_million" in selected_fields: | ||
us_columns.append("freq_per_million") | ||
us_query = f"SELECT {', '.join(us_columns)} FROM frequency WHERE source = 'SUBTLEX-US'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you did this once (and not again on lines 85) by getting rid of the WHERE clause, then you sort on the basis of the result iterator, determining where it goes based on the source column?
|
||
writer.writerow(columns) # Writes header | ||
|
||
# Fetches and writes SUBTLEX-US data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think from here on this is a bit over-commented, with most of them noting obvious details.
us_columns.append("freq_per_million") | ||
us_query = f"SELECT {', '.join(us_columns)} FROM frequency WHERE source = 'SUBTLEX-US'" | ||
cursor.execute(us_query) | ||
us_results = cursor.fetchall() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to fetchall
, that's going to pull in megabytes of data all at once when you only need the data incrementally. You can just say for row in cursor:
I think...
output.seek(0) # Moves the cursor to the beginning of the file | ||
|
||
# Sends the file as a response | ||
return send_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice; didn't know you could do this quite like that.
row_dict = dict(zip(elp_columns, row)) | ||
writer.writerow([row_dict.get(col, '') for col in columns]) # Write ELP data | ||
|
||
output.seek(0) # Moves the cursor to the beginning of the file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood re: comment, but why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just testing locally now. A few things:
- Flask and dependencies need to be added to, minimally,
requirements.txt
. I'd probably pin it to use the current version of Flask (whatever you're testing on) and higher, since I bet it'll be forward-compatible for a while at least. - Since you moved the DB to the
data
subdirectory, a user who just calls, say,citylex --all-free
will get an inscrutable error. But it's basically saying you don't have a subdirectory calleddata
yet. I am wondering about whether we shouldn't move it after all...better to just create it in the user's working directory by default. Or you could make it mandatory to specify its location, if you'd rather take this the other way. - The website looks great (in minimalist terms) actually once I got it running. I don't think a ton of design work is necessary there.
- There is a potential issue with the license buttons. If, e.g., "GNU GPL v3" is selected and I click on it, deselecting it, it will deselect all the GPL sources. That's WAI. But I sort of expected that if the reselected that button, it would select all the GPL sources, but it does nothing. Is that WAI or a bug?
- The SQLite dump button doesn't seem to work but maybe you already knew that.
This PR introduces a web app to host the CityLex database, allowing users to access the data through a user-friendly interface. The app is functional and allows users to query and download data in long TSV format.
Next steps: