NL: Handle overlapping stopwords better! #2953

Open
pradh opened this issue Jul 18, 2023 · 0 comments
Labels
nl Issues dealing with the Data Commons NL interface

Comments

@pradh
Contributor

pradh commented Jul 18, 2023

Two failure scenarios:

  1. In [how many high schools are in sunnyvale], 'high' is interpreted as a ranking cue instead of as part of "high schools".

  2. Also, "high" gets stripped as a stop-word.
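
To make the second scenario concrete, here is a minimal sketch of one possible mitigation: shielding multi-word phrases such as "high schools" before single-word stop-word removal runs. The names `STOP_WORDS`, `PROTECTED_PHRASES`, and `strip_stop_words` are hypothetical and the word lists are made up for illustration; none of this is taken from the Data Commons NL code.

```python
import re

# Hypothetical word lists for illustration only; not the real NL lists.
STOP_WORDS = {"how", "many", "are", "in", "high", "the"}
PROTECTED_PHRASES = ["high schools", "high school"]


def strip_stop_words(query, protected=()):
    """Remove stop-words, leaving any protected multi-word phrase intact."""
    text = query.lower()
    # Swap each protected phrase for a placeholder token so its words are
    # not matched one by one against the stop-word list. Longer phrases
    # go first so "high schools" wins over "high school".
    placeholders = {}
    for i, phrase in enumerate(sorted(protected, key=len, reverse=True)):
        token = f"__phrase{i}__"
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            text = re.sub(pattern, token, text)
            placeholders[token] = phrase
    kept = [w for w in text.split() if w not in STOP_WORDS]
    # Restore the protected phrases in their original positions.
    return " ".join(placeholders.get(w, w) for w in kept)


query = "how many high schools are in sunnyvale"
naive = strip_stop_words(query)                         # 'high' is lost
guarded = strip_stop_words(query, PROTECTED_PHRASES)    # 'high schools' survives
```

With these made-up lists, the unguarded call drops "high" while the guarded call keeps "high schools" together. The same shielding step could plausibly run before ranking-word detection to address the first scenario, but that is an assumption, not something confirmed in this issue.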

@pradh pradh added the nl Issues dealing with the Data Commons NL interface label Jul 18, 2023
pradh added a commit that referenced this issue Jul 18, 2023
* The initial bad-word list is located
[here](https://storage.mtls.cloud.google.com/datcom-website-config/nl_bad_words.txt).

* Also, prevent school types from being treated as stop-words. This
change had two ripple effects:
1. The fallback logic needs special handling (so that
"schools in sunnyvale" does not become "sunnyvale")
2. The demo query [how big are public
schools in sunnyvale] must not regress

Note: making the schools change also uncovered
#2953. This whole
stop-word removal business needs streamlining! Post fishfood maybe, and
as part of fixing 2853.
 
Screenshot


![image](https://github.com/datacommonsorg/website/assets/4375037/55a824e5-2ed3-4358-b928-e421cb9dd99f)