Duplicate Fires Crawled #23
What is the name of the fire, please?
The fire table is allowed to contain records with the same name, since the id (not the name) is the primary key.
Thanks.
I just checked the table, and there is an issue: Paradise west is created multiple times. I will look into it this weekend.
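To check the table for names that appear more than once, a `GROUP BY ... HAVING` query is enough. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database; the table name `fire` and its columns are hypothetical, since the real schema is not shown in this thread.

```python
import sqlite3


def find_duplicate_names(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (name, count) for every fire name stored more than once.

    Assumes a hypothetical table `fire(id INTEGER PRIMARY KEY, name TEXT)`.
    """
    rows = conn.execute(
        """
        SELECT name, COUNT(*) AS n
        FROM fire
        GROUP BY name
        HAVING COUNT(*) > 1
        ORDER BY n DESC, name
        """
    ).fetchall()
    return [(name, n) for name, n in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fire (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO fire (name) VALUES (?)",
        [("Paradise west",), ("Paradise west",), ("Paradise west",), ("Camp",)],
    )
    print(find_duplicate_names(conn))  # [('Paradise west', 3)]
```

Since repeated names are allowed by the schema, a hit here only flags candidates; it does not by itself prove the rows describe the same fire.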
Thanks. Is it hard to clean up the corrupted (duplicated) data? I assume we can just delete the corresponding records and then rerun the fixed crawler?
No. I will drop them and recrawl after fixing it.
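Because different fires can legitimately share a name, deleting rows by name alone would be risky. A minimal sketch, assuming a hypothetical `source_url` column that records which government page each row was crawled from, keeps the lowest `id` per page and deletes the other copies:

```python
import sqlite3


def delete_duplicate_pages(conn: sqlite3.Connection) -> int:
    """Keep the lowest id for each source page and delete the other copies.

    Assumes a hypothetical table `fire(id INTEGER PRIMARY KEY, name TEXT, source_url TEXT)`.
    Rows with a NULL source_url are grouped together by GROUP BY, so fill that
    column before running this. Returns the number of rows deleted.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            DELETE FROM fire
            WHERE id NOT IN (SELECT MIN(id) FROM fire GROUP BY source_url)
            """
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fire (id INTEGER PRIMARY KEY, name TEXT, source_url TEXT)")
    conn.executemany(
        "INSERT INTO fire (name, source_url) VALUES (?, ?)",
        [
            ("Paradise west", "https://example.com/fires/1"),  # hypothetical URLs
            ("Paradise west", "https://example.com/fires/1"),
            ("Paradise west", "https://example.com/fires/2"),
        ],
    )
    print(delete_duplicate_pages(conn))  # 1
```

Dropping all affected rows and recrawling, as proposed above, avoids having to decide which copy is correct; this sketch only shows the narrower alternative of removing exact per-page repeats.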
Could we discuss the details of the strategy for merging fires? It seems that right now it uses a static threshold on the number of days separating records?
Right now it does not. Every page on the government website corresponds to one merged fire: the crawler crawls each page and assigns it an id, and the fire with that id is the merged fire.
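If each page on the government website is one merged fire, one way to keep a recrawl from creating the same fire again is to make the page identifier unique and upsert on it. This is a sketch under assumptions, not the project's implementation: the `fire` table, the `source_url` column, and the example URL are hypothetical, and `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer.

```python
import sqlite3

# Hypothetical schema: one row per crawled page, enforced by UNIQUE(source_url).
SCHEMA = """
CREATE TABLE IF NOT EXISTS fire (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    source_url TEXT NOT NULL UNIQUE
)
"""


def upsert_fire(conn: sqlite3.Connection, name: str, source_url: str) -> int:
    """Insert a fire for `source_url`, or update its name if the page was seen before.

    Returns the id of the single row stored for that page.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO fire (name, source_url) VALUES (?, ?)
            ON CONFLICT(source_url) DO UPDATE SET name = excluded.name
            """,
            (name, source_url),
        )
    (fire_id,) = conn.execute(
        "SELECT id FROM fire WHERE source_url = ?", (source_url,)
    ).fetchone()
    return fire_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    url = "https://example.com/fires/paradise-west"  # hypothetical page URL
    first = upsert_fire(conn, "Paradise west", url)
    second = upsert_fire(conn, "Paradise west", url)  # simulated recrawl
    print(first == second)  # True
    print(conn.execute("SELECT COUNT(*) FROM fire").fetchone()[0])  # 1
```

Crawling the same page twice then returns the same id instead of inserting a second row, while two different pages that happen to share a fire name still get separate ids, which matches the point that names alone are not the key.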
Maybe it's better to have a face-to-face (F2F) discussion?
Yes, but I don't have time today. I can do it tomorrow.
No rush. Let's move the discussion to Slack, and please schedule a meeting with me if possible.
Any updates?
There are multiple entries with the same fire name in the database. Related to Fire data runnable.
@ScarlettZ98, can you check, please?