-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue elimination - database migration #1011
Comments
Just to say we now have a volunteer for the tagging work, so if you can generate the data for this I can get them started. They'll also be working on dealing with variant tagging, which would be good to clean up. Is it easy to generate a list of all tags and their frequency? That should hopefully enable us to determine similar tags. |
To get the issues with threads where neither the issue or the thread have tags I've used: File.open("issues_without_tags.txt", "w") do |file|
Issue.joins(:threads).where(<<~SQL
not exists (select 1 from issue_tags where issue_tags.issue_id = issues.id)
and not exists (select 1 from message_thread_tags where message_thread_tags.thread_id = message_threads.id)
SQL
).distinct.select("issues.id, issues.title").find_each { |iss| file << "https://www.cyclescape.org/issues/#{iss.to_param}\n" }
end I'd like to add a validation to ensure all future message threads have tags so we don't get new ones. Given the issue might be tagged I could do something like pre-fill the new thread's tags with the issues ones if you want but also starting with a blank doesn't seem unreasonable. For issues with multiple threads: headers = ["Issue URL", "Tags on the issue only"]
CSV.open("issues_with_multiple_threads.csv", "w", write_headers: true, headers: headers) do |csv|
Issue.where(id: iss_ids).left_joins(:tags).group(:id, :title).select("issues.id, title, json_agg(tags.name) as tag_names").find_each { |iss| csv << ["https://www.cyclescape.org/issues/#{iss.to_param}", iss.tag_names] }
end; issues_with_multiple_threads.csv Finally the top tags: headers = ["Tag name", "Tag frequency (including issues, threads and library items)"]
CSV.open("tag_with_freq.csv", "w", write_headers: true, headers: headers) do |csv|
Tag.top_tags_fresh(100000).map { |tg| csv << [tg.name, tg.tag_count] }
end |
Could we get the counts for those added to the stats page somewhere so I can keep an eye on that as we work through it at this end? I don’t mind running the queries on the DB manually from time to time to get the data output but a quick view for the three stats would be helpful.
Yes, definitely. Soon as we have the current data cleaned up, we should apply that to editing also so that the constraint is in before we do the main site migration.
I see we have some tags with zero references to them. I suggest we zap those as they aren't useful and will just create autocomplete noise. |
From #1011 (comment) can probably be removed soon.
I've added a new admin pages with the untagged issues and issues with multiple threads at I'll add the validation once all issues have tags otherwise it will get confusing. I've removed the tags with no references, I'll make an issue to check this every so often. |
Issues have voting 👍 and 👎 whereas threads have favourites ⭐ . We have 912 instances where a user has 👍 or 👎 and issue. Do we just ignore these? It matters for "popular" part of the new discussions design. |
Popular discussions is simply the discussions with the most messages in. Favourites are purely just a booking mechanism for the user. They are not used in determining popularity. Other users cannot see who has favourited what. Issues have voting, but you can basically entirely ignore that now. That data was never very interesting and will just fade away. |
Action: Nikolai to make an admin page to help this be worked through more quickly. |
This report now has zero hits, so it would be safe to start enforcing at the database level that an issue/thread must have tags. |
We discussed in depth the data migration for the elimination of issues to leave discussions only.
NB: Issue is a container but is not actually heavily used in the logic, so we can leave it lying about in parallel if necessary.
For each thread:
We add to the thread properties:
geometry
tags - merged and uniqued
We make a new first message, consisting of the following from the issue:
"Describe the problem" as message
Any photo as attachment
Any link as attachment
Any deadline as attachment
Issue creator as the person
Issue created_at as the time
The issue title is not used.
At end of message could inject a new paragraph:
[This message was auto-created from an overall description of this issue that was created in an earlier version of this website.]
If you can detect identical content between issue and first message then don't add the first message.
Data cleansing:
The text was updated successfully, but these errors were encountered: