Issue elimination - database migration #1011

mvl22 · 2022-01-05T22:47:49Z

We discussed in depth the data migration needed to eliminate issues, leaving only discussions.

NB: Issue is a container but is not actually heavily used in the logic, so we can leave it lying about in parallel if necessary.

For each thread:

- We add to the thread properties:
  - geometry
  - tags (merged and de-duplicated)
- We make a new first message, consisting of the following from the issue:
  - "Describe the problem" as message
  - Any photo as attachment
  - Any link as attachment
  - Any deadline as attachment
  - Issue creator as the person
  - Issue created_at as the time
- The issue title is not used.

At end of message could inject a new paragraph:
[This message was auto-created from an overall description of this issue that was created in an earlier version of this website.]

If you can detect identical content between issue and first message then don't add the first message.
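The per-thread steps above could be sketched roughly as follows. This is a framework-free illustration, not the actual migration: the `IssueData`, `ThreadData` and `MessageData` structs, their field names, and the whitespace/case-insensitive comparison used to detect "identical content" are all assumptions made for the example.

```ruby
# Hypothetical stand-ins for the real models; all field names are assumptions.
IssueData   = Struct.new(:description, :photo, :link, :deadline, :created_by,
                         :created_at, :geometry, :tags, keyword_init: true)
ThreadData  = Struct.new(:geometry, :tags, :messages, keyword_init: true)
MessageData = Struct.new(:body, :attachments, :created_by, :created_at, keyword_init: true)

MIGRATION_NOTE = "[This message was auto-created from an overall description of this issue " \
                 "that was created in an earlier version of this website.]"

# Assumed notion of "identical": equal after collapsing whitespace and ignoring case.
def normalise(text)
  text.to_s.gsub(/\s+/, " ").strip.downcase
end

def migrate_thread(issue, thread)
  # Copy the geometry across and merge the tags, dropping duplicates.
  thread.geometry = issue.geometry
  thread.tags = (thread.tags + issue.tags).uniq

  # Skip the synthetic first message if the thread already opens with the same text.
  first = thread.messages.first
  return thread if first && normalise(first.body) == normalise(issue.description)

  message = MessageData.new(
    body: "#{issue.description}\n\n#{MIGRATION_NOTE}",
    attachments: [issue.photo, issue.link, issue.deadline].compact,
    created_by: issue.created_by,
    created_at: issue.created_at
  )
  thread.messages.unshift(message) # becomes the new first message
  thread
end
```

The issue title is deliberately not copied anywhere, matching the plan above.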

Data cleansing:

MLS and volunteers will add tags to each issue/thread that currently has none, so we can enforce tags being a required field.
If an issue has multiple threads, MLS and volunteers will add a curated tag representing the issue, e.g. bradford-canal-route for an issue "Canal route could be upgraded".

mvl22 · 2022-01-15T22:07:43Z

Just to say we now have a volunteer for the tagging work, so if you can generate the data for this I can get them started.

They'll also be working on dealing with variant tagging, which would be good to clean up. Is it easy to generate a list of all tags and their frequency? That should hopefully enable us to determine similar tags.
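One possible way to surface variant tags from a name/frequency list is to group tags by a normalised key. The sketch below is only an illustration: the `variant_key` rule (lower-case, strip punctuation and spaces, drop a trailing "s") and the sample tag names are assumptions, and a real clean-up would still need a human to review each group.

```ruby
# Assumed normalisation rule: ignore case, punctuation/spaces, and a trailing "s".
def variant_key(name)
  name.downcase.gsub(/[^a-z0-9]/, "").sub(/s\z/, "")
end

# Takes [tag name, frequency] pairs and returns only the groups that contain
# more than one spelling, most frequently used spelling first.
def variant_groups(tag_counts)
  tag_counts
    .group_by { |name, _count| variant_key(name) }
    .values
    .select { |group| group.size > 1 }
    .map { |group| group.sort_by { |name, count| [-count, name] } }
end

# Hypothetical sample data, not taken from the real site.
sample = [["cycle-parking", 40], ["cycleparking", 3], ["Cycle Parking", 1],
          ["potholes", 12], ["pothole", 2], ["canal", 5]]
variant_groups(sample).each { |group| puts group.map(&:first).join(", ") }
```

The most frequent spelling in each group is a natural candidate for the canonical tag.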

nikolai-b · 2022-01-16T22:08:04Z

To get the issues with threads where neither the issue nor the thread has tags, I've used:

File.open("issues_without_tags.txt", "w") do |file|
  Issue.joins(:threads).where(<<~SQL
    not exists (select 1 from issue_tags where issue_tags.issue_id = issues.id) 
    and not exists (select 1 from message_thread_tags where message_thread_tags.thread_id = message_threads.id)
  SQL
  ).distinct.select("issues.id, issues.title").find_each { |iss| file << "https://www.cyclescape.org/issues/#{iss.to_param}\n" }
end

issues_without_tags.txt

I'd like to add a validation to ensure all future message threads have tags, so we don't get new untagged ones. Given the issue might already be tagged, I could pre-fill the new thread's tags with the issue's tags if you want, but starting with a blank field doesn't seem unreasonable either.
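To make the two options concrete, here is a framework-free sketch of a "tags required" check with optional pre-filling from the issue. `ThreadDraft` and its error message are hypothetical; in the Rails app this would more likely be expressed as a model validation on the message thread.

```ruby
# Hypothetical, framework-free stand-in for a new message thread.
class ThreadDraft
  attr_reader :tags, :errors

  def initialize(tags: [], issue_tags: [])
    # Optional pre-fill: start from the issue's tags when none were supplied.
    @tags = tags.empty? ? issue_tags.dup : tags.dup
    @errors = []
  end

  # A thread is valid only if it has at least one non-blank tag.
  def valid?
    @errors = []
    @errors << "Tags can't be blank" if tags.reject { |t| t.strip.empty? }.empty?
    @errors.empty?
  end
end

blank = ThreadDraft.new
puts blank.valid?                                   # no tags supplied
prefilled = ThreadDraft.new(issue_tags: ["canal"])
puts prefilled.valid?                               # tags inherited from the issue
```

Starting blank simply means constructing the draft without `issue_tags`, in which case the check fails until the user adds a tag.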

For issues with multiple threads:

headers = ["Issue URL", "Tags on the issue only"]
# NB: iss_ids is not defined in this snippet; it is assumed to hold the ids of issues with more than one thread.
CSV.open("issues_with_multiple_threads.csv", "w", write_headers: true, headers: headers) do |csv|
  Issue.where(id: iss_ids).left_joins(:tags).group(:id, :title)
       .select("issues.id, title, json_agg(tags.name) as tag_names")
       .find_each { |iss| csv << ["https://www.cyclescape.org/issues/#{iss.to_param}", iss.tag_names] }
end;

issues_with_multiple_threads.csv

Finally the top tags:

headers = ["Tag name", "Tag frequency (including issues, threads and library items)"]
CSV.open("tag_with_freq.csv", "w", write_headers: true, headers: headers) do |csv|
  # each rather than map: the block is only used for its side effect of writing rows
  Tag.top_tags_fresh(100000).each { |tg| csv << [tg.name, tg.tag_count] }
end

tag_with_freq.csv

mvl22 · 2022-01-17T17:54:56Z

Could we get the counts for those added to the stats page somewhere so I can keep an eye on that as we work through it at this end? I don’t mind running the queries on the DB manually from time to time to get the data output but a quick view for the three stats would be helpful.

> I'd like to add a validation to ensure all future message threads have tags so we don't get new ones.

Yes, definitely. As soon as we have the current data cleaned up, we should apply that to editing as well, so that the constraint is in place before we do the main site migration.

> Finally the top tags

I see we have some tags with zero references to them. I suggest we zap those as they aren't useful and will just create autocomplete noise.

nikolai-b · 2022-01-17T22:06:01Z

I've added new admin pages listing the untagged issues and the issues with multiple threads at /admin/home

I'll add the validation once all issues have tags, otherwise it will get confusing.

I've removed the tags with no references. I'll make an issue to check this every so often.

nikolai-b · 2022-02-19T22:41:10Z

Issues have voting 👍 and 👎 whereas threads have favourites ⭐. We have 912 instances where a user has given a 👍 or 👎 to an issue. Do we just ignore these? It matters for the "popular" part of the new discussions design.

mvl22 · 2022-02-19T22:50:48Z

> Issues have voting 👍 and 👎 whereas threads have favourites ⭐. We have 912 instances where a user has given a 👍 or 👎 to an issue. Do we just ignore these? It matters for the "popular" part of the new discussions design.

Popular discussions are simply the discussions with the most messages.

Favourites are purely a bookmarking mechanism for the user. They are not used in determining popularity. Other users cannot see who has favourited what.

Issues have voting, but you can basically entirely ignore that now. That data was never very interesting and will just fade away.
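As a minimal illustration of that definition, popularity can be computed from message counts alone, with no reference to votes or favourites. The `Discussion` struct, its `message_count` field, and the sample titles are assumptions for the example; the real query would presumably count messages in the database.

```ruby
# Hypothetical stand-in for a discussion (message thread) with a precomputed count.
Discussion = Struct.new(:title, :message_count, keyword_init: true)

# Popular discussions are those with the most messages; votes and favourites are ignored.
def popular(discussions, limit = 10)
  discussions.max_by(limit) { |d| d.message_count }
end

# Hypothetical sample data.
sample = [
  Discussion.new(title: "Canal route", message_count: 42),
  Discussion.new(title: "Bridge closure", message_count: 7),
  Discussion.new(title: "New crossing", message_count: 19)
]
puts popular(sample, 2).map(&:title).join(", ")
```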

mvl22 · 2022-08-08T20:25:16Z

> To get the issues with threads where neither the issue nor the thread has tags I've used

Action: Nikolai to make an admin page to help this be worked through more quickly.

mvl22 · 2022-09-04T15:57:12Z

> To get the issues with threads where neither the issue nor the thread has tags

/stats/issues_untagged

This report now has zero hits, so it would be safe to start enforcing at the database level that an issue/thread must have tags.

mvl22 added the new-design-migration label Jan 5, 2022

nikolai-b added a commit that referenced this issue Jan 17, 2022

Add pages to help issues removal to admin

e4cf546

From #1011 (comment) can probably be removed soon.

nikolai-b mentioned this issue Jan 18, 2022

Deadline not shown when creating issue #1009

Open
