Setup
You will need to add your AWS credentials to a file located at ~/.aws/credentials (create it if it does not exist). Add the following to the file:
[bechdel]
aws_access_key_id = AKIAJTRXJSBJADGLOBAQ
aws_secret_access_key = <secret key>
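To confirm that the profile was saved correctly, a small check like the one below can help. It is only a sketch and is not part of this repository: hasProfile is a hypothetical helper, and it assumes the credentials file uses the usual "[profile name]" section headers shown above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasProfile reports whether the text of a credentials file contains
// a section header for the given profile, for example "[bechdel]".
// Hypothetical helper for illustration only.
func hasProfile(contents, profile string) bool {
	header := "[" + profile + "]"
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		if strings.TrimSpace(scanner.Text()) == header {
			return true
		}
	}
	return false
}

func main() {
	// Example contents with placeholder values, not real keys.
	example := "[bechdel]\n" +
		"aws_access_key_id = <access key id>\n" +
		"aws_secret_access_key = <secret key>\n"
	fmt.Println(hasProfile(example, "bechdel"))
	fmt.Println(hasProfile(example, "default"))
}
```

To check the real file, read ~/.aws/credentials with os.ReadFile and pass its contents to hasProfile as a string.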
How to run the program
1) Open a command prompt
2) Paste the following command and press enter: cd C:\Users\w1515247\golang\src\women-in-media-article-entity-analysis\cmd\cli
3) To check for updates, paste the following command and press enter: git pull
4) If you get a message along the lines of "Please commit your changes or stash them before you merge. Aborting", then, as long as you haven't made any code changes, you should be safe to run
"git stash" (which shelves your uncommitted changes and removes them from the working directory) followed by "git pull". WARNING: this can remove analysis output that has ended up in tracked files in the directory, so make sure you have copied it
somewhere else if you want to keep it.
5) Make sure that the query condition in cmd/query_condition.sql returns the articles that you want to analyse
6) Click "File -> Save all" in Atom
7) Paste the following command and press enter: go build && ./cli -runType=<run type> -manuallyCorrectGender=<true or false>
In the above command, replace <run type> with ENTITIES_AND_GENDER, JUST_ENTITIES or JUST_GENDER, and <true or false> with either true or false.
This will analyse all articles that match the query.
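The two flags in the build-and-run command accept only the values listed above. The sketch below shows one way such flags could be parsed and validated with Go's standard flag package. It is an assumption for illustration, not the repository's actual code: parseArgs and validRunTypes are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// validRunTypes lists the run types named in this README.
var validRunTypes = map[string]bool{
	"ENTITIES_AND_GENDER": true,
	"JUST_ENTITIES":       true,
	"JUST_GENDER":         true,
}

// parseArgs is a hypothetical helper: it reads -runType and
// -manuallyCorrectGender and rejects run types not listed above.
func parseArgs(args []string) (string, bool, error) {
	fs := flag.NewFlagSet("cli", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the default usage text on errors
	runType := fs.String("runType", "", "ENTITIES_AND_GENDER, JUST_ENTITIES or JUST_GENDER")
	manual := fs.Bool("manuallyCorrectGender", false, "true or false")
	if err := fs.Parse(args); err != nil {
		return "", false, err
	}
	if !validRunTypes[*runType] {
		return "", false, fmt.Errorf("unknown run type %q", *runType)
	}
	return *runType, *manual, nil
}

func main() {
	runType, manual, err := parseArgs([]string{
		"-runType=JUST_GENDER",
		"-manuallyCorrectGender=true",
	})
	fmt.Println(runType, manual, err)
}
```

For example, a full run with both analyses and no manual correction would use -runType=ENTITIES_AND_GENDER -manuallyCorrectGender=false.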
To see the results (for now), the following queries will be useful:
/* see all the entities */
select * from public.article_entities
where <insert conditions>
/* see all the entities with their gender */
SELECT *, n.gender
from public.article_entities a
join public.names n
on a.text = n.name
where <insert conditions>
/* see the gender count */
select n.gender, count(*)
from public.article_entities a
join public.names n
on a.text = n.name
where <insert conditions>
group by 1
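To make clear what the gender count query computes, here is the same join and group-by written as an in-memory Go sketch. It is an illustration only: genderCounts is a hypothetical function, the sample data is invented, and it assumes each name appears at most once in public.names.

```go
package main

import "fmt"

// genderCounts mimics the query above: each entity text is looked up
// in names (the join), entities with no matching name are dropped
// (as an inner join would do), and matches are counted per gender.
func genderCounts(entities []string, names map[string]string) map[string]int {
	counts := make(map[string]int)
	for _, text := range entities {
		gender, ok := names[text]
		if !ok {
			continue // no row in names, so the join produces nothing
		}
		counts[gender]++
	}
	return counts
}

func main() {
	// Invented sample data, not taken from the real database.
	entities := []string{"Alice", "Bob", "Alice", "Acme Corp"}
	names := map[string]string{"Alice": "Female", "Bob": "Male"}
	counts := genderCounts(entities, names)
	fmt.Println("Female:", counts["Female"], "Male:", counts["Male"])
}
```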
/* see the entities without a gender */
select * from
public.article_entities a
join public.names n
on a.text = n.name
where <insert conditions>
and n.gender = ''
/* see summary of all the articles, their counts and their entities */
SELECT article.id,
author.name as journalist_name,
n2.gender as journalist_gender,
count(*) FILTER (WHERE n.gender = 'Male') as men_count,
count(*) FILTER (WHERE n.gender = 'Female') as female_count,
array_agg(json_build_object('text', coalesce(text, ''), 'gender', n.gender, 'nextWord', nextWord, 'score', coalesce(score, 0)))
FROM article article
LEFT join article_entities ae
ON article.id = ae.article_id
LEFT join names n
ON ae.text = n.name
left join author_attr aa
ON article.id = aa.article_id
LEFT join author
ON aa.author_id = author.id
LEFT join names n2
ON author.name = n2.name
where <insert conditions>
group by 1,2,3
/* see overall summary of the analysis */
with analysis as (SELECT article.id,
author.name as journalist_name,
n2.gender as journalist_gender,
count(*) FILTER (WHERE n.gender = 'Female') as number_of_women,
count(*) FILTER (WHERE n.gender = 'Male') as number_of_men,
array_agg(json_build_object('text', coalesce(text, ''), 'gender', n.gender, 'nextWord', nextWord, 'score', coalesce(score, 0)))
FROM article article
LEFT join article_entities ae
ON article.id = ae.article_id
LEFT join names n
ON ae.text = n.name
left join author_attr aa
ON article.id = aa.article_id
LEFT join author
ON aa.author_id = author.id
LEFT join names n2
ON author.name = n2.name
WHERE <insert condition>
group by 1,2,3)
select count(*) as number_of_articles,
count(*) FILTER (WHERE journalist_gender = 'Female') as female_journalist_count,
count(*) FILTER (WHERE journalist_gender = 'Male') as male_journalist_count,
sum(number_of_men) as overall_men_count,
sum(number_of_women) as overall_women_count
from analysis