# Deliberative Subjectivities {#subjectivity}
> He who knows only his own side of the case knows little of that.
> His reasons may be good, and no one may have been able to refute them.
> But if he is equally unable to refute the reasons on the opposite side, if he does not so much as know what they are, he has no ground for preferring either opinion [...].
> He must be able to hear them from persons who actually believe them [...] he must know them in their most plausible and persuasive form.
>
> --- John Stuart @Mill-1959
<!-- 1. Factor analysis (Q method) of individual q-sorts to investigate metaconsensus.
2. Correlations between factor loadings of individuals to investigate intersubjective rationality (small n problem)
3. Secondary factor analysis of factor array, with individual factors as "cases" and items as variables.
4. Correlations between factor loadings and other gathered data (SES, etc.) (small n problem) -->
<!-- - reports correlations, factors from pre- and post-sort. -->
<!-- - *factor interpretation* in substantive terms of tax and the economy. -->
<!-- TODO decommodification factor = scale dependence
one way to read the decommodification factor is also that it kinda revolts against scales that are too big
because it's not that people didn't *understand* the issue of, say treating renters and homeowners alike
it's that they understood it, but still didn't want it
they just wanted to be left alone by the abstractions of Society -->
## Administration
<!--```{r administration, child='keyneson/keyneson.wiki/q-administration.md'}```-->
<!-- TODO MH: paste condition of instruction? -->
<!-- TODO MH include data-gathering here, too -->
<!-- Notice that some people complained about sorting several items, but that 0 is in fact an appropriate value.
For more on what 0 means to q, see Brown 1980 199, also note the flac I got on the correlation coefficient email on the meaning of 0. -->
>"Factor analysis (...) is concerned with a population of n individuals each of whom has been measured in m tests or other instruments.
>The (...) correlations for these m variables are subjected to (...) factor analysis.
>But this technique (...) can also be inverted.
>We may concern ourselves with a population of N different tests (or other items), each of which is measured or scaled relatively, by M individuals."
> --- Stephenson (1936a: 334)
- "factor analysis with the data table turned sideways"; persons are *variables* and traits/tests/statements/abilities are the *sample* or *population*.
- looks at
## Data Import
All citizens as well as the researcher and the two moderators completed two Q sorts each, one at the beginning and one at the end of the conference.
At its heart, Q methodology involves factor analyses and similar data reduction techniques, procedures that are widely used and available in all general-purpose statistics programs.
However, Q methodology also requires some specialized operations, not easily accomplished in general-purpose programs.
The transposed correlation matrix, the flagging of participants and the compositing of weighted factor scores, in particular, are hard or counter-intuitive to produce in mainstream software.
In contrast to many Q studies, this research features several conditions (before, after), groups of participants (citizens, moderators, researcher) and types of items (values, beliefs, preferences), as well as an extended research question, all of which need to be analyzed systematically.
Running and documenting all these variations will be next to impossible without programmatic extensibility of the software used.
## Import
```{r import-q-sorts, echo = FALSE, warning = FALSE}
q_sorts <- import.q.sorts(
  q.sorts.dir = "schumpersorts/qsorts/",
  q.set = q_set,
  q.distribution = q_distribution,
  conditions = c("before", "after"),
  manual.lookup = as.matrix(
    read.csv(
      "keyneson/ids.csv",
      row.names = 2
    )
  ),
  header = FALSE
)
names(dimnames(q_sorts)) <- c("items", "people", "conditions")
```
## Participant Feedback
```{r import-q-feedback, echo = FALSE}
#TODO all this importing stuff should be done together with exporting to dataverse in a convenient way, maybe as an R object after all
q_feedback <- import.q.feedback(
  q.feedback.dir = "schumpersorts/feedback/",
  q.sorts = q_sorts,
  q.set = q_set,
  manual.lookup = as.matrix(
    read.csv(
      "keyneson/ids.csv",
      row.names = 2
    )
  )
)
```
## Missing Data
```{r dropped-participants, echo = FALSE}
#q_sorts <- q_sorts[ , !colnames(q_sorts) == "Wolfgang", ] # delete researcher
q_sorts <- q_sorts[ , !colnames(q_sorts) == "Uwe", ] # left conference for personal reasons
#q_feedback <- q_feedback[ , !colnames(q_feedback) == "Wolfgang", ] # delete researcher
q_feedback <- q_feedback[ , !colnames(q_feedback) == "Uwe", ] # incomplete
```
```{r rda export, eval = FALSE}
# this is because we needed it in another package
civicon_2014 <- NULL
library(tibble)
library(lettercase)
lookup <- tibble(handle = make_names(rownames(lookup.table)),
                 id_pretest = lookup.table[, 1],
                 id_westfluegel = lookup.table[, 2])
civicon_2014$QItems <- list(q_set = make_names(rownames(q_set)),
                            concourse = q_concourse,
                            lookup = lookup)
rownames(civicon_2014$QItems$concourse) <- make_names(rownames(q_concourse))
class(civicon_2014$QItems) <- c("QItems", "list")
civicon_2014$qData <- list(sorts = q_sorts)
civicon_2014$grid <- c("-7" = 1,
"-6" = 1,
"-5" = 2,
"-4" = 4,
"-3" = 6,
"-2" = 9,
"-1" = 10,
"0" = 11,
"1" = 10,
"2" = 9,
"3" = 6,
"4" = 4,
"5" = 2,
"6" = 1,
"7" = 1)
storage.mode(civicon_2014$grid) <- "integer"
library(devtools)
use_data(civicon_2014, pkg = "../pensieve/", overwrite = TRUE, compress = "bzip2")
```
<!-- complete import of `r ncol(q_sorts)`, three participants -->
```{r real-names, echo = FALSE, eval = TRUE}
real_names <- FALSE # this is for manually enabling renaming
if (real_names == TRUE && file.exists("../../Google Drive/CiviCon/Data/Codenames.csv")) { # renaming works only if (private) file is available and is set to true
codenames <- read.csv(file = "../../Google Drive/CiviCon/Data/Codenames.csv",header = TRUE) # read in codenames
for (name in colnames(q_sorts)) { # loop over names in q_sorts
colnames(q_sorts)[colnames(q_sorts)==name] <- as.character(codenames[codenames[,1]==name,2]) # assign original name
colnames(q_feedback)[colnames(q_feedback)==name] <- as.character(codenames[codenames[,1]==name,2]) # assign original name
}
warning("This rendering includes real names. Do not publish or pass around!")
}
```
## Q Method Analysis
Results chapters in quantitative research do not usually recount and justify every mathematical operation from raw data to final interpretation.
In reporting mainstream statistics, say, a linear regression (OLS), much in the way of axioms and preconditions is often taken for granted, though perhaps sometimes too much.
Powerful computers, confirmation biases and time pressure for writers and readers alike may conspire to occasionally stretch thin the argumentative link between elementary probability theory and the results drawn from it.
<!-- TODO kill, make easier? -->
Q methodology is different, and requires a more careful, patient treatment.
While its mathematical core --- different methods of exploratory factor analysis (EFA) --- is well-rehearsed in mainstream quantitative social research, its application to Q is still strange to many.
<!-- TODO add quotes to EFA books here -->
1. *Transposed Data Matrix.*
Because Q method *transposes* the conventional data matrix, positing *people* as *variables*, and *items* as *cases*, all of the downstream concepts in data reduction, from covariance to eigenvalues, take on a different, Q-specific meaning, even if the mathematics stay the same.
<!-- TODO add brown quote for transposed -->
For this reason alone, it will be worth tracking every operation and grounding it in the unfamiliar epistemology of Q (a brief sketch after this list illustrates the transposed correlation step).
<!-- TODO note there are also more steps in Q; at the end you want item *scores* -->
2. *Marginalization and Controversy.*
Almost 80 years after William @Stephenson1935 suggested this *inversion* of factor analysis in his letter to *Nature*, Q is still an exotic methodology.
It sometimes invites scathing critiques [@Kampen-Tamas-2014], but more often is outright ignored in mainstream outlets.
<!-- TODO cite lack in textbooks etc in tamas kampen -->
As the dynamics of marginalization go, the community of Q researchers may, on occasion, have turned insular --- though not into the "Church of Q" that @Tamas-Kampen-2015 fear.
<!-- TODO maybe add literatures for marginalization dynamics: groupthink, ingroup/outgroup dynamics, institutional inertia, path dependency -->
Q methodology may have sometimes shied away from exposing itself to rigorous criticism and disruptive trends in mainstream social research, though probably often out of sheer frustration with persistent misunderstandings, and because of genuine disagreement.
<!-- TODO add more detail on the problems of the statistics - they're out of date! -->
<!-- TODO add examples -->
In epistemological squabbles, too, crazy people [^freakshow] sometimes have real enemies --- or opponents, at any rate.
<!-- TODO to name these opponents, cite from BRown's concluding words in his book -->
Happily, within Q, too, considerable disagreement remains (for example, on the appropriate factor extraction technique), though legacy procedures and programs sometimes hamper intellectual progress.
Unfortunately, misunderstanding and marginalization are sometimes compounded by a lack of deep statistical understanding, though this rarely digresses into glib ignorance of "technicalities" or outright mistakes ("varimax rotation maximizes variation", in the otherwise fine textbook introduction by @Watts2012).
<!-- TODO find precise quotes -->
<!-- FIXME make sure this is, in fact a mistake with the varimax -->
What may easily appear as articles of faith ("thou shalt not use automatic rotation") are in fact thoroughly argued reservations, based on a deep mathematical and epistemological understanding, as evidenced in the works of both William Stephenson and Steven Brown, though at least some of this orthodoxy now appears obsolete.
<!-- TODO reword this, based on recent experience especially with "centroid myth" -->
<!-- TODO add citations -->
These norms become hermetically sealed ideologies ("reliability does not apply to subjectivity") only when detached from their epistemological underpinnings, as, I suspect, would be the case for other methodologies.
<!-- TODO find source, argue that in fact, as per Brown, factor reliability DOES matter -->
3. *Experimental Design*
<!-- TODO fill in! -->
4. *Literate Statistics.*
Not only when a method is new or controversial, as Q is, should mathematical operations and argumentative prose be kept from diverging too widely.
The intuition of literate programming holds for statistics, too: we do not *use* unintelligible algebra to *give* us a comprehensible result to be explained.
*Neither* mathematics nor prose is *primary*; algebra and language are *both* the explaining intellect at work, just in a different register.
[^freakshow]:
A prominent researcher recently described the annual meeting of the [International Society for the Scientific Study of Subjectivity](http://qmethod.org/issss) as an endearing "freak show".
With heirs of the method's founder, William Stephenson, occasionally in attendance, it sure sounds different from your average disinterested get-together.
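To make the transposition described under (1.) above concrete, here is a minimal sketch, not part of the analysis pipeline and therefore not evaluated; `q_sorts[,,"before"]` is the items-by-people array set up in the import chunk, and the object names `q_cor` and `r_cor` are introduced here for illustration only.

```{r transposed-sketch, eval = FALSE}
# q_sorts[,,"before"] is items x people (see the import chunk).
# cor() correlates *columns*, so the untransposed matrix yields the Q-type
# person-by-person correlations, while the transposed matrix yields the
# R-type item-by-item correlations.
q_cor <- cor(q_sorts[,,"before"])     # people as variables: the Q matrix
r_cor <- cor(t(q_sorts[,,"before"]))  # items as variables: the R-type view
dim(q_cor)  # as many rows/columns as participants
dim(r_cor)  # as many rows/columns as items
```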
I suggest, then, that at least in the following initial analysis, some verbosity has its virtue.
Readers will be relieved to learn that I intend not to reproduce the entire mathematical apparatus of Q:
Steven Brown has accomplished that, and much more, in his authoritative, insight-laced "Political Subjectivity", to which my own work is deeply indebted.
<!-- TODO make this better, humbler -->
Mathematics have their place, but formulae alone, in spite of their veneer of rigor, need analogy, intuition and impression to cover the distance to the deliberative subjectivities on taxation and the economy under study here --- which themselves are not mathematized in this dissertation, nor convincingly so in the broader field, and maybe never will be.
<!-- TODO find better link to previous chapter -->
One may add that if science is also bound to a discourse ethic of reaching understanding, it too needs to relate its abstractions to the human lifeworld, from which all meaning emanates.
<!-- TODO make this better or kill it -->
<!-- TODO add Habermas reference? -->
At the same time, some of the details of this chapter may raise the ire of some established Q methodologists who never tire of stressing the substantive analysis over statistical sophistication.
They are right: the key to understanding human subjectivity lies in iterative abduction, in thoroughly going back and forth between informed hunches about *what might make sense* and what the data will bear out.
The following statistical groundwork is a worthwhile, but merely *necessary* --- not *sufficient* --- condition for a *scientific* study of subjectivity (emphasis added, though intended by the [ISSSS](http://qmethod.org/issss), the International Society for the Scientific Study of Subjectivity).
The following pages will hopefully win over the Q skeptics, and take newcomers to the method along.
To make sense --- as we must --- of the shared viewpoints on taxation and the economy among the participants of the first CiviCon Citizen Conference, we must first be sure what they are, and if they are, in fact, shared.
## (No) Descriptives {#descriptives}
It is common to reproduce descriptive statistics before turning to more advanced analyses.
Common summary statistics often include measures of central tendency (mean, median) and dispersion (standard deviation, range) on *variables* across *cases*.
As mentioned before, in Q methodology, variables are *people*, and cases are *items*.
The mean rank allotted by Q sorters (as the *variables*) across all items (as the *cases*), $\overline{x} = \frac{\sum{x}}{N}$, is then, unsurprisingly, `r unique(apply(q_sorts[,,"before"], 2, mean))` for *all* `r ncol(q_sorts[,,"before"])` participants.
The *average* position of items has to be zero, because the forced distribution was symmetrical: since participants placed an equal number of cards at `-1 / 1`, `-2 / 2` and so on, item ranks will always cancel out.
The *median*, also by definition, will be `r unique(apply(q_sorts[,,"before"], 2, median))` for everyone, because the symmetric forced distribution places its middle card in the center column, which holds the greatest number of cards (`r max(q_distribution)`).
By the same token, the *range* extends from `r unique(apply(q_sorts[,,"before"], 2, min))` to `r unique(apply(q_sorts[,,"before"], 2, max))` for everyone, because those extreme values are defined by the forced distribution.
The *standard deviation*, too, is the same for everyone.
$s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2}$ gives the square root of the average squared deviation of a participant's item scores from their mean, `r unique(apply(q_sorts[,,"before"], 2, pop.sd))` for everyone, because the "spread" of items around the mean is defined by the forced distribution.
Conventional *R* type descriptives are, then, quite meaningless [Nahinsky 1965 as cited in @Brown1980, 265], because they are the same for all participants and are defined ex-ante by the forced distribution [^free-distro-descr].
<!-- TODO fix nahinsky quote -->
[^free-distro-descr]: Descriptives are meaningless *only* if the Q distribution is *forced*, that is, the same for everyone.
If participants are allowed to sort into differently shaped --- but symmetric --- distributions, the standard deviation $s$ may indicate the degree of polarization in viewpoints that different people hold, that is, how many items they feel strongly about.
If, additionally, participants are allowed to sort into asymmetric distributions, the mean $\overline{x}$ may indicate the overall tendency of people to agree with the sorted statements.
"Free" distributions are a frequently discussed, if rarely used possibility for Q methodology.
<!-- TODO MH add source on free/forced -->
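To see how the forced distribution alone fixes these descriptives, here is a short, self-contained sketch; the grid heights are those specified for this study, while the chunk and object names are mine, for illustration only.

```{r grid-descriptives}
# A sketch from the forced distribution alone: every complete Q-sort in this
# study is a permutation of the same multiset of ranks, so these descriptives
# are identical for all participants by construction.
grid_heights <- c("-7" = 1, "-6" = 1, "-5" = 2, "-4" = 4, "-3" = 6, "-2" = 9,
                  "-1" = 10, "0" = 11, "1" = 10, "2" = 9, "3" = 6, "4" = 4,
                  "5" = 2, "6" = 1, "7" = 1)
grid_ranks <- rep(as.integer(names(grid_heights)), times = grid_heights)
mean(grid_ranks)                               # 0, by symmetry
median(grid_ranks)                             # 0, the widest column
range(grid_ranks)                              # -7 to 7, fixed by the grid
sqrt(mean((grid_ranks - mean(grid_ranks))^2))  # the shared population SD
```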
```{r descriptives, echo = FALSE, include = FALSE}
apply(q_sorts[,,"before"],1,mean)[order(apply(q_sorts[,,"before"],1,mean))] # sort by mean
apply(q_sorts[,,"before"],1,pop.sd)[order(apply(q_sorts[,,"before"],1,pop.sd))] # sort by sd
person_cor <- cor(t(q_sorts[,,"before"])) # make cor matrix across persons
person_cor[upper.tri(person_cor, diag=TRUE)] <- NA # kill upper triangle
person_cor_m <- reshape2::melt(person_cor, na.rm=TRUE) # melt it
person_cor_m[with(person_cor_m, order(-value)), ] # order it
person_cor_m[which.max(person_cor_m$value),] # pos max
person_cor_m[which.min(person_cor_m$value),] # neg max
person_cor_m[which.min(abs(person_cor_m$value)),] # abs min
```
In search of descriptives, one may be tempted, then, to revert to the R way of looking at data, and to treat Q-sorted *items* as variables, and Q-sorting people as *cases*.
One may ask, for example, which items were, on average of all participants, rated
the highest (``r rownames(q_sorts)[which.max(apply(q_sorts[,,"before"],1,mean))]`` at `r max(apply(q_sorts[,,"before"],1,mean))`),
the lowest (``r rownames(q_sorts)[which.min(apply(q_sorts[,,"before"],1,mean))]`` at `r min(apply(q_sorts[,,"before"],1,mean))`),
which items were dispersed the most (``r rownames(q_sorts)[which.max(apply(q_sorts[,,"before"],1,pop.sd))]`` at `r max(apply(q_sorts[,,"before"],1,pop.sd))`),
the least (``r rownames(q_sorts)[which.min(apply(q_sorts[,,"before"],1,pop.sd))]`` at `r min(apply(q_sorts[,,"before"],1,pop.sd))`),
or --- most offending to Q methodologists, as well as spurious ---, which *items* were correlated
the most positively (``r person_cor_m[which.max(person_cor_m$value),c(1,2)]`` at `r max(person_cor_m$value)`),
the most negatively (``r person_cor_m[which.min(person_cor_m$value),c(1,2)]`` at `r min(person_cor_m$value)`),
and the least (``r person_cor_m[which.min(abs(person_cor_m$value)),c(1,2)]`` at `r min(abs(person_cor_m$value))`).
This kind of exploration is fascinating, and it invites seemingly inductive hypotheses:
Do people respond most strongly to `poll-tax` and `simple-tax`, because these --- and other, supposedly similar items --- are easy to comprehend and relate to?
Do people respond quite differently to `pro-socialism`, and quite similarly to `land-value-tax-resource`, because the former obviously aligns with political identities, while the latter does not?
Do people feel similarly about `exchange-value` and `natural-market-order` because of a pervasive anti-market bias,
<!-- TODO add caplan reference -->
and very dissimilarly about a `wealth-tax` and a `corporate-income-tax`, because they fall for a flypaper theory of tax incidence?
<!-- TODO add mccaffery reference -->
While these survey-type approaches to the data are intriguing, they are inadequate for the data gathered here.
It may appear arbitrary to categorically rule out a range of broadly-applicable techniques, let alone summary statistics on the grounds that the data were gathered for a different *purpose*.
For example, Likert-scaled data collected for a (R-type) factor analysis may, if conditions apply, be subjected to a regression analysis.
Q, however, is not just another statistical operation, it is a *methodology*, and while the gathered data bear a superficial resemblance to R methodological research, "[t]here never was a single matrix of scores to which *both* R and Q apply" [@Stephenson1935a: 15, emphasis in original].
<!-- TODO find, verify original source. Browns 1980: 347 bibliography is unclear -->
<!-- TODO @Stephenson1952 483 on why Q and R are absolutely *not* trivially similar -->
This postulated incomparability applies to this study, too, and plays out in several ways:
1. *Generalizability & Small N Design.*
In R, the *participants* are usually a *sample* of cases from a broader *population*, and the generalizability of the results hinges on the quality of this ideally large, random selection.
In Q, *participants* are the *variables*, on which supposedly shared, subjective viewpoints are measured.
Sampling theory does not apply to Q, just as one would not ask *random* questions in a survey (or so one hopes).
Instead, Q method requires researchers to broadly maximize the diversity of participants to capture all existing viewpoints.
<!-- TODO add saturation sampling reference here -->
Generalizability in Q, if applicable at all, concerns the representativeness of the sampled *items* of a "population" of statements, though no straightforward sampling theory applies to this concourse either, as it is both infinite and non-discrete.
The P-set of this study, the participants in the [CiviCon Citizen Conference](http://www.civicon.de), are self-selected and probably not representative of the broader population, though recruitment was inclusive and diverse.
This flagrantly non-random sampling alone, however, may not yet rule out generalizability to a broader population of *citizens willing to participate in deliberation*.
As I argue elsewhere, both self-selection and a diversity of points of view are crucial for meaningful, *concept-valid* deliberation, and such non-random sampling may therefore be *required* for even R-methodological research into the effects of deliberation.
<!-- TODO add reference to concept-valid deliberation elsewhere -->
Given the haphazard recruitment and great demands placed on citizens, the group of CiviCon participants must probably be considered excessively non-random even by the more charitable standards of deliberation, but an even greater problem lies in the small number of participants.
<!-- TODO add footnote about bias of sample -->
A "sample" of $N=`r ncol(q_sorts[,,"before"])`$ people (including 2 moderating "confederates") simply does not admit of generalizations toward a broader population, even of would-be deliberators.
R statistics, even the cursory summary statistics above, implicitly rely on this kind of generalizability:
If we concede that, say, the low average score of ``r rownames(q_sorts)[which.min(apply(q_sorts[,,"before"],1,mean))]`` at `r min(apply(q_sorts[,,"before"],1,mean))` is, to a large extent, an artifact of the people who happened to show up at a locally-advertised, five-day sleepover conference for EUR 50 compensation, it becomes dubious why we should care about this factoid at all --- other than to characterize the *bias* of the group ^[Though that comparison, too, would require some representative baseline.].
<!-- TODO actually THAT may be a good reason -->
Given limited funds and little experience, the CiviCon Citizen Conference was *designed* for a small group of people, and this constraint --- admittedly --- also informed the choice of Q methodology.
<!-- FIXME wrong verb, it's not operational ... -->
<!-- TODO add footnote on the large-n/long tradeoff -->
The small number of participants (even by Q standards) restricts the following Q analysis, too, but in a different, provisionally acceptable way.
<!-- FIXME maybe this belongs partly somewhere else? -->
Recall that in Q, people serve as *variables*.
`r ncol(q_sorts[,,"before"])` variables are still rather few from which to extract several latent concepts.
<!-- FIXME uh-oh you say elsewhere that there are no latent concepts -->
Consider, for example, the data reduction in Inglehart's and Welzel's human development theory: to arrive at *two* latent value concepts, they condense *35* variables [-@InglehartWelzel-2005-aa].
<!-- FIXME find out correct number of vars -->
The issue here, however, is one of *resolution*, not generalizability.
With relatively few people-variables, Q method will be able to extract only a few, blurred shared viewpoints.
But the exercise is not moot: given a decent Q-sample of items, potential additional participants are exceedingly unlikely to render the factors extracted here null and void --- that scenario is later rejected as a Null-Hypothesis of sorts.
<!-- TODO uh, that is maybe an overstatement - the factors might look quite different, though it is unlikely that there would be *no factors* -->
<!-- TODO add reference to why that is in fact what later tests test -->
By contrast, adding more people to this study may very well render the high average position of, say, ``r rownames(q_sorts)[which.max(apply(q_sorts[,,"before"],1,mean))]`` at `r max(apply(q_sorts[,,"before"],1,mean))` a product of randomness or bias (a high standard deviation of `r pop.sd(q_sorts["simple-tax",,"before"])` would appear to bear this out).
The now familiar analogy of LEGO bricks serves to illustrate this difference in concepts of generalizability once more.
Tasking, say, 15 people to build something out of a given set of LEGO bricks may seem reasonable to get a preliminary idea of the *kinds* of objects (cars, houses) that people build --- though we will probably miss out on some rarer constellations (spaceships?) and nuances of existing patterns (convertibles?).
Adding more people may increase the resolution to tease out these details, but it's unlikely that something as basic as "car-like", or "house-like" will completely disappear.
By contrast, again, it seems much *less* reasonable to take the ratings of *individual* LEGO bricks by these 15 people as anything but random flukes: there are bound to be some people who like red bricks, more so if the study listing displayed red bricks.
2. *Holism & Non-Independence.*
R methodological opinion research often proceeds by analysis, then synthesis; a construct is first dissected into its constituent parts (say, items), then reassembled into some composite (say, a scale or an index).
Truth, in principle, flows from inter-individual differences on the smallest measurable unit of meaning.
Inglehart's and Welzel's work with the World Values Survey, again, serves to illustrate [-@InglehartWelzel-2005-aa].
Items are constructed with some view to a broader construct (say, civicness), and then narrowed down to a number of smaller concepts and items.
The reconstruction of latent concepts happens through a very deliberate, almost literate *synthesis*: you take $x$ units of item $A$, $y$ units of item $B$, standardize the result, and out comes a theory of human development to be subjected to confirmatory factor analysis.
<!-- FIXME really need to go back to the source and figure this out. This is probably BS -->
Here, too, the latent constructs (say, traditional vs secular-rational values), are of greater interest than any individual item, but the relationship is merely *additive*, or *triangulatory*, at best: you synthesize several variables to cancel out bias, and to get at a common concept that might unite them all, but these bigger pictures can never be more than the sum of their parts, let alone *different*.
<!-- TODO also add some Brown quotes to all of this -->
Q, by contrast, has a *holistic* outlook on human subjectivity.
What matters are not the individual items, but their overall constellation, that is, how (groups of) participants value these items *relative* to one another.
Summarizes Brown:
> In moving from R to Q, a fundamental transformation takes place:
> In R, one is normally dealing with objectively scorable traits which take meaning from the postulation of individual differences between persons, e.g. that individual $a$ has more of trait $A$ than does individual $b$;
> in Q, one is dealing fundamentally with the individual's subjectivity which takes meaning in terms of the proposition that person $a$ values trait $A$ more than $B$.
>
> [@Brown1980: 19]
These epistemological differences have practical research implications.
In R, items are supposed to narrowly measure *one* concept and should not invite *multiple* interpretations.
A typical item from the World Values Survey, `How would you feel about your daughter(son) moving in with a person from a different ethnicity?`, was crafted to elicit a predefined, unambiguous scenario in the minds of respondents.
<!-- TODO find example for/against double-barelled items in Q -->
In Q, an item as open-ended as `labor-no-commodity` (``r q_concourse["labor-no-commodity","english"]``) is suitable *because* it invites a variety of different interpretations on what labor or a commodity are.
<!-- TODO need to add more here - there are limits to Q openennes of interpretation too; refer back to the section on item discussions -->
R survey design and the drafting (or collecting) of Q statements may both be more craft than science, but they are very different crafts that limit what can, and cannot, be done with the results.
Given the confidence we place in the common understanding of an R-type item, it is reasonable to present summary statistics on this *individual* item.
Conversely, given the openness to interpretation in Q, single item summaries make little sense, precisely because they were chosen to mean very different things to different people.
<!-- TODO notice that Q items are not always crafted as in this case; that makes the difference maybe even starker -->
In R, items are (mostly) measured *independently* from one another, that is, the choice of a participant on one item should not influence or restrict her choice on another item.
This independence is not only a requirement for some of the statistical procedures frequently used, it also follows from the analytic-synthetic epistemology:
if the synthesizing operation is subject to hypothesis-testing, any relationships between individual items produced by the data gathering method would be considered an undesirable artifact.
<!-- TODO find statistical concept for non-independence-->
*Rating*, rather than *ranking* measurements are therefore the norm, and some survey research goes as far as randomizing the questionnaire to avoid ordering effects.
<!-- TODO find source for randomized order -->
In Q, by contrast, items are evaluated *only* relative to one another and participants are reminded that the absolute scores assigned (`r range(q_sorts[,,"before"], na.rm = FALSE)` in this case) merely imply *ordinal* (better *than*), not *cardinal* valuation.
<!-- TODO notice that at least I stressed that; also relate to later discussion -->
Relative item rating in Q has a technical reason, too:
Only because all items are evaluated relative to all other items do all item scores come in the *same* unit (valuation relative to the remaining items), and only then does a transposed correlation matrix become possible (a statistical summary of, say, height and weight would be nonsensical).
<!-- TODO is this analogy exactly right? Have I really gotten this? -->
But strictly relative valuation has an epistemological dimension, too: if meaning can be derived from the *entire* item constellation only, participants must always choose *between* items.
*Rating* and *ranking* measurements, taken to these extremes, are not interchangeable and strictly limit the meaningful presentation of results.
An item from the World Values Survey can be summarized in isolation, because it was measured independently:
the agreement with, say, a son-in-law from a different ethnicity by any one participant does not preclude her equal agreement with another item.
An item from a Q study as this cannot be summarized in isolation, because it was measured in a dependent way: a participant ranking, for example, `labor-no-commodity` at the top, will be precluded from ranking any other item in the same way.
By extension, it also makes little sense to present a mean valuation of `labor-no-commodity` (`r mean(q_sorts["labor-no-commodity",,"before"])`) in isolation because, in absence of the means of *all other items*, we have no idea what that value *means* (it might mean very different things, for example, if all other items were supremely agreeable).
The analogy of LEGO bricks only slightly overstates the absurdity of ignoring the holism aspired to by, and the non-independence implied by, Q.
Recall that participants were instructed to assemble a given set of bricks into an object of their design.
It would clearly be nonsensical to take the low *absolute* (x-axis) position of, for example, a small red brick as an indication of disliking, without relating its position to the overall design: depending on the position of other bricks, it might be part of an exposed car fender, or a hidden structure in a house basement.
It would be similarly perplexing to harp on the higher average (x-axis) position of ramp-shaped bricks, compared to the cuboid-shaped bricks:
*of course*, if you are putting roof tiles on top, you *have* to place the foundation walls of a house underneath it.
<!-- TODO make a nice table contrasting the above things? -->
3. *Validity & Research Ethics.*
Finally, Q and R differ in their conceptions of validity.
In R, a measurement or inference is valid if, and to the extent that, the stated concepts correspond to some objective reality "out there".
For example, the above-mentioned WVS item on `your daughter marrying someone from a different ethnicity` rests on the assumption that there is such a thing as an attitude on inter-ethnic relationships (which seems reasonable).
<!-- FIXME again, fix the WVS reference, or kill it -->
No matter how supposedly *inductive* --- probably just *exploratory* --- an R approach to data is, the terms of that data are always set *before the fact*, *by the researcher*.
Notice that the WVS item on inter-ethnic relationships will *not* admit of an attitude unforeseen by the researchers, say, a discriminatory sentiment based on *language*.
In Q, conventional definitions of validity do not easily apply, at least because there is, by definition, no external and objective standard to verify human subjectivity.
<!-- TODO point to respective section where I discuss that -->
Instead, one *posits* that viewpoints become *operant* through the act of sorting Q-cards, and that even a limited sample of diverse items will give people roughly adequate material to impress their subjectivity on.
<!-- Notice Browns language on this in the email on coefficients: it's a logic -of -science kind of axiom; we model things as if - we also model them on the forced distribution, might add this here -->
Items may --- or may not --- have been crafted and sampled with some theoretical preconception in mind (for example, libertarianism for `deontological-katallaxy`), but that preconception should not, in principle, limit the meaning that participants ascribe to it.
Dramatically *different* interpretations of the same item are emphatically *not* a threat to validity in Q.
It is easy to see how the above, cursory summary statistics shift the standard of validity, and invite what Brown calls "hypotheticodeductive" inferences [-@Brown1980: 121].
To nod at the high positive correlation of `exchange-value` and `natural-market-order` across participants is to reify a concept of "market radicalism" (or something similar), from which these items probably sprung *in the mind of the researcher*.
*"Ah, that makes sense"*, one thinks --- and by sense, meaning the sense of the researcher and (maybe similar) sense of the reader.
Looking at any particular item or relationships between items across people, always risks implying that the *meaning* of any given item is the same for everyone, as per the researcher's specification.
It is also easy --- and self-serving for Q --- to overstate the difference to a hypotheticodeductive paradigm.
Falling short of accepting a debilitating, radical constructivism, researchers --- as all people --- will always have to rely on some *common* understanding of language, including Q researchers.
Researchers cannot interpret shared viewpoints --- the holistic sorting patterns condensed in the factor scores --- without *some* reference to their own preconceptions in drafting or sampling the items.
It would be impossible to interpret even the relative position of `exchange-value`, rejecting any overlap or relationship to our academic understanding of that item:
we would be looking at a heap of structured gibberish.
Q researchers are, in fact, quite fond of their own judgment: "researchers might, on occasion, 'know what to look for'" [Stephenson 1953: 44, as cited in @Watts2012: K2308].
The epistemology of *abduction* suggests repeatedly going back and forth between hunches, interpretation and what the data will bear out.
<!-- TODO that's weak; fix and refer to abduction section -->
The point of validity in Q methodology is not to ban all researcher's preconceptions, but to both *relax* their grip on meaning, and to *constrain* the plausibility of their explanations.
On the one hand, the meaning of items is *relaxed*, because the linear relationship to the researcher's hypotheticodeductions is no longer presupposed:
`pro-socialism`, may, or may not have been interpreted as the researcher intended.
On the other hand, the plausibility of any explanation is *constrained*, because any observed shared viewpoint must make *sense*, per *some* set of interpretations of its items.
These epistemological "Goldilocks" conditions disappear when R statistics are applied to these data:
when presented with the isolated finding of the high negative correlation between `wealth-tax` and `corporate-income-tax` (`r min(person_cor_m$value)`), we can *only* refer back to some of the preconceptions on "misunderstanding taxation" presented earlier (Flypaper Theory).
<!-- TODO add reference to other section, mcccaffery -->
This may be a productive approach --- though probably not one best undertaken with these items and this mode of data gathering --- but thereby "testing" people's understanding of tax is emphatically not a valid measurement of their subjectivity, and it falls short of the deliberative conception of public choice espoused earlier.
<!-- TODO that's not quite right. Also, reference other section -->
These epistemological concerns also carry a research ethical imperative:
The data should be treated as participants were told it *would* be treated.
That adherence to informed consent *includes* the methodology.
<!-- TODO link to informed consent -->
I told the `r ncol(q_sorts)-2` guests at the first CiviCon Citizen Conference that I was interested in their viewpoints on taxation, that their unique perspectives on the economy mattered, and that there *would be no wrong way* to sort the Q-cards.
<!-- TODO link to the condition of instruction -->
For up to three hours, these citizens earnestly struggled with these `r nrow(q_sorts)` items "on which reasonable people could disagree", thoroughly weighing thoughts and items against one another, graciously pouring their thoughts into the rigid and crude form of a Q-sort.
It would betray the trust and degrade the effort of the participants to treat these data as if they were R:
I could have just handed them a survey and saved them a lot of time.
The ethics get even more damning when R statistics treat the data as tests, and understanding turns to grading, as so easily happens when looking at the extremely negative correlation of `wealth-tax` and `corporate-income-tax`, or similar would-be inconsistencies.
In the morning session of the second day, some participants expressed worry over being "tested" in the Q-sorts, as well as the conference.
The conundrum of measuring people's understanding of taxation, without testing them on some narrowly-preconceived notion of consistency is a familiar one, and it will continue to plague this research.
<!-- TODO add link -->
The least I owe to the CiviCon participants is to make every effort to render their viewpoints understandable *from the inside*.
That cannot be done with R methodology, and I implore everyone using these data *not* to try.
LEGO, again, delivers a coda by analogy.
A hypotheticodeductive inference on a person's LEGO building would rely on a preconceived understanding of what any given brick means, for example: "this is a transparent brick".
Any discussion of the (absolute) position of individual items would, absent any other information, have to revert to the preconception of the researcher.
For example, one might argue that the extreme placement (on any axis) of the transparent bricks indicates strong feelings for these types of bricks.
In the context of a LEGO house, of course, the transparent bricks may be considered *windows* by participant builders, and be placed on the exterior of the building because of that function.
More varied meanings of transparent bricks are conceivable.
A subset of builders may place *small* transparent bricks *inside* the houses, not readily explicable by their transparency: why would *anyone* place almost all transparent bricks on the perimeter, but only a very small one inside, the R researcher may wonder?
Given the context, one may recognize the surroundings of the small transparent bricks as "kitchens" and venture that these builders interpreted them as maybe sinks full of water, or aquariums.
Characteristically, the meaning of "transparent brick" is relaxed considerably in the light of context, but it also remains bound to a common understanding of some of its features (transparency), and therefore constrained to the subsets of possible meanings that might make sense of it (aquarium, sink full of water, or other *transparent* objects).
One may add that the participant builders may also be quite offended if they learned that their LEGO constructions did not receive any attention as such, but were instead intended as a test of visual acuity:
whether participants could distinguish transparent from non-transparent bricks.
The point here is not to criticize R methodology, though the near hegemony over the quantitative social sciences that Brown rails against [-@Brown1980: 321] may indeed have long stretched thin its foundations and overreached its purview.
The point is that, for the present research question *or* given the present data, any excursion to R methodology would at best be a distraction, and at worst strictly unscientific.
## Correlations
To extract *shared* viewpoints, we must first establish what it means for any pair of Q-sorts to be *alike*.
Since we would expect people with similar viewpoints to sort items into similar positions, and therefore to have similar values over all items, we can use a correlation coefficient between any two people's sorts as a statistical summary of their similarity.
A *correlation matrix* of all such pairs is the simplest summary statistic that can be presented for Q data, maintaining both the holistic and relative nature of the sorts.
Though Q studies do not always display or discuss the correlation matrix, all of the downstream analyses are, in fact, based on this table [@Brown1980: 207].
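Such a matrix can be obtained in a single call from the imported sorts; a minimal sketch, not evaluated here (the chunk name is mine, and the choice of coefficient is discussed next):

```{r correlation-matrix-sketch, eval = FALSE}
# Person-by-person correlation matrix of the pre-conference sorts, rounded
# for display; cor() correlates the columns (people) of q_sorts.
round(cor(q_sorts[,,"before"]), digits = 2)
```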
### Correlation Coefficients
Depending on assumptions made, *different* correlation coefficients are applicable to Q data.
These coefficients are well-known statistics, but their underlying assumptions can be thorny for Q methodology and consequential for these data.
The following discussion also serves to rehearse the uncommon person-correlations at the heart of Q methodology.
<!-- TODO notice that q textbook does not even mention these, and Brown glosses over them --- they may matter though -->
<!-- TODO it may be necessary to reorganize the following into the question of whether the data is ordinal or interval -->
1. *Pearson's $\rho$ Product-Moment-Correlation Coefficient*.
The conventionally used Pearson product-moment correlation coefficient, or Pearson's $\rho$, starts from the covariance of any pair of Q-sorts, *in their raw form*,
(@cov) $$cov(X,Y)=\sum_{i=1}^{N}{\frac{(x_i-\overline{x})(y_i-\overline{y})}{N}}$$ [^pop-corr].
<!-- TODO add reference to Pearson -->
Since, in a forcibly symmetric Q distribution such as ours, the mean will always be zero (see above), we can safely ignore the respective $-\overline{x}$ terms; by default, Q-sorts are already expressed as deviations from their expected values, $E$.
The covariance on all *items* for, say `Ingrid` and `Petra`, is therefore simply the cross-product of all item positions divided by the number of items, $N$.
It's easy to see how this number will be larger when the two scores have similar (absolute) values: `Petra`'s and `Ingrid`'s quite different ranks for, say `all-people-own-earth` (`r q_sorts["all-people-own-earth",c("Petra","Ingrid"),"before"]`, respectively) multiply to `r q_sorts["all-people-own-earth",c("Petra"),"before"] * q_sorts["all-people-own-earth",c("Ingrid"),"before"]`, whereas their more similar positions on `deontological-ethics` (`r q_sorts["deontological-ethics",c("Petra","Ingrid"),"before"]`, respectively) yields `r q_sorts["deontological-ethics",c("Petra"),"before"] * q_sorts["deontological-ethics",c("Ingrid"),"before"]`.
Brown reminds us that this multiplicative weighting of like extreme scores is "phenomenologically" appropriate, because respondents feel more strongly and are more certain about items ranked towards the margins [-@Brown1980: 271].
<!-- TODO maybe make this footnote? -->
The resulting covariance is, unfortunately, hard to interpret, because it is in squared units and depends on the spread of the data.
It is normalized into Pearson's product-moment correlation coefficient by dividing it by the product of the two standard deviations, in this case, `Petra`'s and `Ingrid`'s --- both of which are, by definition, the same.
(@Pearson) $$\rho(Petra,Ingrid) = \frac{\operatorname{Cov}(Petra,Ingrid)}{\sigma(Petra)\sigma(Ingrid)}$$
Consider the special case of correlating a person's Q-sort with her own Q-sort, say `Petra` with `Petra`, to understand the bounds of the Pearson's $\rho$.
The denominator in (@Pearson) then yields $\operatorname{Var}(Petra)$, since the square of `Petra`'s standard deviation over the item scores is, by definition, her variance.
The numerator is given by the summed product of `Petra`'s scores with `Petra`'s scores, which is also the variance.
The correlation coefficient of two identical Q-sorts is therefore $1$.
A correlation coefficient of $-1$ would result if the two Q-sorts were diametrically opposed, as if mirrored around the distribution mean of zero:
in that case, the numerator would be $-\operatorname{Var}(Petra)$, and everything else would remain the same.
<!-- TODO probably need the variance in here, or earlier? -->
A correlation coefficient of $0$ would be expected if two Q-sorts are entirely unrelated.
All of these extremes are rarely reached in Q, with a range of `r range(cor(q_sorts[,,"before"])[lower.tri(cor(q_sorts[,,"before"]))])` in this case.
The Pearson correlation coefficient is a measure of *linear* association between two variables, or Q-sorts in our case.
2. *Spearman's $\rho$ Rank Correlation Coefficient.*
Spearman's $\rho$ works much like Pearson's $\rho$, but starts from *rank orders*, instead of raw Q-sort values.
Raw Q-sorts are rank ordered, and assigned their rank index.
Crucially, in a case of ties --- of which there are many in Q ---, the average of the would-be occupied ranks is assigned as a rank.
For example, if two cards are sorted in the $-5$ column, as per the Q distribution, they will both receive the rank of $3.5$, because they *would* occupy the 3rd and 4th position, if they were not tied.
A Q-sort, beginning from the left (negative) extreme, will look like this when transformed into Spearman's ranks (the six cards in the $-3$ column occupy positions 9 through 14, averaging to 11.5):

Raw Value   Index   Spearman's Rank
----------- ------- -----------------
-7          1       1
-6          2       2
-5          3       3.5
-5          4       3.5
-4          5       6.5
-4          6       6.5
-4          7       6.5
-4          8       6.5
-3          9       11.5
-3          10      11.5
-3          11      11.5
-3          12      11.5
-3          13      11.5
-3          14      11.5
----------- ------- -----------------
Spearman's procedure then amounts to computing Pearson's correlation coefficient (as in the above) on the ranks themselves; absent ties, it can equivalently be computed from the rank *differences* $d_i = x_i - y_i$ (instead of the deviations from the mean).
<!-- TODO add reference to Spearman's -->
This version of Pearson's coefficient can be summarized thus:
(@spearmans-rho) $$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2-1)}$$
The resulting coefficient is likewise bounded from $+1$ to $-1$, with perfect correlation indicating not a *linear* association, but a strictly *monotone* association, where all items are ranked the same, though possibly by different "amounts".
Conversely, two Q-sorts with mutually reversed ranks would yield a $-1$ coefficient, indicating a negative, monotone relationship.
3. *Kendall's $\tau$ Rank Correlation Coefficient*
<!-- Brown in email: Then Cartwright (1957) recommended Kendall's relatively new tau statistic for Q sorts and other cases when "it is desired to use a forced rectangular distribution, or some other forced shape of distribution which may or may not satisfy the requirements for using a product moment coefficient" (p. 102). After making this noteworthy contribution, one might have expected that Cartwright would hang around long enough to use it, but alas.
Cartwright, D.S. (1957). A computational procedure for tau correlation. Psychometrika, 22, 97-104. -->
Kendall's $\tau$ starts not with differences in scores or ranks, but with a comparison of item pairs between the two Q-sorts.
When the ranks of two items ($j$, $i$) agree between two Q-sorts ($x$, $y$), the item pair is said to be *concordant* ($x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$).
When the ranks between the two respondents disagree ($x_i > x_j$ and $y_i < y_j$ or $x_i < x_j$ and $y_i > y_j$), they are said to be *discordant*.
An item pair is said to be *tied* on a Q-Sort if both items have received the same score (as frequently happens in Q), $x_i = x_j$.
In its simplest form, Kendall's $\tau_A$ divides the difference between the number of concordant pairs $n_c$ and the number of discordant pairs $n_d$ by the total number of item pairs, $n(n-1)/2$:
$\tau_A = \frac{n_c - n_d}{n(n-1)/2}$.
For example, `Petra` and `Ingrid` are *concordant* on `poll-tax` (`r q_sorts["poll-tax",c("Petra","Ingrid"),"before"]`, respectively) and `corporate-income-tax` (`r q_sorts["corporate-income-tax",c("Petra","Ingrid"),"before"]`, respectively): they both prefer `corporate-income-tax` over `poll-tax`, if by a different amount.
They are *discordant* on `infinite-growth` (`r q_sorts["infinite-growth",c("Petra","Ingrid"),"before"]`, respectively) and `use-value` (`r q_sorts["use-value",c("Petra","Ingrid"),"before"]`, respectively):
Between the two, `Petra` prefers `infinite-growth`, `Ingrid` prefers `use-value`.
Because the denominator gives the total number of combinations of two items out of the total number of items (in this case `r (77*76)/2`), Kendall's $\tau$, too, is bounded from $+1$ to $-1$.
If two Q-sorts are identical (opposed), they will (dis)agree on *all* pairwise item comparisons, divided by *all* possible such combinations, yielding ($-$)$1$.
If two Q-sorts agreed and disagreed on half of the comparisons each, the numerator will be $0$, yielding a $0$ coefficient.
Because in Q-sorts items must be stacked in some columns, there are many ties.
Kendall's $\tau_B$ adds terms to the denominator that adjust for such ties, using the number of tied pairs $n_1$ and $n_2$ in each of the two Q-sorts (with $n_0 = n(n-1)/2$ the total number of pairs):
(@kendalls-tau) $$\tau_B = \frac{n_c - n_d}{\sqrt{(n_0-n_1)(n_0-n_2)}}$$.
Kendall's $\tau$ does not measure a linear association between the two Q-sorts; it can instead be read as the difference between the probabilities that a randomly chosen pair of items is ranked in the same, rather than the opposite, order in the two Q-sorts (a short sketch after the following footnote computes all three coefficients on a toy pair of Q-sorts).
[^pop-corr]: Here, as in all of the below formulae, I reproduce the (unbiased) *population* statistic.
The population statistics seem appropriate for (at least) a forcibly symmetric distribution such as this one, because the population mean of *any* number or selection of sampled statements is, in fact, known:
it must be zero, per the specified distribution.
The mean need not be estimated, and no correction for estimation bias need be introduced.
Steven Brown (email conversation in February 2015) also suggests using the population variant, though the usage and terminology in his 1980 book *Political Subjectivity* are a little confusing.
<!-- TODO add email specifics citation -->
On a technical level, however, all of the code used to calculate the correlations reported here uses the *sample* statistic.
Because the de-biasing $N-1$ term occurs in both the numerator and the denominator of the correlation coefficient, it cancels out.
<!-- TODO fix this or add link to GH issue -->
<!-- FIXME add reference to Political Subjectivity -->
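To fix ideas before choosing among the coefficients, here is a self-contained toy sketch: the two mini Q-sorts are invented, not data from this study, and all names are illustrative only. It computes Pearson's coefficient by hand and then all three coefficients with the built-in `cor()`.

```{r coefficients-toy}
# Two invented Q-sorts over 9 hypothetical items, both following the same
# small symmetric forced distribution (-2, -1, -1, 0, 0, 0, 1, 1, 2),
# so both have a mean of exactly zero by construction.
petra_toy  <- c(-2, -1, -1,  0, 0, 0, 1, 1, 2)
ingrid_toy <- c(-1, -2,  0, -1, 0, 1, 0, 2, 1)

# Pearson's rho "by hand": cross-products divided by N (no centering needed,
# since the means are zero), normalized by the product of the population SDs.
pop_sd_toy <- function(x) sqrt(sum((x - mean(x))^2) / length(x))
cov_toy    <- sum(petra_toy * ingrid_toy) / length(petra_toy)
cov_toy / (pop_sd_toy(petra_toy) * pop_sd_toy(ingrid_toy))

# The built-in cor() uses the sample variant, but the N - 1 terms cancel
# (see the footnote above), so the result is identical.
cor(petra_toy, ingrid_toy, method = "pearson")
cor(petra_toy, ingrid_toy, method = "spearman")  # Pearson on tie-averaged ranks
cor(rank(petra_toy), rank(ingrid_toy))           # ... which is the same thing
cor(petra_toy, ingrid_toy, method = "kendall")   # tau_B, adjusting for ties
```

Note how the Spearman coefficient is literally Pearson's coefficient computed on the tie-averaged ranks, as in the table above.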
The choice among these correlation coefficients depends first on the kind of data gathered upstream, and the kind of analysis intended downstream.
Pearson's $\rho$ requires at least interval-scaled data: the *difference* between values must have *cardinal*, not just ordinal meaning. ^[Aside from these considerations, Pearson's $\rho$ is susceptible to outliers (*not robust*) and requires normally distributed data, though neither is a great concern in Q methodology, where forced distributions bound extreme values and approximate a bell curve (as the current distribution does).]
For example, cardinal valuation would imply that cards sorted under $3$ are preferred to cards sorted under $2$ by the *same amount* as cards under $7$ are preferred to cards under $6$ --- not just that higher-sorted cards are valued *more*.
Both Spearman's $\rho$ and Kendall's $\tau$ relax this assumption and can be used on ordinally-scaled data, where item scores are merely ranks, not ratings.
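A minimal sketch of the relation between the rank-based and the raw-score coefficients (again assuming the `q_sorts` array): Spearman's $\rho$ is simply Pearson's $\rho$ computed on (mid)ranks.

```{r spearman-as-ranked-pearson, eval=FALSE}
# Sketch: Spearman's rho is Pearson's rho applied to (mid)ranks
# (assumes the `q_sorts` array used throughout this document).
x <- q_sorts[, "Petra", "before"]
y <- q_sorts[, "Ingrid", "before"]
cor(x, y, method = "spearman")  # Spearman's rho on the raw column scores
cor(rank(x), rank(y), method = "pearson")  # identical: Pearson's on midranks
```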
#### Ordinal vs Interval Data
Q method is, unfortunately, a bit muddled on the issue of cardinal or ordinal valuation in Q-sorts.
Most studies simply use the default Pearson coefficient and many articles as well as the leading introductory textbook on the method [@Watts2012] do not even *mention*, let alone justify, the use of one of the three common coefficients.
This oversight might be easily forgiven, because for Q data, at least Spearman's and Pearson's coefficients appear to be closely related.
Given the relatively high number of ranks assigned (usually more than 9, 15 in this case), many researchers might have no trouble treating the data as interval-scaled, much as (shorter-ranged) Likert-scales are often interpreted.
<!-- TODO add citation on what ordinal can usually be used as interval -->
Additionally, Q data is tightly bounded, especially when a forced distribution is used: there *can* be no outliers, that might inordinately affect Pearson's, but not Spearman's.
One might then conclude, as Block [-@Block-1961: 78] and Brown [-@Brown-1971: 284] do, that the choice of correlation coefficient does not much matter, effectively sidestepping the fundamental question of which coefficient is the appropriate one.
Brown argues correctly that *absent ties*, Spearman's and Pearson's *must* be identical, and illustrates with his data that even *with ties*, Spearman's and Pearson's "produce virtually identical results [...]", which are "in turn [...] essentially the same as those obtained utilizing Kendall's $\tau$" [-@Brown1980: 279].
The problem is that, in fact, Q data has *many* ties, and for this data the coefficients *do* differ: Spearman's differs from Pearson's by an average of `r mean(abs(cor(q_sorts[,,"before"],method="pearson") - cor(q_sorts[,,"before"],method="spearman")))`, ranging up to a sizable `r max(abs(cor(q_sorts[,,"before"],method="pearson") - cor(q_sorts[,,"before"],method="spearman")))`, and Kendall's differs from Pearson's by up to `r max(abs(cor(q_sorts[,,"before"],method="pearson") - cor(q_sorts[,,"before"],method="kendall")))`.
<!-- TODO add table with all the differences -->
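A hedged sketch of how the full table of such pairwise differences could be tabulated, should it be needed (assuming `q_sorts` as above; the inline figures in the text are the ones reported):

```{r coefficient-differences, eval=FALSE}
# Sketch: distribution of pairwise differences between coefficients
# (assumes the `q_sorts` array used throughout this document).
d_spearman <- abs(cor(q_sorts[,,"before"], method = "pearson") -
                  cor(q_sorts[,,"before"], method = "spearman"))
d_kendall <- abs(cor(q_sorts[,,"before"], method = "pearson") -
                 cor(q_sorts[,,"before"], method = "kendall"))
summary(d_spearman[upper.tri(d_spearman)])  # Pearson's vs. Spearman's
summary(d_kendall[upper.tri(d_kendall)])  # Pearson's vs. Kendall's
```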
As the sensitivity analysis in the appendix shows, these differences in correlations can also filter down to the factor analysis, and may even bias the interpretation in a *systematic* way.
<!-- TODO add, link to sensitivity analysis -->
```{r coefficients-lipset, include=FALSE, eval = FALSE}
# this is just for the list email to show that results are also different for the lipset data
data(lipset)
max(abs(cor(lipset$ldata, method = "pearson") - cor(lipset$ldata, method = "spearman")))
max(abs(cor(lipset$ldata, method = "pearson") - cor(lipset$ldata, method = "kendall")))
lipset$spearman <- qmethod(
dataset = lipset$ldata,
nfactors = 3,
rotation = "varimax",
forced = TRUE
, cor.method = "spearman"
)
lipset$pearson <- qmethod(
dataset = lipset$ldata,
nfactors = 3,
rotation = "varimax",
forced = TRUE
, cor.method = "pearson"
)
```
<!--TODO Make note on decision tree why verbosity is important: it's a combinatorial explosion, and I don't want to waste my or the reader's tiem datamining -->
Aside from Brown's *empirical* claim, falsified above, that "within the factor-analytic framework [...] the interval-ordinal distinction is of no importance" [-@Brown1980: 289], there appears to be no *methodological* literature justifying a choice of coefficient on Q-epistemological or ontological grounds.
The choice between interval and ordinal valuations in Q-sorts seems to turn on the status of the Q distribution.
On the one hand, we can seek to ascribe *positive* status to the Q distribution, that is, assume that there is such a thing as a *true* distribution on any given set of items for any given respondent, and that this distribution can be measured.
The simplest way to go about this may be to let people sort items freely into a fixed number of bins.
Given the relatively great number of bins (15 in this case), it seems reasonable to treat distances between adjacent bins as meaningful, because respondents were able to freely project their subjectivity, however formed.
Alternatively, one can assume that *any* respondent's feelings towards *any set of items* are normally distributed as a matter of ex-ante empirical finding.
Assuming that every person has one $-7$ item, two $-6$ items, and so forth under the bell curve, it is reasonable to assume equal and meaningful distances between the equally wide bins slicing up the (continuous!) normal distribution.
This latter assumption is, of course, quite heroic, especially considering that Q-sets of items cannot be randomly sampled and any given set of items will likely include idiosyncrasies.
I am not aware of anyone in the Q literature making, let alone providing evidence for, this latter assumption, though few researchers have implemented free distributions.
On the other hand, we can ascribe *epistemological* status to the Q distribution, that is, treat it "as a formal model in the logic-of-science sense and in terms of which persons are instructed to present their points of views", in Brown's characteristically cogent words (2015, via email on Qlistserv).
<!-- TODO add email as referenc -->
Oddly --- though not unconvincingly --- this is then an *operational* definition of the *distribution* to *operantly* measure the structure of subjective viewpoints (Stephenson 1968: 501), to borrow a popular dichotomy in Q circles.
<!-- TODO Fix citation, as cited in Watts Stenner 684 -->
The quasi-normal Q distribution is *defined* by the researcher as the terms in which respondents may express the concept of subjectivity, much as R researchers define items based on some pre-existing concept.
This latter view of the distribution underpins most of Q research, though not always as explicitly as in Brown's writing.
The case for the forced-distribution-as-model is twofold.
<!-- TODO link here to an earlier discussion of the forced distribution -->
First, the forced distribution strongly single-centers all the *items*, a pre-condition for a transposed factor analysis: under a forced curve, all items are always ranked relative to the same reference point of all other items. ^[Arguably, items may still be considered single-centered if under a free distribution, as long as they are thoroughly weighted at the same time.]
<!-- TODO note here that a free coordinate system, where people have to wrestle with the relative positions of items might do this trick! -->
Second, the forced distribution extracts "hidden" preferences by making Q sorters make the same, tightly circumscribed decisions: for example, people cannot place more than one item in the leftmost bin.
A complete rank ordering might be the logical --- if impractical --- solution to reveal all hidden preferences: in a complete rank order, no ambiguities are left.
<!-- TODO is "complete" rank order the correct term? -->
<!-- TODO make this whol thing a GH issue, I can make a paper out of this -->
<!-- TODO there is a obvious link between all of this and vNM consistency -->
The former *positive* view on the Q distribution might straightforwardly justify cardinal valuation of Q-sorts.
Steven Brown argues that cardinal valuation can be maintained for the latter, *epistemological* view on the Q distribution, too, or at least that "the equality of intervals along the Q-sort range [...] is no more subject to proof than it is to disproof, which is common to models." (Brown 2015, email to Qlistserv)
<!-- TODO add link or something, proper reference -->
It seems to me, though, that the model implied here conceives of Q-sorts as *ordinal* valuations, and that the data should be treated accordingly.
To operationally "reveal the Q sorters' preference (as opposed to their likes)" by forcing people to decide *between* items is to negate meaningful distances (ibid.).
Steven Brown may be correct that "preferences" and "likes" cannot be observed at the same time, akin to the quantum principle of complementarity (Brown's analogy).
However, this *same* complementarity holds between ordinal and cardinal valuations.
If preferences are to be revealed in terms of *choices* constrained by the measuring mechanism, then that is an *ordinal* measurement, without meaningful distances.
Meaningful distances require that respondents be arbitrarily free in their choices, including, for example, all items in one column, with no distinctions enforced.
An ordinal interpretation of the data also dovetails with the related, relentless instructions to respondents to worry only about the *relative* positioning of items.
<!-- TODO find citation -->
Readers may wonder, at this point, what could possibly warrant this methodological diversion in the context of the `Keyneson` data analysis.
Problematically, the correlation matrix for `Keyneson` not only differs depending on the chosen coefficient, it differs in a *systematic* way.
This bias stems from the different weighting of similarly (and dissimilarly) sorted items in the *extremes* of the distribution.
Recall that Pearson's $\rho$ multiplies deviations from the mean, so jointly extreme placements contribute quadratically, giving great weight to whatever similarity people's viewpoints express in their tails.
By comparison, Spearman's $\rho$ offsets this quadratic weighting partly by "spreading out" the center of the distribution:
for example, in Spearman's ranks, the difference between the (raw, Pearson's) $0$ and the $1$ column is $10.5$, whereas the difference between the $6$ and the $7$ columns is only $1$.
As an illustration, Spearman's $\rho$ assumes that respondents ordered the items *side to side*, with equal signs pasted where they were indifferent (tied), instead of *on top of one another* in equal-width bins.
Because the number of ties in the center is, by virtue of the forced distribution, greater than the number of ties towards the extremes, differences in valuation towards the center occupy a greater range.
<!-- TODO Make below footnote -->
For example, an item on which two respondents agree at ($+7$) and another on which they agree at ($+1$) contribute `r (77-39)^2` and `r (49.5-39)^2`, respectively, in Spearman's ranks (a ratio of `r ((77-39)^2)/((49.5-39)^2)` to 1), but `r 7^2` and `r 1^2` in Pearson's raw scores (a ratio of `r 49` to 1).
In short: similarities and differences between Q-sorts towards their middle carry greater weight under Spearman's $\rho$ than under Pearson's.
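This "spreading out" can be read directly off the forced distribution; a minimal sketch, assuming the `q_sorts` array (any single sort will do, since the distribution is forced):

```{r spearman-spread-sketch, eval=FALSE}
# Sketch: how Spearman's midranks "spread out" the middle of the forced distribution
# (assumes the `q_sorts` array; any single sort will do, since the distribution is forced).
scores <- q_sorts[, "Petra", "before"]  # raw column scores from -7 to +7
midranks <- rank(scores, ties.method = "average")  # what Spearman's rho actually correlates
tapply(midranks, scores, unique)  # one midrank per column
diff(tapply(midranks, scores, unique))  # gaps between adjacent columns: wide in the middle, narrow in the tails
```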
Brown, as previously noted, suggests that the quadratic weighting of differences and similarities towards the extremes is warranted because respondents feel strongest about these items.
<!-- FIXME find citation -->
That may be so, but it implies an otherwise rejected *positive* assumption on the Q distribution, namely, that all respondents *do* feel *much* more strongly about items placed in the extremes, and *by the same amount*.
*Absent* such a positive assumption, the inordinate influence of covariance in the extremes can conspire to introduce bias as a function of the *given* Q-set.
Consider that items may --- especially when they are not gathered "in the wild", but crafted, as for `Keyneson` --- differ in the starkness of their wording.
For example, `pro-socialism` might have been formulated *without* reference to the informative, but tainted term "planned economy". ^[The German "Planwirtschaft" carries negative associations with the East German economic malaise, and general mismanagement. An equally tainted term for Americans may be "socialism".]
Consider also that *some* (and only *one*) item has to fill the extreme position, irrespective of whether a participant really feels that much stronger about the last item.
It may then be that starkly worded items tend to occupy these extreme positions and thereby weigh heavily on the correlation matrix and the downstream factor analysis.
In the extreme case, the inclusion (or exclusion) of a divisive item such as `pro-socialism` (greatest standard deviation at `r sd(q_sorts["pro-socialism",,"before"])`) may produce so much covariance as to become something of an "anchor item" for a factor.
The issue here is not that some items "anchor" a factor more than others (that is very much the point of distinguishing statements),
<!-- TODO add footnote to QDC part -->
nor that some items are worded in a different style (that is very much the point of the factor interpretation).
The issue is that *absent* a *positive* view on the Q distribution --- which we have rejected --- the *divisiveness* of some items may crowd out other, "milder" patterns of covariance towards the middle of the distribution, even trivialize factors in an extreme scenario, all for no good reason.
Recall that without a positive distributive assumption, we have no reason to assume that the difference between, say $6$ and $7$ is the same as between $0$ and $1$.
Consequently, we cannot know in absolute terms how much more or less strongly people felt about more or less divisive items: the weighting of the (mildly worded) middle covariances against the (starkly worded) extreme covariances becomes *arbitrary*.
Q methodology with a forced distribution has thrust us into a no-man's-land between interval and ordinal data, inhospitable to sound statistics.
If we assume interval scaling, and use Pearson's $\rho$, covariances in the extremes are weighted more, though the magnitude of that weighting is suspect, for the above-mentioned reasons.
If we assume ordinal scaling, and use Spearman's $\rho$, covariances in the extremes are weighted less, though the magnitude of that weighting is suspect, too.
Recall that the "spreading out" occurs as a function of ties within the higher columns towards the middle of the distribution.
These ties, however, are *also* enforced by the method, and did not arise organically.
Ideally, Q method would produce a thoroughly grounded clarification of how covariances across the (forced) distributions should be appropriately weighted.
Unfortunately, at least the outspoken members of the Q listserv appear untroubled by this supposed conundrum.
Bob Braswell (2015, email on Qlist) suggests that because the downstream analyses require Pearson's, "the burden of proof is on someone who would like to substitute another correlation coefficient".
Braswell and Brown also doubt differences in coefficients will be consequential, and suggest any downstream effects may be counteracted by judgmental flagging and rotation.
In a way, these skeptics are right: even a lingering uncertainty about the choice of correlation coefficient does not invalidate Q methodology, and the final factors are likely to share a great family resemblance, as also indicated by the robustness analysis in the appendix.
<!-- TODO add link to appendix -->
Still, this ambiguity on coefficients affects Q methodology at a fundamental level, carrying on through all downstream analyses, and even small differences should matter for an avowedly *scientific* study of subjectivity.
The troublingly common resort to abductive judgment in the face of muddied statistics risks loosening the tight positive bounds placed on researchers' interpretation ever more: if, *in addition* to the rotation method, the flagging, the number of factors and the extraction method, even the correlation coefficient were up for grabs, researchers would enjoy considerable latitude in specifying their models --- all before the supposedly bounded qualitative interpretation has begun.
These and other methodological challenges are often met by reframing them in terms of (outsider) R and (insider) Q approaches, even where, as in this case, the conundrum is thoroughly grounded in Q, possibly betraying an insular retreat of the method.
Alternatively, Q method could take *either* ordinal or cardinal valuations in Q sorts to their respective logical conclusions:
complete rank orders and non-parametric statistics in the former, freely distributed sorts in the latter.
Such innovations would require, among other things, thorough validation, and are clearly beyond the purview of this research.
Among the imperfect --- because arbitrary --- choices, Spearman's $\rho$ appears to be the *conservative* choice, and for that reason, will be used in the following.
Spearman's stresses the middle of the distribution, an area where respondents feel less strongly, and less clearly, about the items placed there.
Any confusion or indifference about these items should result in *random* placement around the mean, and therefore, cancel out in the correlation coefficient.
Overall correlations may be *lower* (they are not, summing to `r sum(cor(q_sorts[,,"before"],method="spearman"))` and `r sum(cor(q_sorts[,,"before"],method="pearson"))` for Spearman's and Pearson's, respectively), or *less patterned*, and the resulting factors are less likely to be superficially anchored by some agreement in the extremes.
<!-- TODO add data for factor analysis showing that Spearman's is, in fact, harder; that's also the robustness analysis. -->
<!-- TODO fix this quote in the above on approximately interval sth
@Bartholomew-Steele-etal-2011:
> 245: say that interval treatment for six or seven categories is often done.
> Still, they warn on 245: there *will* be biases (more evidence in section 9.5) -->
#### (Linear) Association vs. Probability
Assuming that the present Q-sort data are best treated as ordinally scaled, the question arises how their correlation coefficients can be appropriately and meaningfully summarized into ideal viewpoints.
Q methodology, ordinarily relying on Pearson's $\rho$, has it easy: as a measure of *linear* association, Pearson's lends itself well to the matrix algebra of factor or principal components analysis.
The case is more complicated for rank order statistics.
Fabrigar and Wegener rule out exploratory factor analysis for ordinal data [-@Fabrigar-Wegener-2012: 96].
Instead of shoe-horning rank data into factor analysis, Bartholomew, Steele et al. recommend new "model-based methods" [-@Bartholomew-Steele-etal-2011: 45].
As indicated in the above, taking an ordinal view on Q-sorts to its statistical conclusion might be worthwhile, but the required re-validation and epistemological foundation is untenable here.
Luckily, other statisticians are more optimistic about factor-analyzing ordinal data.
Basilevsky recommends "*euclidian* measures such as Sperman's (sic!) $\rho$ [...]" over Kendall's $\tau$ *probability*, because the latter obscures local differences and yields no meaningful factor analytic result [-@Basilevsky-1994: 512, emphasis added].
Fittingly, Basilevsky explicitly validates a Spearman's-based principal-components analysis of an "n x n matrix [...] *among the observers*", though he might not have had Q methodology in mind [-@Basilevsky-1994: 515, emphasis added].
He concludes for Spearman's that "if intrinsic continuity can be assumed either as a working hypothesis or by invoking a priori theoretical reasoning, most factor models carry through just as if the sample had been taken directly from a continuous population" [-@Basilevsky-1994: 518].
That bodes well for this analysis.
Intrinsic continuity is customarily assumed for "six or seven categories" [@Bartholomew-Steele-etal-2011: 245], and should be defensible for 15 categories.
This remains very much an "*ad hoc* method[]" for treating Q-sort data [@Bartholomew-Steele-etal-2011: 45, emphasis in original], but it is a conservative one.
At best, analyses based on Spearman's $\rho$ dovetail with Brown's *operational* definition of subjectivity as Q-sorts under a quasi-normal distribution, implementing an appropriate statistic for fine-grained, but ordinal data.
At worst, Spearman's-based analyses fall short of the ordinal scaling of Q-sorts, and merely effect a monotone transformation of would-be cardinal data, emphasizing covariances towards the center of the distribution.
<!-- TODO maybe later add visualization of all scatterplots? -->
### Correlation Matrix
In lieu of descriptives, the simplest summary statistic we are left with is the correlation matrix.
`qmethod`, like other Q software, does not report the correlation matrix, but the matrix below is produced by calling the same function also used in later analyses, with a Spearman's $\rho$ correlation coefficient.
The correlation matrix is characteristically complex, and hard to interpret --- which is probably why most Q studies, especially with more participants, do not report it.
```{r cor-viz-before, echo=FALSE}
# set up an array to hold the correlation matrices for both administrations
cor <- array(
  data = NA,
  dim = c(ncol(q_sorts), ncol(q_sorts), dim(q_sorts)[3]),
  dimnames = list(colnames(q_sorts), colnames(q_sorts), dimnames(q_sorts)[[3]])
)
cor[,,"before"] <- cor(q_sorts[,,"before"], method = "spearman")  # Spearman's correlations before the conference
cor[,,"after"] <- cor(q_sorts[,,"after"], method = "spearman")  # Spearman's correlations after the conference
q.corrplot(cor[,,"before"])  # visualize the correlation matrix (before)
```
Recall, again, that the correlations in the above matrix are between *people-variables* across *item-cases*.
The below plot illustrates this for the strongest positive, the strongest negative, the weakest and a self-correlation.
<!-- TODO(maxheld83) this should probably go someplace else, maybe way above in the
cor discussion. -->
<!-- TODO(maxheld83) Add footnote that jitter was added to avoid overplotting. -->
```{r cor-example-plot, include=FALSE, eval=FALSE}
cor.ex <- vector("list", 4)  # create empty list
names(cor.ex) <- c("positive", "negative", "null", "identity")
cor.ex$positive <- rownames(which(cor[,,"before"] == max(cor[,,"before"][cor[,,"before"] < 1]), arr.ind = TRUE))  # find the highest positive correlation pair
cor.ex$negative <- rownames(which(cor[,,"before"] == min(cor[,,"before"][cor[,,"before"] < 1]), arr.ind = TRUE))  # find the strongest negative correlation pair
cor.ex$null <- rownames(which(abs(cor[,,"before"]) == min(abs(cor[,,"before"][cor[,,"before"] < 1])), arr.ind = TRUE))  # find the correlation closest to zero (compare absolute values, in case it is negative)
cor.ex$identity <- c(rep(x = cor.ex$null[2], times = 2))  # identity is just one sorter with him/herself
labels <- row.names(q_sorts)  # make labels vector
labels[-sample(x = 1:length(row.names(q_sorts)), size = 3, replace = FALSE)] <- NA  # keep only a random sample of labels, otherwise too much overplotting
cor.ex.plots <- cor.ex  # make empty list with cor.ex names
g <- NULL
for (i in names(cor.ex)) {
  g <- ggplot(data = as.data.frame(q_sorts[,,"before"]), mapping = aes_string(x = cor.ex[[i]][1], y = cor.ex[[i]][2]))  # need aes_string because cor.ex contains column names as strings
  g <- g + geom_point(position = "jitter")  # must jitter b/c overplotting, alpha is bad
  g <- g + geom_smooth(method = "lm")  # add regression line
  g <- g + geom_text(mapping = aes(label = labels), hjust = 0, vjust = 0)  # add sampled labels
  g <- g + ggtitle(i)  # add titles
  g <- g + coord_fixed()  # make aspect ratio 1
  # TODO(maxheld83) color the plots as the respective fields in the cor matrix
  # TODO(maxheld8) cut superfluous space on the axes
  cor.ex.plots[[i]] <- g  # write out plot
}
n <- length(cor.ex.plots)  # number of plots
nCol <- floor(sqrt(n))  # number of cols
do.call("grid.arrange", c(cor.ex.plots, ncol = nCol))
# TODO(maxheld83) maybe add the r value here?
```
To protect the anonymity of participants, as discussed elsewhere, I cannot provide additional details to contextualize individually high or low correlations between Q sorters.
Suffice it to say that the correlations broadly track views expressed at the conference, and, to a lesser extent, age and education.
Still, some general observations are in order.
Correlations are not especially high, with a maximum of `r max(cor[,,"before"][cor[,,"before"]!=1])`, though merely rough similarities may be expected with a Q-set as diverse and complicated as `keyneson`.
Recall that some participants reported problems sorting several items: the resulting (random) error might prevent greater correlations (greater overall, and more bifurcated correlations *after* the conference appear to bear this out.)
More remarkable is the fact that while there are several correlations close to 0 --- quite rare in Q ---, there are no substantial *negative* correlations.
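A quick check on this observation, using the `cor` array computed above:

```{r negative-correlations-check, eval=FALSE}
# Quick check on the observation above (uses the `cor` array computed earlier).
min(cor[,,"before"])  # the most negative correlation before the conference
sum(cor[,,"before"] < 0) / 2  # number of negatively correlated sorter pairs (matrix is symmetric)
```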
Recall that the Q-set was designed for "reasonable people to disagree on", so why did respondents *not* disagree more?
On the one hand, the absence of negative correlations might indicate that people do not think in strictly *opposite* ways about taxation --- or the `keyneson` items, at any rate.
Instead, people might think *differently* about it, expressing uncorrelated sorts.
The greatest difference in expressed subjectivity among the participants lies not between diametrically opposed patterns, but between different patterns.
On the other hand, the predominantly positive correlations might betray a lack of diversity, or presence of skew among the P set of participants already suspected from the conference report.
For example, in a broadly social-democratic crowd, with some socialists and conservatives sprinkled in, but no libertarian, disagreement is unlikely to be diametrical.
Neither a socialist nor a conservative takes precisely the *opposite* viewpoint of a social democrat, though they may differ considerably.
While this lack of diversity may spell trouble for the quality of the sample, it is good news for the factor analysis.
<!-- TODO link to sections where you discuss the sample -->
Without negative correlations, it is unlikely that people will load on opposite ends of "polar factors" --- a great hassle for computing composite scores.
<!-- TODO make signifciance test, comment on why that is not relevant here the false-negative test comes later. -->
<!-- TODO make histogram of correlation coefficients -->
<!-- TODO notice the issue of positive manifolds (via Schmolck email); this seems to be given here, too -->
## Factor Extraction
Based on the above correlation matrix showing the pairwise similarities of all participants, Q methodology proceeds by summarizing these patterns into *fewer*, *shared* viewpoints.
By way of the LEGO analogy, the correlation matrix superimposes any two buildings on top of one another, displaying the degree of overlap between them.
<!-- TODO is that right? -->
To extract the *kinds* of objects that our participant builders have constructed, we must somehow further summarize this matrix to yield a shorter list of family resemblances among the LEGO objects.
Q methodology, in other words, rests on an *exploratory* data reduction technique.
### Which Data Reduction Technique
Just *which* of a number of such data reduction techniques would be appropriate for Q *methodology* is, alas, again both disputed and consequential.
The Q literature, unfortunately, offers no definitive resolution, but instead loosely refers to epistemological (metaphysical?) axioms ("indeterminacy") or invokes tradition (Stephenson, and increasingly, Brown) and *past* computational limitations [e.g. @Stephenson1935a: 18].
<!-- phenomenal issue here: http://stats.stackexchange.com/questions/215404/is-there-factor-analysis-or-pca-for-ordinal-or-binary-data -->
<!-- MH TODO find sources for reverting to authority -->
Criticism of such Q orthodoxy is rising on the Qlistserv and elsewhere, most notably (and convincingly) by decade-long contributor (and `PQMethod` maintainer) Peter Schmolck as well as recently @Akthar-Danesh-2007, but no work has been published yet.
<!-- MH TODO link to Peter Schmolck's message, somehow include in Bib -->
Butler's critique, originally directed at factor analysis in general, oddly, still seems to apply to Q:
> "far too many [...] students complain that the only basis they have for choosing among the many types of factor analysis is the prestige of a given theorist or the fact that this or that "*maximin*" solution has often been used"
> [@Butler-1969: 252f], also see updated critique in [@Yates-1987: 6]
Absent well-justified choice in the literature, we then have to delve into the alternative methods and decide for this study on their merits.
Readers will be relieved that the purview of this investigation can be pragmatically limited to only three different extraction methods for exploratory factor analysis (EFA):
1. Principal Components Analysis (PCA), sometimes also known as the Hotelling method
2. Centroid Factor Analysis (CFA), also known as the Simple Summation Method
<!-- TODO add Thurstone 1947 in Brown 1980 as source for CFA, Burt1940 in ibid as source for simple summation -->
3. Principal Axis Factoring (PAF), also known as Common Factor Analysis
<!-- MH TODO add references for these! -->
There are many other multivariate techniques that may --- or may not --- hold promise for Q methodology, including cluster analysis (CA), multi-dimensional scaling (MDS), latent profile analysis (LPA) or structural equation modeling (SEM), as well as yet different *non*-parametric methods (latent trait and latent class analysis) [@Yates-1987: 2], as may be required by a stringently ordinal reading of Q sorts (see above).
Such a systematic reconstruction of Q *methodology* in light of currently available *techniques* may be worthwhile --- or not ---, but it is clearly beyond the scope of this study.
Of course, just because *alternative* methods may be difficult and untested does not mean that the above conventional techniques are adequate.
Limiting the discussion to the above techniques is still prudent, not only because they have been tested in Q studies, but also because they (especially PCA) are *relatively* simple mathematical operations with limited assumptions that can be readily defended.
Readers will also be relieved that only the chosen procedure will be presented in some mathematical detail; the *choice* among PCA, CFA and PAF can be well-justified on a verbal level.
The choice boils down to two features of the three methods: a supposed indeterminacy and a possibly implied latent variable model.
|                       | Indeterminate                  | Determinate                          |
|-----------------------|--------------------------------|--------------------------------------|
| Latent Variable Model | Centroid Factor Analysis (CFA) | Principal Axis Factoring (PAF)       |
| Linear Composition    |                                | Principal Components Analysis (PCA)  |
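For orientation only, a hedged sketch of how two of these extractions could be run on the Spearman's correlation matrix computed above; the `psych` package and the (arbitrary) three-factor setting are illustrative assumptions, not the procedure behind the results reported here:

```{r extraction-sketch, eval=FALSE}
# Hedged sketch: two of the candidate extractions on the Spearman's correlation
# matrix from above. The psych package and the three-factor setting are
# illustrative assumptions, not the procedure used for the reported results.
library(psych)
pca <- principal(r = cor[,,"before"], nfactors = 3, rotate = "none")  # Principal Components Analysis
paf <- fa(r = cor[,,"before"], nfactors = 3, fm = "pa", rotate = "none")  # Principal Axis Factoring
# Centroid Factor Analysis is not offered by psych; dedicated Q software such as
# PQMethod implements it.
```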
#### Latent Variable Model or Linear Composition
Centroid Factor Analysis (CFA) and Principal Axis Factoring (PAF) are both *latent variable models*.
They *estimate* a smaller set of latent variables (here: Q factors) from a larger set of *manifest* variables (here: individual Q sorts).
By the same token, these techniques allow for *error*: any particular *manifest* respondent's Q sort would be explained by her loadings on a set of *latent* factors, plus some residual error term, accounting for the particularities of that person's Q-sort.
As such, CFA and PAF summarize only the *common* variance among the sorts, but not the *specific* variance.
By contrast, Principal Components Analysis (PCA) considers all of the available variance, common *and* unique.
There is no concept of measurable manifest and underlying latent variables in PCA: as will be described later, the principal components are merely a compressed "vector recipe" to re-expand to the original correlation matrix.
This difference between CFA and PAF on the one hand and PCA on the other is also reflected in the correlation matrix passed on to the data reduction technique.
Recall that the diagonal entries of a correlation matrix are all ones; by definition, a Q-sort is perfectly correlated with itself.
PCA takes this original correlation matrix as the starting point, with all ones in the diagonal.
If expanded again, the summary produced by (a complete) PCA would yield the original data, including the idiosyncrasies of individual sorts.
CFA and PAF start from the same correlation matrix, but replace the diagonal with some measure of that sort's degree of *common-ness*, or shared variance with the other Q sorts.
PAF places the communality $h^2$ of a variable (here: a Q-sort) in the diagonal, that is, the sum of squared factor loadings for any given Q-sort across all the factors, indicating the share of that Q-sort's *total* variance explained by all the factors.
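As a minimal sketch of that calculation (assuming a loadings matrix `loa` with Q-sorts in rows and factors in columns, such as the `$loa` element returned by `qmethod()`):

```{r communality-sketch, eval=FALSE}
# Sketch: the communality h^2 is the sum of squared loadings per Q-sort
# (assumes a loadings matrix `loa` with Q-sorts in rows and factors in columns,
# such as the `$loa` element returned by qmethod()).
h2 <- rowSums(loa^2)  # communality per Q-sort
h2  # share of each Q-sort's total variance explained by all factors
```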
For CFA, Brown also suggests other diagonal substitutes, including estimated test-retest reliabilities which will be discussed later [-@Brown1980: 211].
<!-- MH TODO add link to later discussion of reliability -->
Brown --- as well as many orthodox Q methodologists --- recommends CFA for its latent variable model, because as a "closed model" PCA "assumes no error" [-@Brown1980: 211].
Brown is of course correct that a person is unlikely to reproduce the same Q-sort after a day or two *exactly*, though it appears implausible that her subjectivity on a general topic such as taxation and the economy would have perceptibly changed in that time frame.
It is less clear however, that such test-retest reliabilities --- or any other substitution on the diagonal --- would effectively identify anything that might be called *error*.
As Brown is quick to point out elsewhere, there *is* no external standard of one's subjectivity, denying conventional notions of validity.
If a Q-sort at any one point in time *is* operant subjectivity, how would we know that any difference to a later Q-sort would be in error?
It may well be that such test-retest differences reflect spurious *fluctuations* in subjectivity, but any one snapshot of such variable subjectivity would still be accurate for that moment in time.
We might want to consider such reliabilities as *resolutions* when interpreting the factor scores, or when comparing Q-sorts after some "treatment" (as is the case here).
<!-- MH TODO link to distinguishing statements -->
As such, we would want a summary model that also explains this supposedly "unreliable" component of a person's Q-sort. [^average-sorts]
[^average-sorts]: We may alternatively filter out these possible fluctuations by looking at *average* Q-sorts, but that would imply a different operationalization as *medium-term* subjectivity, would greatly complicate the procedure and might raise additional practical problems, including saturation and consistency effects.
<!-- TODO: find source -->
A latent-variable model, more generally, would appear to fit awkwardly into Q methodology.
Consider, by contrast, an exploratory factor analysis of burnout, a supposedly externally valid phenomenon that cannot be measured directly (example taken from @Field-Miles-2012: 750).
In such a study, we may legitimately be interested *only* in those variables that are highly correlated *among* burnout patients, such as motivation, sleeping patterns, self-worth and so on.
If and to the extent that the sampling frame (burnout) is externally valid --- a big if --- we can draw a meaningful distinction between common and specific variance.
If, for example, only some burnout patients report back pain, we may reasonably consider this to be specific variance related to another (skeletal-muscular) condition, and disregard it for the present model.
In the correlation matrix of variables, we may want to replace the diagonal entry for the back pain variable with something *less* than one, because we do not require a model that also accounts for levels of back pain.
The sampling frame "burnout patients" implies --- however justifiably --- a latent variable (burnout), that we expect behind all shared manifestations.
Conversely, we would only want to model patterns of variable correlations that are widely shared by our burnout patients, and would therefore greatly "discount" idiosyncratic variables such as back pain.
<!-- This is just for future reference, it's a good quote
> @Fabrigar-Wegener-2012 31
> "the common factor model was formulated as a general mathematical framework for understanding the structure of correlations among measured variables.
> It postulated that correlations among measured variables can be explained by a relatively small set of latent constructs (common factors), and that each measured variable is a linear combination of these underlying common factors and a unique factor (comprised of a specific factor and random error).
> PCA differs from the common factor model in several notable ways (Wildama, 2007, see also Thomson, 1939)
> First, PCA was not originally designed to account for the structure of correlations among measured variables, but rather to reduce scores on a battery of measured variables to a smaller set of scores (i.e. principal components).
> This smaller set of scores are linear combinations of the original measured variable scores that retain as much information as possible (i.e., explain as much variance as possible) from the original measured variables.
> Thus, the primary goal of PCA is to account for the variances of measured variables rather than to explain the correlations (or covariances) among them.
> Similarly PCA was not designed with the intent that the principal components should be interpreted as directly corresponding to meaningful latent constructs.
> Rather the components simply represent efficient methods of capturing information in the measured variables (regardless of whether those measured variables rrepresent meaningful latent constructs).
> " -->
Recall, again, that in Q methodology, items are cases, and participants are variables.
Q methodology has a *case* sampling frame, too, but these cases are *items* in Q: it is assumed that all items relate to taxation and the economy. [^tax-sampling-frame]
[^tax-sampling-frame]: As discussed elsewhere, this sampling frame also needs to correspond to *one* externally valid concept, or be *commensurate* in Q terminology.
A Q-sample of items on, say, cat personality and architecture may not make much sense.
It is not clear, however, what the equivalent to the back-pain scenario of a "stray" variable in Q might imply.
It is possible that some participants (as the variables) will have less in common with everyone else than others.
In this study, the degree of common variance, provisionally measured as the average correlation per participant row (or column) in the correlation matrix, ranges from `r min(apply(cor[,,"before"], 2, mean))` to `r max(apply(cor[,,"before"], 2, mean))`.
<!-- TODO add explanation what happens with stray items (high variance) in the current design -->
```{r sum-corr, echo=FALSE, include=FALSE}
# look at it provisionally with only the average correlation
range(apply(cor[,,"before"], 2, mean))
summary(cor[,,"before"])
# look at the summary loadings of the full model later
#keyneson$before$loa
#apply(keyneson$before$loa, 1, sum)
#range(t(keyneson$before$loa)) # have to transpose
#summary(t(keyneson$before$loa)) # have to transpose
# TODO really implement this, include as results
```
Applying the latent-variable logic of an ordinary, items-as-*variables* factor analysis to Q, we might then conclude that the relatively uncommon Q-sorters introduce idiosyncratic specific variance, and ought to be discounted in the model.
At least in this study, such discounting does not make sense, because we are principally interested in *all* Q-sorters, qua being participating citizens.
Recall that when studying burnout --- but not back pain --- there are many variables that you are *not* interested in, even though it may not be clear at the outset what these are.
In contrast to a burnout study, there is *no* conceivable variable-participant that we would *not* be interested in.
Health symptoms may --- or may not --- have a bearing on burnout, but *all* citizens do have a bearing on common understandings of taxation and the economy. [^q-with-latent-var]
[^q-with-latent-var]: There may be Q studies with an explicit interest in only the factors shared by a subset of well-correlated participants, as might be the case for a study into hegemony.
I am not aware of any Q study explicitly espousing such a latent-variable model.
<!-- TODO to clarify, add table here with items, cases, variables, and what is common/specific -->
A latent-variable model generally seems at odds with the scientific study of human subjectivity.
In contrast to many R-type exploratory factor analyses, per the abductive logic of Q, there *is* no preconception of what might delineate specific from common variance in substantive terms.
To the extent that all Q sorters participate in the same realm of communicability, and share in the same concourse --- as is assumed by the respective theories for any modern society --- we are equally interested in maverick and mainstream Q sorts.
<!-- TODO add sources for those theories -->
In fact, to discount the poorly correlated Q-sorts would be to (slightly) stack the deck in favor of the one straightforwardly *falsifiable* hypothesis that Q methodology admits of: namely, that viewpoints are, in fact, *shared* (see the later discussion on the number of factors). [^why-not-discount-items]
<!-- TODO maybe reference here already the justification/discussion of flagging, which would appear to contradict this argument; I guess the rescue is that flagging is only to "clean up" the ideal points, not anything else. -->
[^why-not-discount-items]: Readers may object at this point, that in fact, Q methodology *does* imply a latent-variable model concerning the *items*.
Operant subjectivity on the (somewhat haphazardly) sampled items is taken to be *manifestly* representative of similar, *latently* shared viewpoints in a broader concourse.
One may, consequently, be inclined to discount those items that are less correlated with all other items.
Here, as often, the uncommon correlation matrix of Q implies an unintuitive logic.
Items, in spite of their verbal form, are *cases*, not variables --- latent or otherwise.
To discount Q-items at this point would be analogous to discounting participant-cases --- not variables --- in an R-type factor analysis.
Clearly, some of the sampled items may be more expressive of shared viewpoints than others, but that distinction becomes relevant only later in the analysis, when shared, *ideal-type* viewpoints are interpreted.
By analogy to R-type factor analysis, some people may also be more expressive of some burnout-factors than others, but that becomes meaningful only *after* the factor extraction.
We might --- again, analogously --- be particularly interested in the *ideal-typical* burnout-cases, and interpret them in more detail, but only later.
<!-- TODO check kline 1994 in Watts/Stenner on PCA vs CFA -->
<!-- TODO read Harman 1976 in the above on PCA vs CF -->
#### (In)determinacy
Proponents of Centroid Factor Analysis (CFA) often cite the technique's supposed indeterminacy in favor of the method.
It might seem odd that a statistical procedure may be favored for *not* producing the same result, with the same data.
This unfamiliar criterion might be best understood in the historical context of factor analysis, as Peter Schmolck recently suggested.
<!-- TODO actually, it's sthis email by Schmolck https://listserv.kent.edu/cgi-bin/wa.exe?A2=q-method;5c40e976.1206 -->
<!-- TODO cite Peter schmolck email with McCloy, Metheny and Knott 1938 email -->
Different methods of factor analysis and rotation were originally developed to summarize differences in human personality and intelligence.