# Deliberative Subjectivities {#subjectivity}
> He who knows only his own side of the case knows little of that.
> His reasons may be good, and no one may have been able to refute them.
> But if he is equally unable to refute the reasons on the opposite side, if he does not so much as know what they are, he has no ground for preferring either opinion [...].
> He must be able to hear them from persons who actually believe them [...] he must know them in their most plausible and persuasive form.
>
> --- John Stuart @Mill-1959
<!-- 1. Factor analysis (Q method) of individual q-sorts to investigate metaconsensus.
2. Correlations between factor loadings of individuals to investigate intersubjective rationality (small n problem)
3. Secondary factor analysis of factor array, with individual factors as "cases" and items as variables.
4. Correlations between factor loadings and other gathered data (SES, etc.) (small n problem) -->
<!-- - reports correlations, factors from pre- and post-sort. -->
<!-- - *factor interpretation* in substantive terms of tax and the economy. -->
<!-- TODO decommodification factor = scale dependence
one way to read the decommodification factor is also that it kinda revolts against scales that are too big
because it's not that people didn't *understand* the issue of, say treating renters and homeowners alike
it's that they understood it, but still didn't want it
they just wanted to be left alone by the abstractions of Society -->
## Administration
<!--```{r administration, child='keyneson/keyneson.wiki/q-administration.md'}```-->
<!-- TODO MH: paste condition of instruction? -->
<!-- TODO MH include data-gathering here, too -->
<!-- Notice that some people complained about sorting several items, but that 0 is in fact an appropriate value.
For more on what 0 means to q, see Brown 1980 199, also note the flac I got on the correlation coefficient email on the meaning of 0. -->
>"Factor analysis (...) is concerned with a population of n individuals each of whom has been measured in m tests or other instruments.
>The (...) correlations for these m variables are subjected to (...) factor analysis.
>But this technique (...) can also be inverted.
>We may concern ourselves with a population of N different tests (or other items), each of which is measured or scaled relatively, by M individuals."
> --- Stephenson (1936a: 334)
- "factor analysis with the data table turned sideways"; persons are *variables* and traits/tests/statements/abilities are the *sample* or *population*.
- looks at
## Data Import
All citizens as well as the researcher and the two moderators completed two Q sorts each, one at the beginning and one at the end of the conference.
At its heart, Q methodology involves factor analyses and similar data reduction techniques, procedures that are widely used and available in all general-purpose statistics programs.
However, Q methodology also requires some specialized operations, not easily accomplished in general-purpose programs.
The transposed correlation matrix, the flagging of participants and the compositing of weighted factor scores, in particular, are hard or counter-intuitive to produce in mainstream software.
In contrast to many Q studies, this research features several conditions (before, after), groups of participants (citizens, moderators, researcher) and types of items (values, beliefs, preferences), as well as an extended research question, all of which need to be analyzed systematically.
Running and documenting all these variations will be next to impossible without programmatic extensibility of the software used.
## Import
```{r import-q-sorts, echo = FALSE, warning = FALSE}
q_sorts <- import.q.sorts(
  q.sorts.dir = "schumpersorts/qsorts/",
  q.set = q_set,
  q.distribution = q_distribution,
  conditions = c("before", "after"),
  manual.lookup = as.matrix(
    read.csv(
      "keyneson/ids.csv",
      row.names = 2
    )
  ),
  header = FALSE
)
names(dimnames(q_sorts)) <- c("items", "people", "conditions")
```
## Participant Feedback
```{r import-q-feedback, echo = FALSE}
#TODO all this importing stuff should be done together with exporting to dataverse in a convenient way, maybe as an R object after all
q_feedback <- import.q.feedback(
  q.feedback.dir = "schumpersorts/feedback/",
  q.sorts = q_sorts,
  q.set = q_set,
  manual.lookup = as.matrix(
    read.csv(
      "keyneson/ids.csv",
      row.names = 2
    )
  )
)
```
## Missing Data
```{r dropped-participants, echo = FALSE}
#q_sorts <- q_sorts[ , !colnames(q_sorts) == "Wolfgang", ] # delete researcher
q_sorts <- q_sorts[ , !colnames(q_sorts) == "Uwe", ] # left conference for personal reasons
#q_feedback <- q_feedback[ , !colnames(q_feedback) == "Wolfgang", ] # delete researcher
q_feedback <- q_feedback[ , !colnames(q_feedback) == "Uwe", ] # incomplete
```
```{r rda export, eval = FALSE}
# this is because we needed it in another package
civicon_2014 <- NULL
library(tibble)
library(lettercase)
lookup <- tibble(handle = make_names(rownames(lookup.table)),
                 id_pretest = lookup.table[, 1],
                 id_westfluegel = lookup.table[, 2])
civicon_2014$QItems <- list(q_set = make_names(rownames(q_set)),
                            concourse = q_concourse,
                            lookup = lookup)
rownames(civicon_2014$QItems$concourse) <- make_names(rownames(q_concourse))
class(civicon_2014$QItems) <- c("QItems", "list")
civicon_2014$qData <- list(sorts = q_sorts)
civicon_2014$grid <- c("-7" = 1,
"-6" = 1,
"-5" = 2,
"-4" = 4,
"-3" = 6,
"-2" = 9,
"-1" = 10,
"0" = 11,
"1" = 10,
"2" = 9,
"3" = 6,
"4" = 4,
"5" = 2,
"6" = 1,
"7" = 1)
storage.mode(civicon_2014$grid) <- "integer"
library(devtools)
use_data(civicon_2014, pkg = "../pensieve/", overwrite = TRUE, compress = "bzip2")
```
<!-- complete import of `r ncol(q_sorts)`, three participants -->
```{r real-names, echo = FALSE, eval = TRUE}
real_names <- FALSE # this is for manually enabling renaming
if (real_names == TRUE && file.exists("../../Google Drive/CiviCon/Data/Codenames.csv")) { # renaming works only if (private) file is available and is set to true
codenames <- read.csv(file = "../../Google Drive/CiviCon/Data/Codenames.csv",header = TRUE) # read in codenames
for (name in colnames(q_sorts)) { # loop over names in q_sorts
colnames(q_sorts)[colnames(q_sorts)==name] <- as.character(codenames[codenames[,1]==name,2]) # assign original name
colnames(q_feedback)[colnames(q_feedback)==name] <- as.character(codenames[codenames[,1]==name,2]) # assign original name
}
warning("This rendering includes real names. Do not publish or pass around!")
}
```
## Q Method Analysis
Results chapters in quantitative research do not usually recount and justify every mathematical operation from raw data to final interpretation.
In reporting mainstream statistics, say, a linear regression (OLS), much in the way of axioms and preconditions is often taken for granted, though perhaps sometimes too much.
Powerful computers, confirmation biases and time pressure for writers and readers alike may conspire to occasionally stretch thin the argumentative link between elementary probability theory and the results drawn from it.
<!-- TODO kill, make easier? -->
Q methodology is different, and requires a more careful, patient treatment.
While its mathematical core --- different methods of exploratory factor analysis (EFA) --- is well-rehearsed in mainstream quantitative social research, its application to Q is still strange to many.
<!-- TODO add quotes to EFA books here -->
1. *Transposed Data Matrix.*
Because Q method *transposes* the conventional data matrix, positing *people* as *variables*, and *items* as *cases*, all of the downstream concepts in data reduction, from covariance to eigenvalues, take on a different, Q-specific meaning, even if the mathematics stay the same.
<!-- TODO add brown quote for transposed -->
For this reason alone, it will be worth tracking every operation and grounding it in the unfamiliar epistemology of Q (a brief sketch after this list illustrates the transposed correlation step).
<!-- TODO note there are also more steps in Q; at the end you want item *scores* -->
2. *Marginalization and Controversy.*
Almost 80 years after William @Stephenson1935 suggested this *inversion* of factor analysis in his letter to *Nature*, Q is still an exotic methodology.
It sometimes invites scathing critiques [@Kampen-Tamas-2014], but more often is outright ignored in mainstream outlets.
<!-- TODO cite lack in textbooks etc in tamas kampen -->
As the dynamics of marginalization go, the community of Q researchers may, on occasion, have turned insular --- though not into the "Church of Q" that @Tamas-Kampen-2015 fear.
<!-- TODO maybe add literatures for marginalization dynamics: groupthink, ingroup/outgroup dynamics, institutional inertia, path dependency -->
Q methodology may have sometimes shied away from exposing itself to rigorous criticism and disruptive trends in mainstream social research, though probably often out of sheer frustration with persistent misunderstandings, and because of genuine disagreement.
<!-- TODO add more detail on the problems of the statistics - they're out of date! -->
<!-- TODO add examples -->
In epistemological squabbles, too, crazy people [^freakshow] sometimes have real enemies --- or opponents, at any rate.
<!-- TODO to name these opponents, cite from BRown's concluding words in his book -->
Happily, within Q, too, considerable disagreement remains (for example, on the appropriate factor extraction technique), though legacy procedures and programs sometimes hamper intellectual progress.
Unfortunately, misunderstanding and marginalization are sometimes compounded by a lack of deep statistical understanding, though this rarely digresses into glib ignorance of "technicalities" or outright mistakes ("varimax rotation maximizes variation", in the otherwise fine textbook introduction by @Watts2012).
<!-- TODO find precise quotes -->
<!-- FIXME make sure this is, in fact a mistake with the varimax -->
What may easily appear as articles of faith ("thou shalt not use automatic rotation") are in fact thoroughly argued reservations, based on a deep mathematical and epistemological understanding, as evidenced in the works of both William Stephenson and Steven Brown, though at least some of this orthodoxy now appears obsolete.
<!-- TODO reword this, based on recent experience especially with "centroid myth" -->
<!-- TODO add citations -->
These norms become hermetically sealed ideologies ("reliability does not apply to subjectivity") only when detached from their epistemological underpinnings, as, I suspect, would be the case for other methodologies.
<!-- TODO find source, argue that in fact, as per Brown, factor reliability DOES matter -->
3. *Experimental Design*
<!-- TODO fill in! -->
4. *Literate Statistics.*
Not only when a method is new or controversial, as Q is, should mathematical operations and argumentative prose be kept from diverging too widely.
The intuition of literate programming holds for statistics, too: we do not *use* unintelligible algebra to *give* us a comprehensible result to be explained.
*Neither* mathematics nor prose is *primary*; algebra and language are *both* the explaining intellect at work, just in a different register.
[^freakshow]:
A prominent researcher recently described the annual meeting of the [International Society for the Scientific Study of Subjectivity](http://qmethod.org/issss) as an endearing "freak show".
With heirs of the method's founder, William Stephenson, occasionally in attendance, it sure sounds different from your average disinterested get-together.
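To make the transposition described under (1.) above concrete, here is a minimal sketch, not part of the analysis pipeline and therefore not evaluated; `q_sorts[,,"before"]` is the items-by-people array set up in the import chunk, and the object names `q_cor` and `r_cor` are introduced here for illustration only.

```{r transposed-sketch, eval = FALSE}
# q_sorts[,,"before"] is items x people (see the import chunk).
# cor() correlates *columns*, so the untransposed matrix yields the Q-type
# person-by-person correlations, while the transposed matrix yields the
# R-type item-by-item correlations.
q_cor <- cor(q_sorts[,,"before"])     # people as variables: the Q matrix
r_cor <- cor(t(q_sorts[,,"before"]))  # items as variables: the R-type view
dim(q_cor)  # as many rows/columns as participants
dim(r_cor)  # as many rows/columns as items
```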
I suggest, then, that at least in the following initial analysis, some verbosity has its virtue.
Readers will be relieved to learn that I intend not to reproduce the entire mathematical apparatus of Q:
Steven Brown has accomplished that, and much more, in his authoritative, insight-laced "Political Subjectivity", to which my own work is deeply indebted.
<!-- TODO make this better, humbler -->
Mathematics have their place, but formulae alone, in spite of their veneer of rigor, need analogy, intuition and impression to cover the distance to the deliberative subjectivities on taxation and the economy under study here --- which themselves are not mathematized in this dissertation, nor convincingly so in the broader field, and maybe never will be.
<!-- TODO find better link to previous chapter -->
One may add that if science is also bound to a discourse ethic of reaching understanding, it too needs to relate its abstractions to the human lifeworld, from which all meaning emanates.
<!-- TODO make this better or kill it -->
<!-- TODO add Habermas reference? -->
At the same time, some of the details of this chapter may raise the ire of some established Q methodologists who never tire of stressing the substantive analysis over statistical sophistication.
They are right: the key to understanding human subjectivity lies in iterative abduction, in thoroughly going back and forth between informed hunches about *what might make sense* and what the data will bear out.
The following statistical groundwork is a worthwhile, but merely *necessary* --- not *sufficient* --- condition for a *scientific* study of subjectivity (emphasis added, though intended by the [ISSSS](http://qmethod.org/issss), the International Society for the Scientific Study of Subjectivity).
The following pages will hopefully win over the Q skeptics, and take newcomers to the method along.
To make sense --- as we must --- of the shared viewpoints on taxation and the economy among the participants of the first CiviCon Citizen Conference, we must first be sure what they are, and if they are, in fact, shared.
## (No) Descriptives {#descriptives}
It is common to reproduce descriptive statistics before turning to more advanced analyses.
Common summary statistics often include measures of central tendency (mean, median) and dispersion (standard deviation, range) on *variables* across *cases*.
As mentioned before, in Q methodology, variables are *people*, and cases are *items*.
The mean rank allotted by Q sorters (as the *variables*) across all items (as the *cases*), $\overline{x} = \frac{\sum{x}}{N}$, is then, unsurprisingly, `r unique(apply(q_sorts[,,"before"], 2, mean))` for *all* `r ncol(q_sorts[,,"before"])` participants.
The *average* position of items has to be zero, because the forced distribution was symmetrical: since participants placed an equal number of cards at `-1 / 1`, `-2 / 2` and so on, item ranks will always cancel out.
The *median*, also by definition, will be `r unique(apply(q_sorts[,,"before"], 2, median))` for everyone, because the symmetric forced distribution places its middle card in the center column, which holds the greatest number of cards (`r max(q_distribution)`).
By the same token, the *range* extends from `r unique(apply(q_sorts[,,"before"], 2, min))` to `r unique(apply(q_sorts[,,"before"], 2, max))` for everyone, because those extreme values are defined by the forced distribution.
The *standard deviation*, too, is the same for everyone.
$s_N = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2}$ gives the square root of the average squared deviation of a participant's item scores from their mean, `r unique(apply(q_sorts[,,"before"], 2, pop.sd))` for everyone, because the "spread" of items around the mean is defined by the forced distribution.
Conventional *R* type descriptives are, then, quite meaningless [Nahinsky 1965 as cited in @Brown1980, 265], because they are the same for all participants and are defined ex-ante by the forced distribution [^free-distro-descr].
<!-- TODO fix nahinsky quote -->
[^free-distro-descr]: Descriptives are meaningless *only* if the Q distribution is *forced*, that is, the same for everyone.
If participants are allowed to sort into differently shaped --- but symmetric --- distributions, the standard deviation $s$ may indicate the degree of polarization in viewpoints that different people hold, that is, how many items they feel strongly about.
If, additionally, participants are allowed to sort into asymmetric distributions, the mean $\overline{x}$ may indicate the overall tendency of people to agree with the sorted statements.
"Free" distributions are a frequently discussed, if rarely used possibility for Q methodology.
<!-- TODO MH add source on free/forced -->
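To see how the forced distribution alone fixes these descriptives, here is a short, self-contained sketch; the grid heights are those specified for this study, while the chunk and object names are mine, for illustration only.

```{r grid-descriptives}
# A sketch from the forced distribution alone: every complete Q-sort in this
# study is a permutation of the same multiset of ranks, so these descriptives
# are identical for all participants by construction.
grid_heights <- c("-7" = 1, "-6" = 1, "-5" = 2, "-4" = 4, "-3" = 6, "-2" = 9,
                  "-1" = 10, "0" = 11, "1" = 10, "2" = 9, "3" = 6, "4" = 4,
                  "5" = 2, "6" = 1, "7" = 1)
grid_ranks <- rep(as.integer(names(grid_heights)), times = grid_heights)
mean(grid_ranks)                               # 0, by symmetry
median(grid_ranks)                             # 0, the widest column
range(grid_ranks)                              # -7 to 7, fixed by the grid
sqrt(mean((grid_ranks - mean(grid_ranks))^2))  # the shared population SD
```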
```{r descriptives, echo = FALSE, include = FALSE}
apply(q_sorts[,,"before"],1,mean)[order(apply(q_sorts[,,"before"],1,mean))] # sort by mean
apply(q_sorts[,,"before"],1,pop.sd)[order(apply(q_sorts[,,"before"],1,pop.sd))] # sort by sd
person_cor <- cor(t(q_sorts[,,"before"])) # make cor matrix across persons
person_cor[upper.tri(person_cor, diag=TRUE)] <- NA # kill upper triangle
person_cor_m <- reshape2::melt(person_cor, na.rm=TRUE) # melt it
person_cor_m[with(person_cor_m, order(-value)), ] # order it
person_cor_m[which.max(person_cor_m$value),] # pos max
person_cor_m[which.min(person_cor_m$value),] # neg max
person_cor_m[which.min(abs(person_cor_m$value)),] # abs min
```
In search of descriptives, one may be tempted, then, to revert to the R way of looking at data, and to treat Q-sorted *items* as variables, and Q-sorting people as *cases*.
One may ask, for example, which items were, on average of all participants, rated
the highest (``r rownames(q_sorts)[which.max(apply(q_sorts[,,"before"],1,mean))]`` at `r max(apply(q_sorts[,,"before"],1,mean))`),
the lowest (``r rownames(q_sorts)[which.min(apply(q_sorts[,,"before"],1,mean))]`` at `r min(apply(q_sorts[,,"before"],1,mean))`),
which items were dispersed the most (``r rownames(q_sorts)[which.max(apply(q_sorts[,,"before"],1,pop.sd))]`` at `r max(apply(q_sorts[,,"before"],1,pop.sd))`),
the least (``r rownames(q_sorts)[which.min(apply(q_sorts[,,"before"],1,pop.sd))]`` at `r min(apply(q_sorts[,,"before"],1,pop.sd))`),
or --- most offending to Q methodologists, as well as spurious ---, which *items* were correlated
the most positively (``r person_cor_m[which.max(person_cor_m$value),c(1,2)]`` at `r max(person_cor_m$value)`),
the most negatively (``r person_cor_m[which.min(person_cor_m$value),c(1,2)]`` at `r min(person_cor_m$value)`),
and the least (``r person_cor_m[which.min(abs(person_cor_m$value)),c(1,2)]`` at `r min(abs(person_cor_m$value))`).
This kind of exploration is fascinating, and it invites seemingly inductive hypotheses:
Do people respond most strongly to `poll-tax` and `simple-tax`, because these --- and other, supposedly similar items --- are easy to comprehend and relate to?
Do people respond quite differently to `pro-socialism`, and quite similarly to `land-value-tax-resource`, because the former obviously aligns with political identities, while the latter does not?
Do people feel similarly about `exchange-value` and `natural-market-order` because of a pervasive anti-market bias,
<!-- TODO add caplan reference -->
and very dissimilarly about a `wealth-tax` and a `corporate-income-tax`, because they fall for a flypaper theory of tax incidence?
<!-- TODO add mccaffery reference -->
While these survey-type approaches to the data are intriguing, they are inadequate for the data gathered here.
It may appear arbitrary to categorically rule out a range of broadly-applicable techniques, let alone summary statistics on the grounds that the data were gathered for a different *purpose*.
For example, Likert-scaled data collected for a (R-type) factor analysis may, if conditions apply, be subjected to a regression analysis.
Q, however, is not just another statistical operation, it is a *methodology*, and while the gathered data bear a superficial resemblance to R methodological research, "[t]here never was a single matrix of scores to which *both* R and Q apply" [@Stephenson1935a: 15, emphasis in original].
<!-- TODO find, verify original source. Browns 1980: 347 bibliography is unclear -->
<!-- TODO @Stephenson1952 483 on why Q and R are absolutely *not* trivially similar -->
This postulated incomparability applies to this study, too, and plays out in several ways:
1. *Generalizability & Small N Design.*
In R, the *participants* are usually a *sample* of cases from a broader *population*, and the generalizability of the results hinges on the quality of this ideally large, random selection.
In Q, *participants* are the *variables*, on which supposedly shared, subjective viewpoints are measured.
Sampling theory does not apply to Q, just as one would not ask *random* questions in a survey (or so one hopes).
Instead, Q method requires researchers to broadly maximize the diversity of participants to capture all existing viewpoints.
<!-- TODO add saturation sampling reference here -->
Generalizability in Q, if applicable at all, concerns the representativeness of the sampled *items* of a "population" of statements, though no straightforward sampling theory applies to this concourse either, as it is both infinite and non-discrete.
The P-set of this study, the participants in the [CiviCon Citizen Conference](http://www.civicon.de), are self-selected and probably not representative of the broader population, though recruitment was inclusive and diverse.
This flagrantly non-random sampling alone, however, may not yet rule out generalizability to a broader population of *citizens willing to participate in deliberation*.
As I argue elsewhere, both self-selection and a diversity of points of view are crucial for meaningful, *concept-valid* deliberation, and such non-random sampling may therefore be *required* for even R-methodological research into the effects of deliberation.
<!-- TODO add reference to concept-valid deliberation elsewhere -->
Given the haphazard recruitment and great demands placed on citizens, the group of CiviCon participants must probably be considered excessively non-random even by the more charitable standards of deliberation, but an even greater problem lies in the small number of participants.
<!-- TODO add footnote about bias of sample -->
A "sample" of $N=`r ncol(q_sorts[,,"before"])`$ people (including 2 moderating "confederates") simply does not admit of generalizations toward a broader population, even of would-be deliberators.
R statistics, even the cursory summary statistics above, implicitly rely on this kind of generalizability:
If we concede that, say, the low average score of ``r rownames(q_sorts)[which.min(apply(q_sorts[,,"before"],1,mean))]`` at `r min(apply(q_sorts[,,"before"],1,mean))` is, to a large extent, an artifact of the people who happened to show up at a locally-advertised, five-day sleepover conference for EUR 50 compensation, it becomes dubious why we should care about this factoid at all --- other than to characterize the *bias* of the group ^[Though that comparison, too, would require some representative baseline.].
<!-- TODO actually THAT may be a good reason -->
Given limited funds and little experience, the CiviCon Citizen Conference was *designed* for a small group of people, and this constraint --- admittedly --- also informed the choice of Q methodology.
<!-- FIXME wrong verb, it's not operational ... -->
<!-- TODO add footnote on the large-n/long tradeoff -->
The small number of participants (even by Q standards) restricts the following Q analysis, too, but in a different, provisionally acceptable way.
<!-- FIXME maybe this belongs partly somewhere else? -->
Recall that in Q, people serve as *variables*.
`r ncol(q_sorts[,,"before"])` variables are still rather few from which to extract several latent concepts.
<!-- FIXME uh-oh you say elsewhere that there are no latent concepts -->
Consider, for example, the data reduction in Inglehart's and Welzel's human development theory: to arrive at *two* latent value concepts, they condense *35* variables [-@InglehartWelzel-2005-aa].
<!-- FIXME find out correct number of vars -->
The issue here, however, is one of *resolution*, not generalizability.
With relatively few people-variables, Q method will be able to extract only a few, blurred shared viewpoints.
But the exercise is not moot: given a decent Q-sample of items, potential additional participants are exceedingly unlikely to render the factors extracted here null and void --- that scenario is later rejected as a Null-Hypothesis of sorts.
<!-- TODO uh, that is maybe an overstatement - the factors might look quite different, though it is unlikely that there would be *no factors* -->
<!-- TODO add reference to why that is in fact what later tests test -->
By contrast, adding more people to this study may very well render the high average position of, say, ``r rownames(q_sorts)[which.max(apply(q_sorts[,,"before"],1,mean))]`` at `r max(apply(q_sorts[,,"before"],1,mean))` a product of randomness or bias (a high standard deviation of `r pop.sd(q_sorts["simple-tax",,"before"])` would appear to bear this out).
The now familiar analogy of LEGO bricks serves to illustrate this difference in concepts of generalizability once more.
Tasking, say, 15 people to build something out of a given set of LEGO bricks may seem reasonable to get a preliminary idea of the *kinds* of objects (cars, houses) that people build --- though we will probably miss out on some rarer constellations (spaceships?) and nuances of existing patterns (convertibles?).
Adding more people may increase the resolution to tease out these details, but it's unlikely that something as basic as "car-like", or "house-like" will completely disappear.
By contrast, again, it seems much *less* reasonable to take the ratings of *individual* LEGO bricks by these 15 people as anything but random flukes: there are bound to be some people who like red bricks, more so if the study listing displayed red bricks.
2. *Holism & Non-Independence.*
R methodological opinion research often proceeds by analysis, then synthesis; a construct is first dissected into its constituent parts (say, items), then reassembled into some composite (say, a scale or an index).
Truth, in principle, flows from inter-individual differences on the smallest measurable unit of meaning.
Inglehart's and Welzel's work with the World Values Survey, again, serves to illustrate [-@InglehartWelzel-2005-aa].
Items are constructed with some view to a broader construct (say, civicness), and then narrowed down to a number of smaller concepts and items.
The reconstruction of latent concepts happens through a very deliberate, almost literate *synthesis*: you take $x$ units of item $A$, $y$ units of item $B$, standardize the result, and out comes a theory of human development to be subjected to confirmatory factor analysis.
<!-- FIXME really need to go back to the source and figure this out. This is probably BS -->
Here, too, the latent constructs (say, traditional vs secular-rational values), are of greater interest than any individual item, but the relationship is merely *additive*, or *triangulatory*, at best: you synthesize several variables to cancel out bias, and to get at a common concept that might unite them all, but these bigger pictures can never be more than the sum of their parts, let alone *different*.
<!-- TODO also add some Brown quotes to all of this -->
Q, by contrast, has a *holistic* outlook on human subjectivity.
What matters are not the individual items, but their overall constellation, that is, how (groups of) participants value these items *relative* to one another.
Summarizes Brown:
> In moving from R to Q, a fundamental transformation takes place:
> In R, one is normally dealing with objectively scorable traits which take meaning from the postulation of individual differences between persons, e.g. that individual $a$ has more of trait $A$ than does individual $b$;
> in Q, one is dealing fundamentally with the individual's subjectivity which takes meaning in terms of the proposition that person $a$ values trait $A$ more than $B$.
>
> [@Brown1980: 19]
These epistemological differences have practical research implications.
In R, items are supposed to narrowly measure *one* concept and should not invite *multiple* interpretations.
A typical item from the World Values Survey, `How would you feel about your daughter(son) moving in with a person from a different ethnicity?`, was crafted to elicit a predefined, unambiguous scenario in the minds of respondents.
<!-- TODO find example for/against double-barelled items in Q -->
In Q, an item as open-ended as `labor-no-commodity` (``r q_concourse["labor-no-commodity","english"]``) is suitable *because* it invites a variety of different interpretations on what labor or a commodity are.
<!-- TODO need to add more here - there are limits to Q openennes of interpretation too; refer back to the section on item discussions -->
R survey design and the drafting (or collecting) of Q statements may both be more craft than science, but they are very different crafts that limit what can, and cannot, be done with the results.
Given the confidence we place in the common understanding of an R-type item, it is reasonable to present summary statistics on this *individual* item.
Conversely, given the openness to interpretation in Q, single item summaries make little sense, precisely because they were chosen to mean very different things to different people.
<!-- TODO notice that Q items are not always crafted as in this case; that makes the difference maybe even starker -->
In R, items are (mostly) measured *independently* from one another, that is, the choice of a participant on one item should not influence or restrict her choice on another item.
This independence is not only a requirement for some of the statistical procedures frequently used, it also follows from the analytic-synthetic epistemology:
if the synthesizing operation is subject to hypothesis-testing, any relationships between individual items produced by the data gathering method would be considered an undesirable artifact.
<!-- TODO find statistical concept for non-independence-->
*Rating*, rather than *ranking* measurements are therefore the norm, and some survey research goes as far as randomizing the questionnaire to avoid ordering effects.
<!-- TODO find source for randomized order -->
In Q, by contrast, items are evaluated *only* relative to one another and participants are reminded that the absolute scores assigned (`r range(q_sorts[,,"before"], na.rm = FALSE)` in this case) merely imply *ordinal* (better *than*), not *cardinal* valuation.
<!-- TODO notice that at least I stressed that; also relate to later discussion -->
Relative item rating in Q has a technical reason, too:
Only because all items are evaluated relative to all other items do all item scores come in the *same* unit (valuation relative to the remaining items), and only then does a transposed correlation matrix become possible (a statistical summary of, say, height and weight would be nonsensical).
<!-- TODO is this analogy exactly right? Have I really gotten this? -->
But strictly relative valuation has an epistemological dimension, too: if meaning can be derived from the *entire* item constellation only, participants must always choose *between* items.
*Rating* and *ranking* measurements, taken to these extremes, are not interchangeable and strictly limit the meaningful presentation of results.
An item from the World Values Survey can be summarized in isolation, because it was measured independently:
the agreement with, say, a son-in-law from a different ethnicity by any one participant does not preclude her equal agreement with another item.
An item from a Q study as this cannot be summarized in isolation, because it was measured in a dependent way: a participant ranking, for example, `labor-no-commodity` at the top, will be precluded from ranking any other item in the same way.
By extension, it also makes little sense to present a mean valuation of `labor-no-commodity` (`r mean(q_sorts["labor-no-commodity",,"before"])`) in isolation because, in absence of the means of *all other items*, we have no idea what that value *means* (it might mean very different things, for example, if all other items were supremely agreeable).
The analogy of LEGO bricks only slightly overstates the absurdity of ignoring the holism aspired to by, and the non-independence implied by, Q.
Recall that participants were instructed to assemble a given set of bricks into an object of their design.
It would clearly be nonsensical to take the low *absolute* (x-axis) position of, for example, a small red brick as an indication of disliking, without relating its position to the overall design: depending on the position of other bricks, it might be part of an exposed car fender, or a hidden structure in a house basement.
It would be similarly perplexing to harp on the higher average (x-axis) position of ramp-shaped bricks, compared to the cuboid-shaped bricks:
*of course*, if you are putting roof tiles on top, you *have* to place the foundation walls of a house underneath it.
<!-- TODO make a nice table contrasting the above things? -->
3. *Validity & Research Ethics.*
Finally, Q and R differ in their conceptions of validity.
In R, a measurement or inference is valid if, and to the extent that, the stated concepts correspond to some objective reality "out there".
For example, the above-mentioned WVS item on `your daughter marrying someone from a different ethnicity` rests on the assumption that there is such a thing as an attitude on inter-ethnic relationships (which seems reasonable).
<!-- FIXME again, fix the WVS reference, or kill it -->
No matter how supposedly *inductive* --- probably just *exploratory* --- an R approach to data is, the terms of that data are always set *before the fact*, *by the researcher*.
Notice that the WVS item on inter-ethnic relationships will *not* admit of an attitude unforeseen by the researchers, say, a discriminatory sentiment based on *language*.
In Q, conventional definitions of validity do not easily apply, at least because there is, by definition, no external and objective standard to verify human subjectivity.
<!-- TODO point to respective section where I discuss that -->
Instead, one *posits* that viewpoints become *operant* through the act of sorting Q-cards, and that even a limited sample of diverse items will give people roughly adequate material to impress their subjectivity on.
<!-- Notice Browns language on this in the email on coefficients: it's a logic -of -science kind of axiom; we model things as if - we also model them on the forced distribution, might add this here -->
Items may --- or may not --- have been crafted and sampled with some theoretical preconception in mind (for example, libertarianism for `deontological-katallaxy`), but that preconception should not, in principle, limit the meaning that participants ascribe to it.
Dramatically *different* interpretations of the same item are emphatically *not* a threat to validity in Q.
It is easy to see how the above, cursory summary statistics shift the standard of validity, and invite what Brown calls "hypotheticodeductive" inferences [-@Brown1980: 121].
To nod at the high positive correlation of `exchange-value` and `natural-market-order` across participants is to reify a concept of "market radicalism" (or something similar), from which these items probably sprung *in the mind of the researcher*.
*"Ah, that makes sense"*, one thinks --- and by sense, meaning the sense of the researcher and (maybe similar) sense of the reader.
Looking at any particular item or relationships between items across people, always risks implying that the *meaning* of any given item is the same for everyone, as per the researcher's specification.
It is also easy --- and self-serving for Q --- to overstate the difference to a hypotheticodeductive paradigm.
Falling short of accepting a debilitating, radical constructivism, researchers --- as all people --- will always have to rely on some *common* understanding of language, including Q researchers.
Researchers cannot interpret shared viewpoints --- the holistic sorting patterns condensed in the factor scores --- without *some* reference to their own preconceptions in drafting or sampling the items.
It would be impossible to interpret even the relative position of `exchange-value`, rejecting any overlap or relationship to our academic understanding of that item:
we would be looking at a heap of structured gibberish.
Q researchers are, in fact, quite fond of their own judgment: "researchers might, on occasion, 'know what to look for'" [Stephenson 1953: 44, as cited in @Watts2012: K2308].
The epistemology of *abduction* suggests repeatedly going back and forth between hunches, interpretation and what the data will bear out.
<!-- TODO that's weak; fix and refer to abduction section -->
The point of validity in Q methodology is not to ban all researcher's preconceptions, but to both *relax* their grip on meaning, and to *constrain* the plausibility of their explanations.
On the one hand, the meaning of items is *relaxed*, because the linear relationship to the researcher's hypotheticodeductions is no longer presupposed:
`pro-socialism`, may, or may not have been interpreted as the researcher intended.
On the other hand, the plausibility of any explanation is *constrained*, because any observed shared viewpoint must make *sense*, per *some* set of interpretations of its items.
These epistemological "Goldilocks" conditions disappear when R statistics are applied to these data:
when presented with the isolated finding of the high negative correlation between `wealth-tax` and `corporate-income-tax` (`r min(person_cor_m$value)`), we can *only* refer back to some of the preconceptions on "misunderstanding taxation" presented earlier (Flypaper Theory).
<!-- TODO add reference to other section, mcccaffery -->
This may be a productive approach --- though probably not one best undertaken with these items and this mode of data gathering --- but thereby "testing" people's understanding of tax is emphatically not a valid measurement of their subjectivity, and it falls short of the deliberative conception of public choice espoused earlier.
<!-- TODO that's not quite right. Also, reference other section -->
These epistemological concerns also carry a research ethical imperative:
The data should be treated as participants were told it *would* be treated.
That adherence to informed consent *includes* the methodology.
<!-- TODO link to informed consent -->
I told the `r ncol(q_sorts)-2` guests at the first CiviCon Citizen Conference that I was interested in their viewpoints on taxation, that their unique perspectives on the economy mattered, and that there *would be no wrong way* to sort the Q-cards.
<!-- TODO link to the condition of instruction -->
For up to three hours, these citizens earnestly struggled with these `r nrow(q_sorts)` items "on which reasonable people could disagree", thoroughly weighing thoughts and items against one another, graciously pouring their thoughts into the rigid and crude form of a Q-sort.
It would betray the trust and degrade the effort of the participants to treat these data as if they were R:
I could have just handed them a survey and saved them a lot of time.
The ethics get even more damning when R statistics treat the data as tests, and understanding turns to grading, as so easily happens when looking at the extremely negative correlation of `wealth-tax` and `corporate-income-tax`, or similar would-be inconsistencies.
In the morning session of the second day, some participants expressed worry over being "tested" in the Q-sorts, as well as the conference.
The conundrum of measuring people's understanding of taxation, without testing them on some narrowly-preconceived notion of consistency is a familiar one, and it will continue to plague this research.
<!-- TODO add link -->
The least I owe to the CiviCon participants is to make every effort to render their viewpoints understandable *from the inside*.
That cannot be done with R methodology, and I implore everyone using these data *not* to try.
LEGO, again, delivers a coda by analogy.
A hypotheticodeductive inference on a person's LEGO building would rely on a preconceived understanding of what any given brick means, for example: "this is a transparent brick".
Any discussion of the (absolute) position of individual items would, absent any other information, have to revert to the preconception of the researcher.
For example, one might argue that the extreme placement (on any axis) of the transparent bricks indicates strong feelings for these types of bricks.
In the context of a LEGO house, of course, the transparent bricks may be considered *windows* by participant builders, and be placed on the exterior of the building because of that function.
More varied meanings of transparent bricks are conceivable.
A subset of builders may place *small* transparent bricks *inside* the houses, not readily explicable by their transparency: why would *anyone* place almost all transparent bricks on the perimeter, but only a very small one inside, the R researcher may wonder?
Given the context, one may recognize the surroundings of the small transparent bricks as "kitchens" and venture that these builders interpreted them as maybe sinks full of water, or aquariums.
Characteristically, the meaning of "transparent brick" is relaxed considerably in the light of context, but it also remains bound to a common understanding of some of its features (transparency), and therefore constrained to the subsets of possible meanings that might make sense of it (aquarium, sink full of water, or other *transparent* objects).
One may add that the participant builders may also be quite offended if they learned that their LEGO constructions did not receive any attention as such, but were instead intended as a test of visual acuity:
whether participants could distinguish transparent from non-transparent bricks.
The point here is not to criticize R methodology, though the near hegemony over the quantitative social sciences that Brown rails against [-@Brown1980: 321] may indeed have long stretched thin its foundations and overreached its purview.
The point is that, for the present research question *or* given the present data, any excursion to R methodology would at best be a distraction, and at worst strictly unscientific.
## Correlations
To extract *shared* viewpoints, we must first establish what it means for any pair of Q-sorts to be *alike*.
Since we would expect people with similar viewpoints to sort items into similar positions, and therefore to have similar values over all items, we can use a correlation coefficient between any two people's sorts as a statistical summary of their similarity.
A *correlation matrix* of all such pairs is the simplest summary statistic that can be presented for Q data, maintaining both the holistic and relative nature of the sorts.
Though Q studies do not always display or discuss the correlation matrix, all of the downstream analyses are, in fact, based on this table [@Brown1980: 207].
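Such a matrix can be obtained in a single call from the imported sorts; a minimal sketch, not evaluated here (the chunk name is mine, and the choice of coefficient is discussed next):

```{r correlation-matrix-sketch, eval = FALSE}
# Person-by-person correlation matrix of the pre-conference sorts, rounded
# for display; cor() correlates the columns (people) of q_sorts.
round(cor(q_sorts[,,"before"]), digits = 2)
```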
### Correlation Coefficients
Depending on assumptions made, *different* correlation coefficients are applicable to Q data.
These coefficients are well-known statistics, but their underlying assumptions can be thorny for Q methodology and consequential for these data.
The following discussion also serves to rehearse the uncommon person-correlations at the heart of Q methodology.
<!-- TODO notice that q textbook does not even mention these, and Brown glosses over them --- they may matter though -->
<!-- TODO it may be necessary to reorganize the following into the question of whether the data is ordinal or interval -->
1. *Pearson's $\rho$ Product-Moment-Correlation Coefficient*.
The conventionally used Pearson product-moment correlation coefficient, or Pearson's $\rho$, starts from the covariance of any pair of Q-sorts, *in their raw form*,
(@cov) $$cov(X,Y)=\sum_{i=1}^{N}{\frac{(x_i-\overline{x})(y_i-\overline{y})}{N}}$$ [^pop-corr].
<!-- TODO add reference to Pearson -->
Since, in a forcibly symmetric Q distribution such as ours, the mean will always be zero (see above), we can safely ignore the respective $-\overline{x}$ terms; by default, Q-sorts are already expressed as deviations from their expected values, $E$.
The covariance on all *items* for, say `Ingrid` and `Petra`, is therefore simply the cross-product of all item positions divided by the number of items, $N$.
It's easy to see how this number will be larger when the two scores have similar (absolute) values: `Petra`'s and `Ingrid`'s quite different ranks for, say `all-people-own-earth` (`r q_sorts["all-people-own-earth",c("Petra","Ingrid"),"before"]`, respectively) multiply to `r q_sorts["all-people-own-earth",c("Petra"),"before"] * q_sorts["all-people-own-earth",c("Ingrid"),"before"]`, whereas their more similar positions on `deontological-ethics` (`r q_sorts["deontological-ethics",c("Petra","Ingrid"),"before"]`, respectively) yields `r q_sorts["deontological-ethics",c("Petra"),"before"] * q_sorts["deontological-ethics",c("Ingrid"),"before"]`.
Brown reminds us that this multiplicative weighting of like extreme scores is "phenomenologically" appropriate, because respondents feel more strongly and are more certain about items ranked towards the margins [-@Brown1980: 271].
<!-- TODO maybe make this footnote? -->
The resulting covariance is, unfortunately, hard to interpret, because it is in squared units and depends on the spread of the data.
It is normalized into Pearson's product-moment correlation coefficient by dividing it by the product of the two standard deviations, in this case, `Petra`'s and `Ingrid`'s --- both of which are, by definition, the same.
(@Pearson) $$\rho(Petra,Ingrid) = \frac{\operatorname{Cov}(Petra,Ingrid)}{\sigma(Petra)\sigma(Ingrid)}$$
Consider the special case of correlating a person's Q-sort with her own Q-sort, say `Petra` with `Petra`, to understand the bounds of the Pearson's $\rho$.
The denominator in (@Pearson) then yields $\operatorname{Var}(Petra)$, since the square of `Petra`'s standard deviation over the item scores is, by definition, her variance.
The numerator is given by the summed product of `Petra`'s scores with `Petra`'s scores, which is also the variance.
The correlation coefficient of two identical Q-sorts is therefore $1$.
A correlation coefficient of $-1$ would result if the two Q-sorts were diametrically opposed, as if mirrored around the distribution mean of zero:
in that case, the numerator would be $-\operatorname{Var}(Petra)$, and everything else would remain the same.
<!-- TODO probably need the variance in here, or earlier? -->
A correlation coefficient of $0$ would be expected if two Q-sorts are entirely unrelated.
All of these extremes are rarely reached in Q, with a range of `r range(cor(q_sorts[,,"before"])[lower.tri(cor(q_sorts[,,"before"]))])` in this case.
The Pearson correlation coefficient is a measure of *linear* association between two variables, or Q-sorts in our case.
2. *Spearman's $\rho$ Rank Correlation Coefficient.*
Spearman's $\rho$ works much like Pearson's $\rho$, but starts from *rank orders*, instead of raw Q-sort values.
Raw Q-sorts are rank ordered, and assigned their rank index.
Crucially, in a case of ties --- of which there are many in Q ---, the average of the would-be occupied ranks is assigned as a rank.
For example, if two cards are sorted in the $-5$ column, as per the Q distribution, they will both receive the rank of $3.5$, because they *would* occupy the 3rd and 4th position, if they were not tied.
A Q-sort, beginning from the left (negative) extreme, will look like this when transformed into Spearman's ranks (the six cards in the $-3$ column occupy positions 9 through 14, averaging to 11.5):

Raw Value   Index   Spearman's Rank
----------- ------- -----------------
-7          1       1
-6          2       2
-5          3       3.5
-5          4       3.5
-4          5       6.5
-4          6       6.5
-4          7       6.5
-4          8       6.5
-3          9       11.5
-3          10      11.5
-3          11      11.5
-3          12      11.5
-3          13      11.5
-3          14      11.5
----------- ------- -----------------
Spearman's procedure then amounts to computing Pearson's correlation coefficient (as in the above) on the ranks themselves; absent ties, it can equivalently be computed from the rank *differences* $d_i = x_i - y_i$ (instead of the deviations from the mean).
<!-- TODO add reference to Spearman's -->
This version of Pearson's coefficient can be summarized thus:
(@spearmans-rho) $$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2-1)}$$
The resulting coefficient is likewise bounded from $+1$ to $-1$, with perfect correlation indicating not a *linear* association, but a strictly *monotone* association, where all items are ranked the same, though possibly by different "amounts".
Conversely, two Q-sorts with mutually reversed ranks would yield a $-1$ coefficient, indicating a negative, monotone relationship.
3. *Kendall's $\tau$ Rank Correlation Coefficient*
<!-- Brown in email: Then Cartwright (1957) recommended Kendall's relatively new tau statistic for Q sorts and other cases when "it is desired to use a forced rectangular distribution, or some other forced shape of distribution which may or may not satisfy the requirements for using a product moment coefficient" (p. 102). After making this noteworthy contribution, one might have expected that Cartwright would hang around long enough to use it, but alas.
Cartwright, D.S. (1957). A computational procedure for tau correlation. Psychometrika, 22, 97-104. -->
Kendall's $\tau$ starts not with differences in scores or ranks, but with a comparison of item pairs between the two Q-sorts.
When the ranks of two items ($j$, $i$) agree between two Q-sorts ($x$, $y$), the item pair is said to be *concordant* ($x_i > x_j$ and $y_i > y_j$, or $x_i < x_j$ and $y_i < y_j$).
When the ranks between the two respondents disagree ($x_i > x_j$ and $y_i < y_j$ or $x_i < x_j$ and $y_i > y_j$), they are said to be *discordant*.
An item pair is said to be *tied* on a Q-Sort if both items have received the same score (as frequently happens in Q), $x_i = x_j$.
In its simplest form, Kendall's $\tau_A$ divides the difference between the number of concordant pairs $n_c$ and the number of discordant pairs $n_d$ by the total number of item pairs, $n(n-1)/2$:
$\tau_A = \frac{n_c - n_d}{n(n-1)/2}$.
For example, `Petra` and `Ingrid` are *concordant* on `poll-tax` (`r q_sorts["poll-tax",c("Petra","Ingrid"),"before"]`, respectively) and `corporate-income-tax` (`r q_sorts["corporate-income-tax",c("Petra","Ingrid"),"before"]`, respectively): they both prefer `corporate-income-tax` over `poll-tax`, if by a different amount.
They are *discordant* on `infinite-growth` (`r q_sorts["infinite-growth",c("Petra","Ingrid"),"before"]`, respectively) and `use-value` (`r q_sorts["use-value",c("Petra","Ingrid"),"before"]`, respectively):
Between the two, `Petra` prefers `infinite-growth`, `Ingrid` prefers `use-value`.
Because the denominator gives the total number of combinations of two items out of the total number of items (in this case `r (77*76)/2`), Kendall's $\tau$, too, is bounded from $+1$ to $-1$.
If two Q-sorts are identical (opposed), they will (dis)agree on *all* pairwise item comparisons, divided by *all* possible such combinations, yielding ($-$)$1$.
If two Q-sorts agreed and disagreed on half of the comparisons each, the numerator will be $0$, yielding a $0$ coefficient.
Because in Q-sorts items must be stacked in some columns, there are many ties.
Kendall's $\tau_B$ adds terms to the denominator that adjust for such ties, using the number of tied pairs $n_1$ and $n_2$ in each of the two Q-sorts (with $n_0 = n(n-1)/2$ the total number of pairs):
(@kendalls-tau) $$\tau_B = \frac{n_c - n_d}{\sqrt{(n_0-n_1)(n_0-n_2)}}$$.
Kendall's $\tau$ does not measure a linear association between the two Q-sorts; it can instead be read as the difference between the probabilities that a randomly chosen pair of items is ranked in the same, rather than the opposite, order in the two Q-sorts (a short sketch after the following footnote computes all three coefficients on a toy pair of Q-sorts).
[^pop-corr]: Here, as in all of the below formulae, I reproduce the (unbiased) *population* statistic.
The population statistics seem appropriate for (at least) a forcibly symmetric distribution such as this one, because the population mean of *any* number or selection of sampled statements is, in fact, known:
it must be zero, per the specified distribution.
The mean need not be estimated, and no correction for estimation bias need be introduced.
Steven Brown (email conversation in February 2015) also suggests using the population variant, though the usage and terminology in his 1980 book *Political Subjectivity* are a little confusing.
<!-- TODO add email specifics citation -->
On a technical level, however, all of the code used to calculate the correlations reported here uses the *sample* statistic.
Because the de-biasing $N-1$ term occurs in both the numerator and the denominator of the correlation coefficient, it cancels out.
<!-- TODO fix this or add link to GH issue -->
<!-- FIXME add reference to Political Subjectivity -->
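To fix ideas before choosing among the coefficients, here is a self-contained toy sketch: the two mini Q-sorts are invented, not data from this study, and all names are illustrative only. It computes Pearson's coefficient by hand and then all three coefficients with the built-in `cor()`.

```{r coefficients-toy}
# Two invented Q-sorts over 9 hypothetical items, both following the same
# small symmetric forced distribution (-2, -1, -1, 0, 0, 0, 1, 1, 2),
# so both have a mean of exactly zero by construction.
petra_toy  <- c(-2, -1, -1,  0, 0, 0, 1, 1, 2)
ingrid_toy <- c(-1, -2,  0, -1, 0, 1, 0, 2, 1)

# Pearson's rho "by hand": cross-products divided by N (no centering needed,
# since the means are zero), normalized by the product of the population SDs.
pop_sd_toy <- function(x) sqrt(sum((x - mean(x))^2) / length(x))
cov_toy    <- sum(petra_toy * ingrid_toy) / length(petra_toy)
cov_toy / (pop_sd_toy(petra_toy) * pop_sd_toy(ingrid_toy))

# The built-in cor() uses the sample variant, but the N - 1 terms cancel
# (see the footnote above), so the result is identical.
cor(petra_toy, ingrid_toy, method = "pearson")
cor(petra_toy, ingrid_toy, method = "spearman")  # Pearson on tie-averaged ranks
cor(rank(petra_toy), rank(ingrid_toy))           # ... which is the same thing
cor(petra_toy, ingrid_toy, method = "kendall")   # tau_B, adjusting for ties
```

Note how the Spearman coefficient is literally Pearson's coefficient computed on the tie-averaged ranks, as in the table above.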
The choice among these correlation coefficients depends first on the kind of data gathered upstream, and the kind of analysis intended downstream.
Pearson's $\rho$ requires at least interval-scaled data: the *difference* between values must have *cardinal*, not just ordinal meaning. ^[Aside from these considerations, Pearson's $\rho$ is susceptible to outliers (*not robust*) and requires normally distributed data, though neither is a great concern in Q methodology, where forced distributions bound extreme values and approximate a bell curve (as the current distribution does).]
For example, cardinal valuation would imply that cards sorted under $3$ are preferred to cards sorted under $2$ by the *same amount* as cards under $7$ are preferred to cards under $6$ --- not just that higher-sorted cards are valued *more*.
Both Spearman's $\rho$ and Kendall's $\tau$ relax this assumption and can be used on ordinally-scaled data, where item scores are merely ranks, not ratings.
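A minimal sketch of the relation between the rank-based and the raw-score coefficients (again assuming the `q_sorts` array): Spearman's $\rho$ is simply Pearson's $\rho$ computed on (mid)ranks.

```{r spearman-as-ranked-pearson, eval=FALSE}
# Sketch: Spearman's rho is Pearson's rho applied to (mid)ranks
# (assumes the `q_sorts` array used throughout this document).
x <- q_sorts[, "Petra", "before"]
y <- q_sorts[, "Ingrid", "before"]
cor(x, y, method = "spearman")  # Spearman's rho on the raw column scores
cor(rank(x), rank(y), method = "pearson")  # identical: Pearson's on midranks
```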
#### Ordinal vs Interval Data
Q method is, unfortunately, a bit muddled on the issue of cardinal or ordinal valuation in Q-sorts.
Most studies simply use the default Pearson coefficient and many articles as well as the leading introductory textbook on the method [@Watts2012] do not even *mention*, let alone justify, the use of one of the three common coefficients.
This oversight might be easily forgiven, because for Q data, at least Spearman's and Pearson's coefficients appear to be closely related.
Given the relatively high number of ranks assigned (usually more than 9, 15 in this case), many researchers might have no trouble treating the data as interval-scaled, much as (shorter-ranged) Likert-scales are often interpreted.
<!-- TODO add citation on what ordinal can usually be used as interval -->
Additionally, Q data is tightly bounded, especially when a forced distribution is used: there *can* be no outliers, that might inordinately affect Pearson's, but not Spearman's.
One might then conclude, as Block [-@Block-1961: 78] and Brown [-@Brown-1971: 284] do, that the choice of correlation coefficient does not much matter, effectively sidestepping the fundamental question of which coefficient is the appropriate one.
Brown argues correctly that *absent ties*, Spearman's and Pearson's *must* be identical, and illustrates with his data that even *with ties*, Spearman's and Pearson's "produce virtually identical results [...]", which are "in turn [...] essentially the same as those obtained utilizing Kendall's $\tau$" [-@Brown1980: 279].
The problem is that, in fact, Q data has *many* ties, and for this data the coefficients *do* differ: Spearman's differs from Pearson's by an average of `r mean(abs(cor(q_sorts[,,"before"],method="pearson") - cor(q_sorts[,,"before"],method="spearman")))`, ranging up to a sizable `r max(abs(cor(q_sorts[,,"before"],method="pearson") - cor(q_sorts[,,"before"],method="spearman")))`, and Kendall's differs from Pearson's by up to `r max(abs(cor(q_sorts[,,"before"],method="pearson") - cor(q_sorts[,,"before"],method="kendall")))`.
<!-- TODO add table with all the differences -->
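A hedged sketch of how the full table of such pairwise differences could be tabulated, should it be needed (assuming `q_sorts` as above; the inline figures in the text are the ones reported):

```{r coefficient-differences, eval=FALSE}
# Sketch: distribution of pairwise differences between coefficients
# (assumes the `q_sorts` array used throughout this document).
d_spearman <- abs(cor(q_sorts[,,"before"], method = "pearson") -
                  cor(q_sorts[,,"before"], method = "spearman"))
d_kendall <- abs(cor(q_sorts[,,"before"], method = "pearson") -
                 cor(q_sorts[,,"before"], method = "kendall"))
summary(d_spearman[upper.tri(d_spearman)])  # Pearson's vs. Spearman's
summary(d_kendall[upper.tri(d_kendall)])  # Pearson's vs. Kendall's
```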
As the sensitivity analysis in the appendix shows, these differences in correlations can also filter down to the factor analysis, and may even bias the interpretation in a *systematic* way.
<!-- TODO add, link to sensitivity analysis -->
```{r coefficients-lipset, include=FALSE, eval = FALSE}
# this is just for the list email to show that results are also different for the lipset data
data(lipset)
max(abs(cor(lipset$ldata, method = "pearson") - cor(lipset$ldata, method = "spearman")))
max(abs(cor(lipset$ldata, method = "pearson") - cor(lipset$ldata, method = "kendall")))
lipset$spearman <- qmethod(
dataset = lipset$ldata,
nfactors = 3,
rotation = "varimax",
forced = TRUE
, cor.method = "spearman"
)
lipset$pearson <- qmethod(
dataset = lipset$ldata,
nfactors = 3,
rotation = "varimax",
forced = TRUE
, cor.method = "pearson"
)
```
<!--TODO Make note on decision tree why verbosity is important: it's a combinatorial explosion, and I don't want to waste my or the reader's tiem datamining -->
Aside from Brown's *empirical* claim, falsified above, that "within the factor-analytic framework [...] the interval-ordinal distinction is of no importance" [-@Brown1980: 289], there appears to be no *methodological* literature justifying a choice of coefficient on Q-epistemological or ontological grounds.
The choice between interval and ordinal valuations in Q-sorts seems to turn on the status of the Q distribution.
On the one hand, we can seek to ascribe *positive* status to the Q distribution, that is, assume that there is such a thing as a *true* distribution on any given set of items for any given respondent, and that this distribution can be measured.
The simplest way to go about this may be to let people sort items freely into a fixed number of bins.
Given the relatively great number of bins (15 in this case), it seems reasonable to treat distances between adjacent bins as meaningful, because respondents were able to freely project their subjectivity, however formed.
Alternatively, one can assume that *any* respondent's feelings towards *any set of items* are normally distributed as a matter of ex-ante empirical finding.
Assuming that every person has one $-7$ item, two $-6$ items, and so forth under the bell curve, it is reasonable to assume equal and meaningful distances between the equally wide bins slicing up the (continuous!) normal distribution.
This latter assumption is, of course, quite heroic, especially considering that Q-sets of items cannot be randomly sampled and any given set of items will likely include idiosyncrasies.
I am not aware of anyone in the Q literature making, let alone providing evidence for, this latter assumption, though few researchers have implemented free distributions.
On the other hand, we can ascribe *epistemological* status to the Q distribution, that is, treat it "as a formal model in the logic-of-science sense and in terms of which persons are instructed to present their points of views", in Brown's characteristically cogent words (2015, via email on Qlistserv).
<!-- TODO add email as referenc -->
Oddly --- though not unconvincingly --- this is then an *operational* definition of the *distribution* to *operantly* measure the structure of subjective viewpoints (Stephenson 1968: 501), to borrow a popular dichotomy in Q circles.
<!-- TODO Fix citation, as cited in Watts Stenner 684 -->
The quasi-normal Q distribution is *defined* by the researcher as the terms in which respondents may express the concept of subjectivity, much as R researchers define items based on some pre-existing concept.
This latter view of the distribution underpins most of Q research, though not always as explicitly as in Brown's writing.
The case for the forced-distribution-as-model is twofold.
<!-- TODO link here to an earlier discussion of the forced distribution -->
First, the forced distribution strongly single-centers all the *items*, a pre-condition for a transposed factor analysis: under a forced curve, all items are always ranked relative to the same reference point of all other items. ^[Arguably, items may still be considered single-centered if under a free distribution, as long as they are thoroughly weighted at the same time.]
<!-- TODO note here that a free coordinate system, where people have to wrestle with the relative positions of items might do this trick! -->
Second, the forced distribution extracts "hidden" preferences by making Q sorters make the same, tightly circumscribed decisions: for example, people cannot place more than one item in the leftmost bin.
A complete rank ordering might be the logical --- if impractical --- solution to reveal all hidden preferences: in a complete rank order, no ambiguities are left.
<!-- TODO is "complete" rank order the correct term? -->
<!-- TODO make this whol thing a GH issue, I can make a paper out of this -->
<!-- TODO there is a obvious link between all of this and vNM consistency -->
The former *positive* view on the Q distribution might straightforwardly justify cardinal valuation of Q-sorts.
Steven Brown argues that cardinal valuation can be maintained for the latter, *epistemological* view on the Q distribution, too, or at least that "the equality of intervals along the Q-sort range [...] is no more subject to proof than it is to disproof, which is common to models." (Brown 2015, email to Qlistserv)
<!-- TODO add link or something, proper reference -->
It seems to me, though, that the model implied here conceives of Q-sorts as *ordinal* valuations, and that the data should be treated accordingly.
To operationally "reveal the Q sorters' preference (as opposed to their likes)" by forcing people to decide *between* items is to negate meaningful distances (ibid.).
Steven Brown may be correct that "preferences" and "likes" cannot be observed at the same time, akin to the quantum principle of complementarity (Brown's analogy).
However, this *same* complementarity holds between ordinal and cardinal valuations.
If preferences are to be revealed in terms of *choices* constrained by the measuring mechanism, then that is an *ordinal* measurement, without meaningful distances.
Meaningful distances require that respondents be arbitrarily free in their choices, including, for example, all items in one column, with no distinctions enforced.
An ordinal interpretation of the data also dovetails with the related, relentless instructions to respondents to worry only about the *relative* positioning of items.
<!-- TODO find citation -->
Readers may wonder, at this point, what could possibly warrant this methodological diversion in the context of the `Keyneson` data analysis.
Problematically, the correlation matrix for `Keyneson` not only differs depending on the chosen coefficient, it differs in a *systematic* way.
This bias stems from the different weighting of similarly (and dissimilarly) sorted items in the *extremes* of the distribution.
Recall that Pearson's $\rho$ multiplies deviations from the mean, so jointly extreme placements contribute quadratically, giving great weight to whatever similarity people's viewpoints express in their tails.
By comparison, Spearman's $\rho$ offsets this quadratic weighting partly by "spreading out" the center of the distribution:
for example, in Spearman's ranks, the difference between the (raw, Pearson's) $0$ and the $1$ column is $10.5$, whereas the difference between the $6$ and the $7$ columns is only $1$.
As an illustration, Spearman's $\rho$ assumes that respondents ordered the items *side to side*, with equal signs pasted where they were indifferent (tied), instead of *on top of one another* in equal-width bins.
Because the number of ties in the center is, by virtue of the forced distribution, greater than the number of ties towards the extremes, differences in valuation towards the center occupy a greater range.
<!-- TODO Make below footnote -->
For example, an item on which two respondents agree at ($+7$) and another on which they agree at ($+1$) contribute `r (77-39)^2` and `r (49.5-39)^2`, respectively, in Spearman's ranks (a ratio of `r ((77-39)^2)/((49.5-39)^2)` to 1), but `r 7^2` and `r 1^2` in Pearson's raw scores (a ratio of `r 49` to 1).
In short: similarities and differences between Q-sorts towards their middle carry greater weight under Spearman's $\rho$ than under Pearson's.
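This "spreading out" can be read directly off the forced distribution; a minimal sketch, assuming the `q_sorts` array (any single sort will do, since the distribution is forced):

```{r spearman-spread-sketch, eval=FALSE}
# Sketch: how Spearman's midranks "spread out" the middle of the forced distribution
# (assumes the `q_sorts` array; any single sort will do, since the distribution is forced).
scores <- q_sorts[, "Petra", "before"]  # raw column scores from -7 to +7
midranks <- rank(scores, ties.method = "average")  # what Spearman's rho actually correlates
tapply(midranks, scores, unique)  # one midrank per column
diff(tapply(midranks, scores, unique))  # gaps between adjacent columns: wide in the middle, narrow in the tails
```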
Brown, as previously noted, suggests that the quadratic weighting of differences and similarities towards the extremes is warranted because respondents feel strongest about these items.
<!-- FIXME find citation -->
That may be so, but it implies an otherwise rejected *positive* assumption on the Q distribution, namely, that all respondents *do* feel *much* more strongly about items placed in the extremes, and *by the same amount*.
*Absent* such a positive assumption, the inordinate influence of covariance in the extremes can conspire to introduce bias as a function of the *given* Q-set.
Consider that items may --- especially when they are not gathered "in the wild", but crafted, as for `Keyneson` --- differ in the starkness of their wording.
For example, `pro-socialism` might have been formulated *without* reference to the informative, but tainted term "planned economy". ^[The German "Planwirtschaft" carries negative associations with the East German economic malaise, and general mismanagement. An equally tainted term for Americans may be "socialism".]
Consider also that *some* (and only *one*) item has to fill the extreme position, irrespective of whether a participant really feels that much stronger about the last item.
It may then be that starkly worded items tend to occupy these extreme positions and thereby weigh heavily on the correlation matrix and the downstream factor analysis.
In the extreme case, the inclusion (or exclusion) of a divisive item such as `pro-socialism` (greatest standard deviation at `r sd(q_sorts["pro-socialism",,"before"])`) may produce so much covariance as to become something of an "anchor item" for a factor.
The issue here is not that some items "anchor" a factor more than others (that is very much the point of distinguishing statements),
<!-- TODO add footnote to QDC part -->
nor that some items are worded in a different style (that is very much the point of the factor interpretation).
The issue is that *absent* a *positive* view on the Q distribution --- which we have rejected --- the *divisiveness* of some items may crowd out other, "milder" patterns of covariance towards the middle of the distribution, even trivialize factors in an extreme scenario, all for no good reason.
Recall that without a positive distributive assumption, we have no reason to assume that the difference between, say $6$ and $7$ is the same as between $0$ and $1$.
Consequently, we cannot know in absolute terms how much more or less strongly people felt about more or less divisive items: the weighting of the (mildly worded) middle covariances against the (starkly worded) extreme covariances becomes *arbitrary*.
Q methodology with a forced distribution has thrust us into a no-man's-land between interval and ordinal data, inhospitable to sound statistics.
If we assume interval scaling, and use Pearson's $\rho$, covariances in the extremes are weighted more, though the magnitude of that weighting is suspect, for the above-mentioned reasons.
If we assume ordinal scaling, and use Spearman's $\rho$, covariances in the extremes are weighted less, though the magnitude of that weighting is suspect, too.
Recall that the "spreading out" occurs as a function of ties within the higher columns towards the middle of the distribution.
These ties, however, are *also* enforced by the method, and did not arise organically.
Ideally, Q method would produce a thoroughly grounded clarification of how covariances across the (forced) distributions should be appropriately weighted.
Unfortunately, at least the outspoken members of the Q listserv appear untroubled by this supposed conundrum.
Bob Braswell (2015, email on Qlist) suggests that because the downstream analyses require Pearson's, "the burden of proof is on someone who would like to substitute another correlation coefficient".
Braswell and Brown also doubt differences in coefficients will be consequential, and suggest any downstream effects may be counteracted by judgmental flagging and rotation.
In a way, these skeptics are right: even a lingering uncertainty about the choice of correlation coefficient does not invalidate Q methodology, and the final factors are likely to share a great family resemblance, as also indicated by the robustness analysis in the appendix.
<!-- TODO add link to appendix -->
Still, this ambiguity on coefficients affects Q methodology at a fundamental level, carrying on through all downstream analyses, and even small differences should matter for an avowedly *scientific* study of subjectivity.
The troublingly common resort to abductive judgment in the face of muddied statistics risks loosening the tight positive bounds placed on researchers' interpretation ever more: if, *in addition* to the rotation method, the flagging, the number of factors and the extraction method, even the correlation coefficient were up for grabs, researchers would enjoy considerable latitude in specifying their models --- all before the supposedly bounded qualitative interpretation has begun.
These and other methodological challenges are often met by reframing them in terms of (outsider) R and (insider) Q approaches, even where, as in this case, the conundrum is thoroughly grounded in Q, possibly betraying an insular retreat of the method.
Alternatively, Q method could take *either* ordinal or cardinal valuations in Q sorts to their respective logical conclusions:
complete rank orders and non-parametric statistics in the former, freely distributed sorts in the latter.
Such innovations would require, among other things, thorough validation, and are clearly beyond the purview of this research.
Among the imperfect --- because arbitrary --- choices, Spearman's $\rho$ appears to be the *conservative* choice, and for that reason, will be used in the following.
Spearman's stresses the middle of the distribution, an area where respondents feel less strongly, and less clearly, about the items placed there.
Any confusion or indifference about these items should result in *random* placement around the mean, and therefore, cancel out in the correlation coefficient.
Overall correlations may be *lower* (they are not, summing to `r sum(cor(q_sorts[,,"before"],method="spearman"))` and `r sum(cor(q_sorts[,,"before"],method="pearson"))` for Spearman's and Pearson's, respectively), or *less patterned*, and the resulting factors are less likely to be superficially anchored by some agreement in the extremes.
<!-- TODO add data for factor analysis showing that Spearman's is, in fact, harder; that's also the robustness analysis. -->
<!-- TODO fix this quote in the above on approximately interval sth
@Bartholomew-Steele-etal-2011:
> 245: say that interval treatment for six or seven categories is often done.
> Still, they warn on 245: there *will* be biases (more evidence in section 9.5) -->
#### (Linear) Association vs. Probability
Assuming that the present Q-sort data are best treated as ordinally scaled, the question arises how their correlation coefficients can be appropriately and meaningfully summarized into ideal viewpoints.
Q methodology, ordinarily relying on Pearson's $\rho$, has it easy: as a measure of *linear* association, Pearson's lends itself well to the matrix algebra of factor or principal components analysis.
The case is more complicated for rank order statistics.
Fabrigar and Wegener rule out exploratory factor analysis for ordinal data [-@Fabrigar-Wegener-2012: 96].
Instead of shoe-horning rank data into factor analysis, Bartholomew, Steele et al. recommend new "model-based methods" [-@Bartholomew-Steele-etal-2011: 45].
As indicated in the above, taking an ordinal view on Q-sorts to its statistical conclusion might be worthwhile, but the required re-validation and epistemological foundation is untenable here.
Luckily, other statisticians are more optimistic about factor-analyzing ordinal data.
Basilevsky recommends "*euclidian* measures such as Sperman's (sic!) $\rho$ [...]" over Kendall's $\tau$ *probability*, because the latter obscures local differences and yields no meaningful factor analytic result [-@Basilevsky-1994: 512, emphasis added].
Fittingly, Basilevsky explicitly validates a Spearman's-based principal-components analysis of an "n x n matrix [...] *among the observers*", though he might not have had Q methodology in mind [-@Basilevsky-1994: 515, emphasis added].
He concludes for Spearman's that "if intrinsic continuity can be assumed either as a working hypothesis or by invoking a priori theoretical reasoning, most factor models carry through just as if the sample had been taken directly from a continuous population" [-@Basilevsky-1994: 518].
That bodes well for this analysis.
Intrinsic continuity is customarily assumed for "six or seven categories" [@Bartholomew-Steele-etal-2011: 245], and should be defensible for 15 categories.
This remains very much an "*ad hoc* method[]" for treating Q-sort data [@Bartholomew-Steele-etal-2011: 45, emphasis in original], but it is a conservative one.
At best, analyses based on Spearman's $\rho$ dovetail with Brown's *operational* definition of subjectivity as Q-sorts under a quasi-normal distribution, implementing an appropriate statistic for fine-grained, but ordinal data.
At worst, Spearman's-based analyses fall short of the ordinal scaling of Q-sorts, and merely effect a monotone transformation of would-be cardinal data, emphasizing covariances towards the center of the distribution.
<!-- TODO maybe later add visualization of all scatterplots? -->
### Correlation Matrix
In lieu of descriptives, the simplest summary statistic we are left with is the correlation matrix.
`qmethod`, like other Q software, does not report the correlation matrix, but the matrix below is produced by calling the same function also used in later analyses, with a Spearman's $\rho$ correlation coefficient.
The correlation matrix is characteristically complex, and hard to interpret --- which is probably why most Q studies, especially with more participants, do not report it.
```{r cor-viz-before, echo=FALSE}
# set up an array to hold the correlation matrices for both administrations
cor <- array(
  data = NA,
  dim = c(ncol(q_sorts), ncol(q_sorts), dim(q_sorts)[3]),
  dimnames = list(colnames(q_sorts), colnames(q_sorts), dimnames(q_sorts)[[3]])
)
cor[,,"before"] <- cor(q_sorts[,,"before"], method = "spearman")  # Spearman's correlations before the conference
cor[,,"after"] <- cor(q_sorts[,,"after"], method = "spearman")  # Spearman's correlations after the conference
q.corrplot(cor[,,"before"])  # visualize the correlation matrix (before)
```
Recall, again, that the correlations in the above matrix are between *people-variables* across *item-cases*.
The below plot illustrates this for the strongest positive, the strongest negative, the weakest and a self-correlation.
<!-- TODO(maxheld83) this should probably go someplace else, maybe way above in the
cor discussion. -->
<!-- TODO(maxheld83) Add footnote that jitter was added to avoid overplotting. -->
```{r cor-example-plot, include=FALSE, eval=FALSE}
cor.ex <- vector("list", 4)  # create empty list
names(cor.ex) <- c("positive", "negative", "null", "identity")
cor.ex$positive <- rownames(which(cor[,,"before"] == max(cor[,,"before"][cor[,,"before"] < 1]), arr.ind = TRUE))  # find the highest positive correlation pair
cor.ex$negative <- rownames(which(cor[,,"before"] == min(cor[,,"before"][cor[,,"before"] < 1]), arr.ind = TRUE))  # find the strongest negative correlation pair
cor.ex$null <- rownames(which(abs(cor[,,"before"]) == min(abs(cor[,,"before"][cor[,,"before"] < 1])), arr.ind = TRUE))  # find the correlation closest to zero (compare absolute values, in case it is negative)
cor.ex$identity <- c(rep(x = cor.ex$null[2], times = 2))  # identity is just one sorter with him/herself
labels <- row.names(q_sorts)  # make labels vector
labels[-sample(x = 1:length(row.names(q_sorts)), size = 3, replace = FALSE)] <- NA  # keep only a random sample of labels, otherwise too much overplotting
cor.ex.plots <- cor.ex  # make empty list with cor.ex names
g <- NULL
for (i in names(cor.ex)) {
  g <- ggplot(data = as.data.frame(q_sorts[,,"before"]), mapping = aes_string(x = cor.ex[[i]][1], y = cor.ex[[i]][2]))  # need aes_string because cor.ex contains column names as strings
  g <- g + geom_point(position = "jitter")  # must jitter b/c overplotting, alpha is bad
  g <- g + geom_smooth(method = "lm")  # add regression line
  g <- g + geom_text(mapping = aes(label = labels), hjust = 0, vjust = 0)  # add sampled labels
  g <- g + ggtitle(i)  # add titles
  g <- g + coord_fixed()  # make aspect ratio 1
  # TODO(maxheld83) color the plots as the respective fields in the cor matrix
  # TODO(maxheld8) cut superfluous space on the axes
  cor.ex.plots[[i]] <- g  # write out plot
}
n <- length(cor.ex.plots)  # number of plots
nCol <- floor(sqrt(n))  # number of cols
do.call("grid.arrange", c(cor.ex.plots, ncol = nCol))
# TODO(maxheld83) maybe add the r value here?
```
To protect the anonymity of participants, as discussed elsewhere, I cannot provide additional details to contextualize individually high or low correlations between Q sorters.
Suffice it to say that the correlations broadly track views expressed at the conference, and, to a lesser extent, age and education.
Still, some general observations are in order.
Correlations are not especially high, with a maximum of `r max(cor[,,"before"][cor[,,"before"]!=1])`, though merely rough similarities may be expected with a Q-set as diverse and complicated as `keyneson`.
Recall that some participants reported problems sorting several items: the resulting (random) error might prevent greater correlations (greater overall, and more bifurcated correlations *after* the conference appear to bear this out.)
More remarkable is the fact that while there are several correlations close to 0 --- quite rare in Q ---, there are no substantial *negative* correlations.
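A quick check on this observation, using the `cor` array computed above:

```{r negative-correlations-check, eval=FALSE}
# Quick check on the observation above (uses the `cor` array computed earlier).
min(cor[,,"before"])  # the most negative correlation before the conference
sum(cor[,,"before"] < 0) / 2  # number of negatively correlated sorter pairs (matrix is symmetric)
```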
Recall that the Q-set was designed for "reasonable people to disagree on", so why did respondents *not* disagree more?
On the one hand, the absence of negative correlations might indicate that people do not think in strictly *opposite* ways about taxation --- or the `keyneson` items, at any rate.
Instead, people might think *differently* about it, expressing uncorrelated sorts.
The greatest difference in expressed subjectivity among the participants lies not between diametrically opposed patterns, but between different patterns.
On the other hand, the predominantly positive correlations might betray a lack of diversity, or presence of skew among the P set of participants already suspected from the conference report.
For example, in a broadly social-democratic crowd, with some socialists and conservatives sprinkled in, but no libertarian, disagreement is unlikely to be diametrical.
Neither a socialist nor a conservative takes precisely the *opposite* viewpoint of a social democrat, though they may differ considerably.
While this lack of diversity may spell trouble for the quality of the sample, it is good news for the factor analysis.
<!-- TODO link to sections where you discuss the sample -->
Without negative correlations, it is unlikely that people will load on opposite ends of "polar factors" --- a great hassle for computing composite scores.
<!-- TODO make signifciance test, comment on why that is not relevant here the false-negative test comes later. -->
<!-- TODO make histogram of correlation coefficients -->
<!-- TODO notice the issue of positive manifolds (via Schmolck email); this seems to be given here, too -->
## Factor Extraction
Based on the above correlation matrix showing the pairwise similarities of all participants, Q methodology proceeds by summarizing these patterns into *fewer*, *shared* viewpoints.
By way of the LEGO analogy, the correlation matrix superimposes any two buildings on top of one another, displaying the degree of overlap between them.
<!-- TODO is that right? -->
To extract the *kinds* of objects that our participant builders have constructed, we must somehow further summarize this matrix to yield a shorter list of family resemblances among the LEGO objects.
Q methodology, in other words, rests on an *exploratory* data reduction technique.
### Which Data Reduction Technique
Just *which* of a number of such data reduction techniques would be appropriate for Q *methodology* is, alas, again both disputed and consequential.
The Q literature, unfortunately, offers no definitive resolution, but instead loosely refers to epistemological (metaphysical?) axioms ("indeterminacy") or invokes tradition (Stephenson, and increasingly, Brown) and *past* computational limitations [e.g. @Stephenson1935a: 18].
<!-- phenomenal issue here: http://stats.stackexchange.com/questions/215404/is-there-factor-analysis-or-pca-for-ordinal-or-binary-data -->
<!-- MH TODO find sources for reverting to authority -->
Criticism of such Q orthodoxy is rising on the Qlistserv and elsewhere, most notably (and convincingly) by decade-long contributor (and `PQMethod` maintainer) Peter Schmolck as well as recently @Akthar-Danesh-2007, but no work has been published yet.
<!-- MH TODO link to Peter Schmolck's message, somehow include in Bib -->
Butler's critique, originally directed at factor analysis in general, oddly, still seems to apply to Q:
> "far too many [...] students complain that the only basis they have for choosing among the many types of factor analysis is the prestige of a given theorist or the fact that this or that "*maximin*" solution has often been used"
> [@Butler-1969: 252f], also see updated critique in [@Yates-1987: 6]
Absent well-justified choice in the literature, we then have to delve into the alternative methods and decide for this study on their merits.
Readers will be relieved that the purview of this investigation can be pragmatically limited to only three different extraction methods for exploratory factor analysis (EFA):
1. Principal Components Analysis (PCA), sometimes also known as the Hotelling method
2. Centroid Factor Analysis (CFA), also known as the Simple Summation Method
<!-- TODO add Thurstone 1947 in Brown 1980 as source for CFA, Burt1940 in ibid as source for simple summation -->
3. Principal Axis Factoring (PAF), also known as Common Factor Analysis
<!-- MH TODO add references for these! -->
There are many other multivariate techniques that may --- or may not --- hold promise for Q methodology, including cluster analysis (CA), multi-dimensional scaling (MDS), latent profile analysis (LPA) or structural equation modeling (SEM), as well as yet different *non*-parametric methods (latent trait and latent class analysis) [@Yates-1987: 2], as may be required by a stringently ordinal reading of Q sorts (see above).
Such a systematic reconstruction of Q *methodology* in light of currently available *techniques* may be worthwhile --- or not ---, but it is clearly beyond the scope of this study.
Of course, just because *alternative* methods may be difficult and untested does not mean that the above conventional techniques are adequate.
Limiting the discussion to the above techniques is still prudent, not only because they have been tested in Q studies, but also because they (especially PCA) are *relatively* simple mathematical operations with limited assumptions that can be readily defended.
Readers will also be relieved that only the chosen procedure will be presented in some mathematical detail; the *choice* among PCA, CFA and PAF can be well-justified on a verbal level.
The choice boils down to two features of the three methods: a supposed indeterminacy and a possibly implied latent variable model.
|                       | Indeterminate                  | Determinate                          |
|-----------------------|--------------------------------|--------------------------------------|
| Latent Variable Model | Centroid Factor Analysis (CFA) | Principal Axis Factoring (PAF)       |
| Linear Composition    |                                | Principal Components Analysis (PCA)  |
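For orientation only, a hedged sketch of how two of these extractions could be run on the Spearman's correlation matrix computed above; the `psych` package and the (arbitrary) three-factor setting are illustrative assumptions, not the procedure behind the results reported here:

```{r extraction-sketch, eval=FALSE}
# Hedged sketch: two of the candidate extractions on the Spearman's correlation
# matrix from above. The psych package and the three-factor setting are
# illustrative assumptions, not the procedure used for the reported results.
library(psych)
pca <- principal(r = cor[,,"before"], nfactors = 3, rotate = "none")  # Principal Components Analysis
paf <- fa(r = cor[,,"before"], nfactors = 3, fm = "pa", rotate = "none")  # Principal Axis Factoring
# Centroid Factor Analysis is not offered by psych; dedicated Q software such as
# PQMethod implements it.
```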
#### Latent Variable Model or Linear Composition
Centroid Factor Analysis (CFA) and Principal Axis Factoring (PAF) are both *latent variable models*.
They *estimate* a smaller set of latent variables (here: Q factors) from a larger set of *manifest* variables (here: individual Q sorts).
By the same token, these techniques allow for *error*: any particular *manifest* respondent's Q sort would be explained by her loadings on a set of *latent* factors, plus some residual error term, accounting for the particularities of that person's Q-sort.
As such, CFA and PAF summarize only the *common* variance among the sorts, but not the *specific* variance.
By contrast, Principal Components Analysis (PCA) considers all of the available variance, common *and* unique.
There is no concept of measurable manifest and underlying latent variables in PCA: as will be described later, the principal components are merely a compressed "vector recipe" to re-expand to the original correlation matrix.
This difference between CFA and PAF on the one hand and PCA on the other is also reflected in the correlation matrix passed on to the data reduction technique.
Recall that the diagonal entries of a correlation matrix are all ones; by definition, a Q-sort is perfectly correlated with itself.
PCA takes this original correlation matrix as the starting point, with all ones in the diagonal.
If expanded again, the summary produced by (a complete) PCA would yield the original data, including the idiosyncrasies of individual sorts.
CFA and PAF start from the same correlation matrix, but replace the diagonal with some measure of that sort's degree of *common-ness*, or shared variance with the other Q sorts.
PAF places the communality $h^2$ of a variable (here: a Q-sort) in the diagonal, that is, the sum of squared factor loadings for any given Q-sort across all the factors, indicating the share of that Q-sort's *total* variance explained by all the factors.
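As a minimal sketch of that calculation (assuming a loadings matrix `loa` with Q-sorts in rows and factors in columns, such as the `$loa` element returned by `qmethod()`):

```{r communality-sketch, eval=FALSE}
# Sketch: the communality h^2 is the sum of squared loadings per Q-sort
# (assumes a loadings matrix `loa` with Q-sorts in rows and factors in columns,
# such as the `$loa` element returned by qmethod()).
h2 <- rowSums(loa^2)  # communality per Q-sort
h2  # share of each Q-sort's total variance explained by all factors
```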
For CFA, Brown also suggests other diagonal substitutes, including estimated test-retest reliabilities which will be discussed later [-@Brown1980: 211].
<!-- MH TODO add link to later discussion of reliability -->
Brown --- as well as many orthodox Q methodologists --- recommends CFA for its latent variable model, because as a "closed model" PCA "assumes no error" [-@Brown1980: 211].
Brown is of course correct that a person is unlikely to reproduce the same Q-sort after a day or two *exactly*, though it appears implausible that her subjectivity on a general topic such as taxation and the economy would have perceptibly changed in that time frame.
It is less clear however, that such test-retest reliabilities --- or any other substitution on the diagonal --- would effectively identify anything that might be called *error*.
As Brown is quick to point out elsewhere, there *is* no external standard of one's subjectivity, denying conventional notions of validity.
If a Q-sort at any one point in time *is* operant subjectivity, how would we know that any difference to a later Q-sort would be in error?
It may well be that such test-retest differences reflect spurious *fluctuations* in subjectivity, but any one snapshot of such variable subjectivity would still be accurate for that moment in time.
We might want to consider such reliabilities as *resolutions* when interpreting the factor scores, or when comparing Q-sorts after some "treatment" (as is the case here).
<!-- MH TODO link to distinguishing statements -->
As such, we would want a summary model that also explains this supposedly "unreliable" component of a person's Q-sort. [^average-sorts]
[^average-sorts]: We may alternatively filter out these possible fluctuations by looking at *average* Q-sorts, but that would imply a different operationalization as *medium-term* subjectivity, would greatly complicate the procedure and might raise additional practical problems, including saturation and consistency effects.
<!-- TODO: find source -->
A latent-variable model, more generally, would appear to fit awkwardly into Q methodology.
Consider, by contrast, an exploratory factor analysis of burnout, a supposedly externally valid phenomenon that cannot be measured directly (example taken from @Field-Miles-2012: 750).
In such a study, we may legitimately be interested *only* in those variables that are highly correlated *among* burnout patients, such as motivation, sleeping patterns, self-worth and so on.
If and to the extent that the sampling frame (burnout) is externally valid --- a big if --- we can draw a meaningful distinction between common and specific variance.
If, for example, only some burnout patients report back pain, we may reasonably consider this to be specific variance related to another (skeletal-muscular) condition, and disregard it for the present model.
In the correlation matrix of variables, we may want to replace the diagonal entry for the back pain variable with something *less* than one, because we do not require a model that also accounts for levels of back pain.
The sampling frame "burnout patients" implies --- however justifiably --- a latent variable (burnout), that we expect behind all shared manifestations.
Conversely, we would only want to model patterns of variable correlations that are widely shared by our burnout patients, and would therefore greatly "discount" idiosyncratic variables such as back pain.
<!-- This is just for future reference, it's a good quote
> @Fabrigar-Wegener-2012 31
> "the common factor model was formulated as a general mathematical framework for understanding the structure of correlations among measured variables.
> It postulated that correlations among measured variables can be explained by a relatively small set of latent constructs (common factors), and that each measured variable is a linear combination of these underlying common factors and a unique factor (comprised of a specific factor and random error).
> PCA differs from the common factor model in several notable ways (Wildama, 2007, see also Thomson, 1939)
> First, PCA was not originally designed to account for the structure of correlations among measured variables, but rather to reduce scores on a battery of measured variables to a smaller set of scores (i.e. principal components).
> This smaller set of scores are linear combinations of the original measured variable scores that retain as much information as possible (i.e., explain as much variance as possible) from the original measured variables.
> Thus, the primary goal of PCA is to account for the variances of measured variables rather than to explain the correlations (or covariances) among them.
> Similarly PCA was not designed with the intent that the principal components should be interpreted as directly corresponding to meaningful latent constructs.
> Rather the components simply represent efficient methods of capturing information in the measured variables (regardless of whether those measured variables rrepresent meaningful latent constructs).
> " -->
Recall, again, that in Q methodology, items are cases, and participants are variables.
Q methodology has a *case* sampling frame, too, but these cases are *items* in Q: it is assumed that all items relate to taxation and the economy. [^tax-sampling-frame]
[^tax-sampling-frame]: As discussed elsewhere, this sampling frame also needs to correspond to *one* externally valid concept, or be *commensurate* in Q terminology.
A Q-sample of items on, say, cat personality and architecture may not make much sense.
It is not clear, however, what the equivalent to the back-pain scenario of a "stray" variable in Q might imply.
It is possible that some participants (as the variables) will have less in common with everyone else than others.
In this study, the degree of common variance, provisionally measured as the average correlation per participant row (or column) in the correlation matrix, ranges from `r min(apply(cor[,,"before"], 2, mean))` to `r max(apply(cor[,,"before"], 2, mean))`.
<!-- TODO add explanation what happens with stray items (high variance) in the current design -->
```{r sum-corr, echo=FALSE, include=FALSE}
# look at it provisionally with only the average correlation
range(apply(cor[,,"before"], 2, mean))
summary(cor[,,"before"])
# look at the summary loadings of the full model later
#keyneson$before$loa
#apply(keyneson$before$loa, 1, sum)
#range(t(keyneson$before$loa)) # have to transpose
#summary(t(keyneson$before$loa)) # have to transpose
# TODO really implement this, include as results
```
Applying the latent-variable logic of an ordinary, items-as-*variables* factor analysis to Q, we might then conclude that the relatively uncommon Q-sorters introduce idiosyncratic specific variance, and ought to be discounted in the model.
At least in this study, such discounting does not make sense, because we are principally interested in *all* Q-sorters, qua being participating citizens.
Recall that when studying burnout --- but not back pain --- there are many variables that you are *not* interested in, even though it may not be clear at the outset what these are.
In contrast to a burnout study, there is *no* conceivable variable-participant that we would *not* be interested in.
Health symptoms may --- or may not --- have a bearing on burnout, but *all* citizens do have a bearing on common understandings of taxation and the economy. [^q-with-latent-var]
[^q-with-latent-var]: There may be Q studies with an explicit interest in only the factors shared by a subset of well-correlated participants, as might be the case for a study into hegemony.
I am not aware of any Q study explicitly espousing such a latent-variable model.
<!-- TODO to clarify, add table here with items, cases, variables, and what is common/specific -->
A latent-variable model generally seems at odds with the scientific study of human subjectivity.
In contrast to many R-type exploratory factor analyses, per the abductive logic of Q, there *is* no preconception of what might delineate specific from common variance in substantive terms.
To the extent that all Q sorters participate in the same realm of communicability, and share in the same concourse --- as is assumed by the respective theories for any modern society --- we are equally interested in maverick and mainstream Q sorts.
<!-- TODO add sources for those theories -->
In fact, to discount the poorly correlated Q-sorts would be to (slightly) stack the deck in favor of the one straightforwardly *falsifiable* hypothesis that Q methodology admits of: namely, that viewpoints are, in fact, *shared* (see the later discussion on the number of factors). [^why-not-discount-items]
<!-- TODO maybe reference here already the justification/discussion of flagging, which would appear to contradict this argument; I guess the rescue is that flagging is only to "clean up" the ideal points, not anything else. -->
[^why-not-discount-items]: Readers may object at this point, that in fact, Q methodology *does* imply a latent-variable model concerning the *items*.
Operant subjectivity on the (somewhat haphazardly) sampled items is taken to be *manifestly* representative of similar, *latently* shared viewpoints in a broader concourse.
One may, consequently, be inclined to discount those items that are less correlated with all other items.
Here, as often, the uncommon correlation matrix of Q implies an unintuitive logic.
Items, in spite of their verbal form, are *cases*, not variables --- latent or otherwise.
To discount Q-items at this point would be analogous to discounting participant-cases --- not variables --- in an R-type factor analysis.
Clearly, some of the sampled items may be more expressive of shared viewpoints than others, but that distinction becomes relevant only later in the analysis, when shared, *ideal-type* viewpoints are interpreted.
By analogy to R-type factor analysis, some people may also be more expressive of some burnout-factors than others, but that becomes meaningful only *after* the factor extraction.
We might --- again, analogously --- be particularly interested in the *ideal-typical* burnout-cases, and interpret them in more detail, but only later.
<!-- TODO check kline 1994 in Watts/Stenner on PCA vs CFA -->
<!-- TODO read Harman 1976 in the above on PCA vs CF -->
#### (In)determinacy
Proponents of Centroid Factor Analysis (CFA) often cite the technique's supposed indeterminacy in favor of the method.
It might seem odd that a statistical procedure may be favored for *not* producing the same result, with the same data.
This unfamiliar criterion might be best understood in the historical context of factor analysis, as Peter Schmolck recently suggested.
<!-- TODO actually, it's sthis email by Schmolck https://listserv.kent.edu/cgi-bin/wa.exe?A2=q-method;5c40e976.1206 -->
<!-- TODO cite Peter schmolck email with McCloy, Metheny and Knott 1938 email -->
Different methods of factor analysis and rotation were originally developed to summarize differences in human personality and intelligence.