importing from structure file - inconsistent "negative subscripts" error #141

Open
gottschoa opened this issue Apr 20, 2016 · 15 comments

@gottschoa

Hi,

I am trying to run DAPC on my new Sceloporus dataset. I successfully got this to work before with some other datasets. I am using adegenet 2.0.0.

When I run the following line for my "Uma dataset", I am able to successfully import:

library(adegenet)

data <- read.structure("output_western_uma_121214_n60_h5_p75_editnames_2.str", n.ind=64, n.loc=597, onerowperind=FALSE, col.lab=1, col.pop=0, col.others=NULL, row.marknames=NULL, NA.char="-9", pop=NULL, ask=FALSE, quiet=FALSE)

When I run the same code for the "Sceloporus dataset":

data <- read.structure("output_sceloporus_032415_n43_h5_p75.str", n.ind=80, n.loc=1024, onerowperind=FALSE, col.lab=1, col.pop=0, col.others=NULL, row.marknames=NULL, NA.char="-9", pop=NULL, ask=FALSE, quiet=FALSE)

I get the following error:

Error in mat[, (ncol(mat) - p + 1):ncol(mat)] :
only 0's may be mixed with negative subscripts

I also tried this with adegenet v1.4.2 and had the exact same issue.

I attached both input (structure) files to this email. They were both formatted the same way, from pyRAD v2.1.2. If anyone can figure out why one file is giving me the error, and the other isn't, I would greatly appreciate it.

I should point out that I searched the archives. A similar question was posted about a year ago, but I didn't see it resolved:

http://lists.r-forge.r-project.org/pipermail/adegenet-forum/2014-December/001049.html

Thanks for your help! (I added .txt extension to the .str files to upload to github)

Best, Andy

output_sceloporus_032415_n43_h5_p75.str.txt
output_western_uma_121214_n60_h5_p75_editnames_2.str.txt

@thibautjombart
Owner

Hi there,
before looking into this, have you tried with the latest version of adegenet (2.0.1)?

@thibautjombart
Owner

Is this issue still pending?

@gottschoa
Author

Hi Dr. Jombart,

Sorry for the delayed response. I tried with 2.0.1 and still encounter the same issue.

Best, Andy

@MagB

MagB commented Nov 23, 2016

Hi,

I'm having the same issue. I tried with 2.0.1 and continue to get the same error.
I'm using .str data: no_pop_map_snp_data.txt

@thibautjombart
Owner

Hi there,
I am heading to a conference all of next week, so will not be able to look into this before a week. If this is a persistent error, this may be a bug. If you have time for this, you can try and see what is wrong using:

debug(read.structure)

before entering the command line creating the error.

@zkamvar
Collaborator

zkamvar commented Oct 10, 2017

Hi @gottschoa, the reason why this fails is because adegenet can only detect 1019 loci and not 1024. If you read the structure file in as a table, there are only 1019 columns that register as loci.

library("adegenet")
#> Loading required package: ade4
#> 
#>    /// adegenet 2.1.0 is loaded ////////////
#> 
#>    > overview: '?adegenet'
#>    > tutorials/doc/questions: 'adegenetWeb()' 
#>    > bug reports/feature requests: adegenetIssues()
tmp <- tempfile(fileext = ".str")
download.file("https://github.com/thibautjombart/adegenet/files/228778/output_sceloporus_032415_n43_h5_p75.str.txt", 
  destfile = tmp)
read.structure(tmp, n.ind = 80, n.loc = 1024, onerowperind = FALSE, col.lab = 1, 
  col.pop = 0, col.others = NULL, row.marknames = NULL, NA.char = "-9", pop = NULL, 
  ask = FALSE, quiet = FALSE)
#> 
#>  Converting data from a STRUCTURE .stru file to a genind object...
#> Error in mat[, (ncol(mat) - p + 1):ncol(mat)]: only 0's may be mixed with negative subscripts
read.structure(tmp, n.ind = 80, n.loc = 1019, onerowperind = FALSE, col.lab = 1, 
  col.pop = 0, col.others = NULL, row.marknames = NULL, NA.char = "-9", pop = NULL, 
  ask = FALSE, quiet = FALSE)
#> 
#>  Converting data from a STRUCTURE .stru file to a genind object...
#> Warning in df2genind(X = X, pop = pop, ploidy = 2, sep = sep, ncode =
#> ncode): entirely non-type marker(s) deleted
#> /// GENIND OBJECT /////////
#> 
#>  // 80 individuals; 1,017 loci; 2,047 alleles; size: 1.1 Mb
#> 
#>  // Basic content
#>    @tab:  80 x 2047 matrix of allele counts
#>    @loc.n.all: number of alleles per locus (range: 1-4)
#>    @loc.fac: locus factor for the 2047 columns of @tab
#>    @all.names: list of allele names for each locus
#>    @ploidy: ploidy of each individual  (range: 2-2)
#>    @type:  codom
#>    @call: read.structure(file = tmp, n.ind = 80, n.loc = 1019, onerowperind = FALSE, 
#>     col.lab = 1, col.pop = 0, col.others = NULL, row.marknames = NULL, 
#>     NA.char = "-9", pop = NULL, ask = FALSE, quiet = FALSE)
#> 
#>  // Optional content
#>    - empty -
sum(!sapply(read.table(tmp, sep = "\t"), is.logical))
#> [1] 1020
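For readers wondering where the message comes from: the failing expression takes the last p columns of a matrix. Here is a minimal base-R sketch, assuming p stands for the requested number of loci (n.loc) and mat for the parsed table. Both are assumptions about the internals, and the toy matrix and the helper name last_p_columns are made up. It shows that asking for more columns than exist makes the start index negative:

```r
# Toy stand-in for the parsed table: 1 label column + 4 locus columns.
mat <- matrix(seq_len(10), nrow = 2, ncol = 5)

# Hypothetical helper mirroring the expression shown in the error message.
last_p_columns <- function(mat, p) {
  mat[, (ncol(mat) - p + 1):ncol(mat), drop = FALSE]
}

# Asking for 4 columns works: the index runs from 2 to 5.
ok <- last_p_columns(mat, p = 4)
dim(ok)

# Asking for 7 columns when only 5 exist: the index starts at -1,
# so it mixes negative and positive subscripts.
idx <- (ncol(mat) - 7 + 1):ncol(mat)
idx

# Capture the resulting error message instead of stopping the script.
res <- tryCatch(last_p_columns(mat, p = 7),
                error = function(e) conditionMessage(e))
res
```

The exact wording of the error can differ between R versions, so the sketch only captures that an error is raised.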
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.2 (2017-09-28)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       America/Chicago             
#>  date     2017-10-10
#> Packages -----------------------------------------------------------------
#>  package    * version    date       source                        
#>  ade4       * 1.7-8      2017-08-09 cran (@1.7-8)                 
#>  adegenet   * 2.1.0      2017-10-10 local                         
#>  ape          4.1        2017-02-14 CRAN (R 3.4.0)                
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.4.0)                
#>  backports    1.1.1      2017-09-25 CRAN (R 3.4.2)                
#>  base       * 3.4.2      2017-10-04 local                         
#>  bindr        0.1        2016-11-13 CRAN (R 3.4.0)                
#>  bindrcpp     0.2        2017-06-17 CRAN (R 3.4.0)                
#>  boot         1.3-20     2017-07-30 CRAN (R 3.4.1)                
#>  cluster      2.0.6      2017-03-16 CRAN (R 3.4.0)                
#>  coda         0.19-1     2016-12-08 CRAN (R 3.4.0)                
#>  colorspace   1.3-3      2017-08-16 R-Forge (R 3.4.1)             
#>  compiler     3.4.2      2017-10-04 local                         
#>  datasets   * 3.4.2      2017-10-04 local                         
#>  deldir       0.1-14     2017-04-22 CRAN (R 3.4.0)                
#>  devtools     1.13.3     2017-08-02 CRAN (R 3.4.1)                
#>  digest       0.6.12     2017-01-27 CRAN (R 3.4.0)                
#>  dplyr        0.7.4      2017-09-28 CRAN (R 3.4.1)                
#>  evaluate     0.10.1     2017-06-24 CRAN (R 3.4.1)                
#>  expm         0.999-2    2017-03-29 CRAN (R 3.4.0)                
#>  formatR      1.5        2017-04-25 CRAN (R 3.4.0)                
#>  gdata        2.18.0     2017-06-06 CRAN (R 3.4.0)                
#>  ggplot2      2.2.1      2016-12-30 CRAN (R 3.4.0)                
#>  glue         1.1.1      2017-06-21 CRAN (R 3.4.0)                
#>  gmodels      2.16.2     2015-07-22 CRAN (R 3.4.0)                
#>  graphics   * 3.4.2      2017-10-04 local                         
#>  grDevices  * 3.4.2      2017-10-04 local                         
#>  grid         3.4.2      2017-10-04 local                         
#>  gtable       0.2.0      2016-02-26 CRAN (R 3.4.0)                
#>  gtools       3.5.0      2015-05-29 CRAN (R 3.4.0)                
#>  htmltools    0.3.6      2017-04-28 CRAN (R 3.4.0)                
#>  httpuv       1.3.5      2017-07-04 CRAN (R 3.4.1)                
#>  igraph       1.1.2      2017-07-21 cran (@1.1.2)                 
#>  knitr        1.17       2017-08-10 cran (@1.17)                  
#>  lattice      0.20-35    2017-03-25 CRAN (R 3.4.0)                
#>  lazyeval     0.2.0      2016-06-12 CRAN (R 3.4.0)                
#>  LearnBayes   2.15       2014-05-29 CRAN (R 3.4.0)                
#>  magrittr     1.5        2014-11-22 CRAN (R 3.4.0)                
#>  MASS         7.3-47     2017-04-21 CRAN (R 3.4.0)                
#>  Matrix       1.2-11     2017-08-16 CRAN (R 3.4.1)                
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.4.0)                
#>  methods    * 3.4.2      2017-10-04 local                         
#>  mgcv         1.8-22     2017-09-19 CRAN (R 3.4.2)                
#>  mime         0.5        2016-07-07 CRAN (R 3.4.0)                
#>  munsell      0.4.3      2016-02-13 CRAN (R 3.4.0)                
#>  nlme         3.1-131    2017-02-06 CRAN (R 3.4.0)                
#>  parallel     3.4.2      2017-10-04 local                         
#>  permute      0.9-4      2016-09-09 CRAN (R 3.4.0)                
#>  pkgconfig    2.0.1      2017-03-21 CRAN (R 3.4.0)                
#>  plyr         1.8.4      2016-06-08 CRAN (R 3.4.0)                
#>  R6           2.2.2      2017-06-17 cran (@2.2.2)                 
#>  Rcpp         0.12.13.1  2017-10-10 Github (RcppCore/Rcpp@136d50f)
#>  reshape2     1.4.2      2016-10-22 CRAN (R 3.4.0)                
#>  rlang        0.1.2      2017-08-09 cran (@0.1.2)                 
#>  rmarkdown    1.6        2017-06-15 cran (@1.6)                   
#>  rprojroot    1.2        2017-01-16 CRAN (R 3.4.0)                
#>  scales       0.5.0.9000 2017-08-28 Github (hadley/scales@d767915)
#>  seqinr       3.4-5      2017-08-01 CRAN (R 3.4.1)                
#>  shiny        1.0.5      2017-08-23 cran (@1.0.5)                 
#>  sp           1.2-5      2017-06-29 CRAN (R 3.4.1)                
#>  spdep        0.6-15     2017-09-01 CRAN (R 3.4.1)                
#>  splines      3.4.2      2017-10-04 local                         
#>  stats      * 3.4.2      2017-10-04 local                         
#>  stringi      1.1.5      2017-04-07 CRAN (R 3.4.0)                
#>  stringr      1.2.0      2017-02-18 CRAN (R 3.4.0)                
#>  tibble       1.3.4      2017-08-22 cran (@1.3.4)                 
#>  tools        3.4.2      2017-10-04 local                         
#>  utils      * 3.4.2      2017-10-04 local                         
#>  vegan        2.4-4      2017-08-24 cran (@2.4-4)                 
#>  withr        2.0.0      2017-07-28 CRAN (R 3.4.1)                
#>  xtable       1.8-2      2016-02-05 CRAN (R 3.4.0)                
#>  yaml         2.1.14     2016-11-12 CRAN (R 3.4.0)

@saidwali

Hi,
I still have this problem when I run dapc. I am using the latest version, 2.1.2. It works fine if I work with imputed data, but leaving missing marker data as NA gives me this error: "Error in dm[, 1L:dimen, drop = FALSE] : only 0's may be mixed with negative subscripts"

@zkamvar
Collaborator

zkamvar commented Feb 21, 2020

Are you leaving missing data in the file as NA or as -9?

@saidwali

Hi,
I found out it was not working because of some stupid mistakes.
Somehow it now also works with missing data.
I use "NA" for missing marker information, "1" for major, "2" for hetero and "3" for minor.

Some functions, such as "find.clusters", give me a warning:
"Warning in find.clusters.data.frame(as.data.frame(x), ...) : NAs introduced by coercion".
"dudi.pca" also does not work with missing data, but I guess this is normal and I can live with that.
DAPC, scatter etc. are working fine.

@zkamvar
Collaborator

zkamvar commented Feb 21, 2020

Just to confirm: you are referring to an error with read.structure()?

The system you describe is not supported by adegenet and will give you incorrect results. Adegenet assumes that you represent each allele individually so that it can then represent those as counts in a sparse matrix.
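As a rough illustration of what "each allele individually" could look like, here is a base-R sketch that recodes the 1/2/3 scheme into explicit allele pairs. The allele labels A and B, the "/" separator, the toy genotype matrix, and the helper name recode_genotypes are all hypothetical choices, and the sketch assumes strictly biallelic markers.

```r
# Hypothetical helper: turn 1 (major homozygote), 2 (heterozygote) and
# 3 (minor homozygote) into explicit allele pairs; NA stays NA.
recode_genotypes <- function(x) {
  x <- as.matrix(x)
  lookup <- c("1" = "A/A", "2" = "A/B", "3" = "B/B")
  out <- lookup[as.character(x)]
  dim(out) <- dim(x)            # setting dim also drops the lookup names
  dimnames(out) <- dimnames(x)
  out
}

# Made-up example: two individuals, two SNPs, one missing call.
geno <- matrix(c(1, 2, NA, 3), nrow = 2,
               dimnames = list(c("ind1", "ind2"), c("snp1", "snp2")))
recoded <- recode_genotypes(geno)
recoded
```

A table like this could then be passed to df2genind() with sep = "/" and ploidy = 2, though whether that suits a given dataset is something to check against the adegenet documentation.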

@kkolis

kkolis commented Apr 20, 2020

Hello, I am having a very similar problem with the dapc command, where I get the same error as saidwali when I run the code
"mmOfour <- dapc(genlit.vcf, pop.list$pop, n.pca = 20, n.da = 4)"
Error in dm[, 1L:dimen, drop = FALSE] :
only 0's may be mixed with negative subscripts

I am currently running the Adegenet package 2.1.2. I am generating the genlight file with vcfR.
string.vcf <- read.vcfR("file.vcf")
genlit.vcf <- vcfR2genlight(string.vcf)

The Adegenet find.clusters program works with the genlight file. Additionally, previously generated genlight files work when running dapc.

I have been spinning my wheels with this error code for the past week, as I am re-analyzing some data after some changes to upstream filtering processes. I have relaxed some filters so that the new vcf/genlit files have more SNPs, and more missing data (however no more than ~25%).

Any help would be appreciated!

@massub

massub commented Aug 5, 2020

Dear @zkamvar, I am facing the same problem as described above. How can I find out how many loci are detected in the structure file? I have been going round and round but could not figure it out. Please help me work out the number of loci being read by adegenet.
Thank you so much in advance,
genotypic.data.structure.AFG landrace.stru.txt

@SMoulherat

I had the same problem on a data set of A. obstetricans:
AO_gen_F<-read.structure(
"File",
sep = ";",
n.ind=474,
n.loc=13,
onerowperind = TRUE,
NA.char="-9",
col.lab=1,
col.pop=2,
row.marknames = 1,
col.others = 0)

Comparing with the other .stru files I have, I saw that the working ones use a space separator while the failing ones use ";". I therefore replaced ";" with spaces in the failing file and obtained the expected results.
So the bug seems to be in how the sep parameter is handled.

Cheers
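The workaround described above can be sketched in base R. The helper name convert_separator and the toy file contents are hypothetical, and the sketch assumes the file has no empty fields, since consecutive spaces would be read as a single separator.

```r
# Hypothetical helper: rewrite a delimited text file with another separator.
convert_separator <- function(infile, outfile, from = ";", to = " ") {
  lines <- readLines(infile)
  # fixed = TRUE treats `from` as a literal string, not a regular expression.
  writeLines(gsub(from, to, lines, fixed = TRUE), outfile)
  invisible(outfile)
}

# Made-up two-individual file using ";" as separator.
src <- tempfile(fileext = ".stru")
dst <- tempfile(fileext = ".stru")
writeLines(c("ind1;1;101;103;201;205",
             "ind2;2;101;101;-9;-9"), src)

convert_separator(src, dst)
converted <- readLines(dst)
converted
```

The converted file can then be read with the default separator handling, which is the configuration reported to work in this comment.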

@thesnakeguy

Has this been solved? I am experiencing the same dapc error that @kkolis reported above (Error in dm[, 1L:dimen, drop = FALSE] : only 0's may be mixed with negative subscripts). I also read the VCF file with vcfR and converted it with vcfR2genlight.

@zkamvar
Collaborator

zkamvar commented Jun 6, 2021

Please forgive the lateness of my reply. It's.... been a hell of a year for everyone.

Regarding errors in structure files

It's likely that whitespace characters are giving you problems. There is a difference between a tab and a space that doesn't show up in text editors by default, and it can cause problems down the line. For example, in my answer to the initial inquiry back in 2017, I showed that only 1019 loci were being detected. What I didn't explain was that there were six columns after the ID column that were completely blank because there was a series of six tabs after the ID. The truth is, there are many reasons why this could be happening. Unfortunately, the structure format is quite varied and it can be really hard to debug without knowing what you were expecting (number of loci and number of individuals).
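One way to check a file for this kind of problem is to count the fields on each line and look for columns that are entirely blank. The following base-R sketch uses a made-up tab-delimited file with two empty columns after the ID. It assumes a single label column and no header row, which may not match your file.

```r
# Made-up STRUCTURE-like file: ID, two blank columns (extra tabs), 3 loci.
tmp <- tempfile(fileext = ".str")
writeLines(c("ind1\t\t\t1\t2\t-9",
             "ind1\t\t\t1\t3\t-9",
             "ind2\t\t\t2\t2\t4",
             "ind2\t\t\t1\t2\t4"), tmp)

# Number of tab-separated fields on each line; these should all agree.
fields <- count.fields(tmp, sep = "\t")
table(fields)

# Entirely blank columns are read as all-NA columns.
tab <- read.table(tmp, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
blank <- vapply(tab, function(col) all(is.na(col)), logical(1))

n_blank <- sum(blank)
# Subtract the blank columns and the one label column (an assumption here).
n_loci <- ncol(tab) - n_blank - 1
c(columns = ncol(tab), blank = n_blank, loci = n_loci)
```

If the locus count from a check like this differs from the n.loc you pass to read.structure(), that mismatch is a likely place to start looking.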

Regarding vcfR errors

These errors don't have anything to do with the initial issue. You are getting a similar error because it's a common error message in R. The problem is that I don't have any way to reproduce the error you are getting because I don't know what the state of the data is. What I do know is that the code dm[, 1L:dimen, drop=FALSE] does not come from {adegenet}, rather it comes from MASS::predict.lda(). This comes from the Discriminant Analysis portion of the DAPC:

adegenet/R/dapc.R

Lines 78 to 92 in 78be588

## PERFORM DA ##
ldaX <- lda(XU, grp, tol=1e-30) # tol=1e-30 is a kludge, but a safe (?) one to avoid fancy rescaling by lda.default
lda.dim <- sum(ldaX$svd^2 > 1e-10)
ldaX$svd <- ldaX$svd[1:lda.dim]
ldaX$scaling <- ldaX$scaling[,1:lda.dim,drop=FALSE]
if(is.null(n.da)){
    barplot(ldaX$svd^2, xlab="Linear Discriminants", ylab="F-statistic", main="Discriminant analysis eigenvalues", col=heat.colors(length(levels(grp))) )
    cat("Choose the number discriminant functions to retain (>=1): ")
    n.da <- as.integer(readLines(con = getOption('adegenet.testcon'), n = 1))
}
##n.da <- min(n.da, length(levels(grp))-1, n.pca) # can't be more than K-1 disc. func., or more than n.pca
n.da <- round(min(n.da, lda.dim)) # can't be more than K-1 disc. func., or more than n.pca
predX <- predict(ldaX, dimen=n.da)

Unfortunately, this is as far as I can go without knowing what your data looks like. What might help in debugging is to not set n.da and see how many discriminant axes are available because that is the source of the error.
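To see how many discriminant axes are available outside of dapc(), the same counting step can be tried directly with MASS::lda(). This is only a sketch on the built-in iris data, not on genlight data, and it does not reproduce the error. It just mirrors the lda.dim computation quoted above so you can compare it with the n.da you request.

```r
library(MASS)

# Toy data: 4 numeric variables, 3 groups (so at most 2 discriminant axes).
X <- as.matrix(iris[, 1:4])
grp <- iris$Species

fit <- lda(X, grouping = grp)

# Same counting rule as in the quoted dapc code.
lda_dim <- sum(fit$svd^2 > 1e-10)

# Requesting more axes than exist gets capped at lda_dim.
n_da <- round(min(5, lda_dim))

# newdata is passed explicitly so the call does not depend on how
# predict() looks up the original data.
scores <- predict(fit, newdata = X, dimen = n_da)$x
dim(scores)
```

With your own data, a value of lda_dim that is unexpectedly small (or svd values that are missing) would point to a problem upstream of the discriminant analysis, such as the PCA input or the grouping vector.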
