Error when setting row.idx #21

Open
dbstern opened this issue May 15, 2019 · 7 comments
Comments

@dbstern

dbstern commented May 15, 2019

When I use row.idx I get the error
Error in `[<-`(`*tmp*`, cv.ind == i, 1:res$nl, value = res$loss) : (subscript) logical subscript too long

Maybe this error is related to line 87 being commented out?
y <- fit$y # this would cause error if eval.metric == "MAPE"

Code that causes error:
train <- sample(c(T,F), size, c(.5,.5), replace = TRUE)
fit$blasso <- cv.biglasso(X = x, y = y, row.idx = which(train), penalty = "lasso", family = "binomial", nfolds = 10)

If I take "row.idx = which(train)" out it runs without any errors.

@privefl
Contributor

privefl commented May 15, 2019

What are size, X, y, ...?

@dbstern
Author

dbstern commented May 15, 2019

Sorry, I forgot the line
size = length(y)

Not sure if this answers your question but

str(x)
Formal class 'big.matrix' [package "bigmemory"] with 1 slot
..@ address:
str(y)
logi [1:21989] FALSE TRUE FALSE FALSE FALSE FALSE ...

@privefl
Contributor

privefl commented May 15, 2019

I mean, could you provide a reproducible example with some example data, so that we can run your code and see the error?

@dbstern
Author

dbstern commented May 15, 2019

x <- as.big.matrix(matrix(rnorm(n= 21989,2790), nrow = 21989))
y <- sample(c(0,1), size = 21989, replace = T)
train <- sample(c(T,F), length(y), c(.5,.5), replace = TRUE)
fit <- cv.biglasso(X = x, y = y, row.idx = which(train), penalty = "lasso", family = "binomial", nfolds = 10)
Error in `[<-`(`*tmp*`, cv.ind == i, 1:res$nl, value = res$loss) :
(subscript) logical subscript too long

@privefl
Contributor

privefl commented May 15, 2019

If we run the code step by step after calling debugonce(cv.biglasso), we see a first problem: cv.ind defines folds for the whole sample instead of only for the indices of the training set.

Specifying cv.ind = sample(rep_len(1:10, sum(train))) returns another error.
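
To illustrate the kind of length mismatch described above, here is a minimal base-R sketch. It does not reproduce the internals of cv.biglasso: the names E, n_all, n_train, and cv_ind_all/cv_ind_train are made up for illustration, and it assumes only that a result matrix with one row per training observation gets indexed by a fold vector drawn over the full sample.

```r
# Illustration only: E, n_all, n_train and nfolds are hypothetical stand-ins,
# not the actual variables used inside cv.biglasso.
set.seed(1)
n_all   <- 20                               # rows in the full matrix
train   <- rep(c(TRUE, FALSE), n_all / 2)   # half of the rows are training rows
n_train <- sum(train)                       # rows actually used for fitting
nfolds  <- 5

# Result matrix sized for the training rows only (3 columns stand in for lambdas).
E <- matrix(NA_real_, nrow = n_train, ncol = 3)

# Folds drawn over the full sample: 20 fold labels for a 10-row matrix.
cv_ind_all <- sample(rep_len(1:nfolds, n_all))
msg <- tryCatch({
  E[cv_ind_all == 1, 1:3] <- 0
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)  # the logical subscript is longer than nrow(E), so R raises an error

# Folds drawn over the training rows only: lengths agree and the assignment works.
cv_ind_train <- sample(rep_len(1:nfolds, n_train))
E[cv_ind_train == 1, 1:3] <- 0
```

In this sketch, making the fold vector as long as the number of training rows removes the subscript error. As noted above, doing the equivalent in the real call only leads to a different error, so this shows the first problem and not a fix.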

@dbstern
Author

dbstern commented Jun 13, 2019

I used deepcopy to work around this bug, but it took too much memory and the server would sometimes crash. When I use deepcopy on a big.matrix, a corresponding file is created in "/dev/shm" (I tried changing the backingfile to my external hard drive, but without success).

I found that the bigstatsr package makes it easier to deal with these problems (although I'm not sure whether what I'm doing is OK). The code is now something like this:

exthd_path <- "."
file <- file.path(exthd_path,"test.txt")
x <- as.big.matrix(matrix(rnorm(n= 21989,2790), nrow = 21989))
write.big.matrix(x, filename = file, row.names = FALSE, col.names = T, sep = " ")
rm(x); gc()

y <- sample(c(0,1), size = 21989, replace = T)
train <- sample(c(T,F), length(y), c(.5,.5), replace = TRUE)
x <- big_read(file, select = 1)
xtrain <- big_copy(x, ind.row = which(train), backingfile = paste0(bigstatsr::sub_bk(x$bk),"-train"))
fit <- cv.biglasso(X = xtrain$bm(), y = y[train], penalty = "lasso", family = "binomial", nfolds = 10)
unlink(xtrain$bk); rm(xtrain)
unlink(c(x$bk,x$rds)); rm(x)

@privefl
Contributor

privefl commented Jun 13, 2019

I think you can directly use big_copy() when x is a big.matrix.
