Shuffling labels and coordinates #136
@@ -0,0 +1,56 @@ (new file)

```r
#!/usr/bin/env Rscript

# Author_and_contribution: Niklas Mueller-Boetticher; created template
# Author_and_contribution: Kim Vucinic; modified template and created script

suppressPackageStartupMessages(library(optparse))

# Arguments
option_list <- list(
  make_option(
    c("-c", "--coordinates"),
    type = "character", default = NULL,
    help = "Path to coordinates (as tsv)."
  ),
  make_option(
    c("--seed"),
    type = "integer", default = NULL,
    help = "Seed to use for random operations."
  ),
  make_option(
    c("-o", "--out_file"),
    type = "character", default = NULL,
    help = "Output file."
  )
)

# Description
description <- "Shuffling coordinates in coordinates.tsv"

opt_parser <- OptionParser(
  usage = description,
  option_list = option_list
)
opt <- parse_args(opt_parser)

# optparse does not enforce required options, so check them explicitly
if (is.null(opt$coordinates) || is.null(opt$out_file)) {
  print_help(opt_parser)
  stop("Both --coordinates and --out_file must be given.")
}

# Use these filepaths as input
coord_file <- opt$coordinates

# Seed
seed <- opt$seed
set.seed(seed)

## Your code goes here
df <- read.delim(coord_file, sep = "\t", row.names = 1)
if (!all(c("x", "y") %in% colnames(df))) {
  stop("x and y coordinate columns are not present in the file. Check your file.")
}

# Randomize IDs: assign a random permutation of the IDs to the rows, then
# restore the original ID order (the reordering is not strictly necessary)
df_order <- rownames(df)
rownames(df) <- sample(rownames(df))
df_final <- df[df_order, , drop = FALSE]

## Write output
write.table(df_final, opt$out_file, sep = "\t", col.names = NA, quote = FALSE)
```
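For reference, reassigning a random permutation of IDs and then restoring the ID order is equivalent in distribution to permuting the data rows while leaving the row names in place: both give a uniformly random pairing of IDs with coordinate rows (with the same seed the two approaches will not produce the same pairing). A minimal sketch, assuming a table with only `x`/`y` columns and hypothetical spot IDs `spot_1` to `spot_5`; the helper `shuffle_coordinates` is hypothetical and not part of the PR. It round-trips the table through `write.table(..., col.names = NA)` and `read.delim(..., row.names = 1)`, the pair the script uses, to show the row names survive.

```r
# Hypothetical helper (not part of the PR): give each ID the coordinates of a
# randomly chosen row, keeping the IDs in their original order.
shuffle_coordinates <- function(df, seed = NULL) {
  if (!all(c("x", "y") %in% colnames(df))) {
    stop("x and y columns are not present in the data frame.")
  }
  set.seed(seed)  # seed = NULL re-initialises the RNG, so results are not reproducible
  shuffled <- df[sample(nrow(df)), , drop = FALSE]  # permute whole rows
  rownames(shuffled) <- rownames(df)                # put the IDs back in order
  shuffled
}

# Example input with hypothetical spot IDs
coords <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(10, 20, 30, 40, 50),
  row.names = paste0("spot_", 1:5)
)

# Round trip through a TSV written the same way the script writes its output
tsv <- tempfile(fileext = ".tsv")
write.table(coords, tsv, sep = "\t", col.names = NA, quote = FALSE)
coords_in <- read.delim(tsv, sep = "\t", row.names = 1)

out <- shuffle_coordinates(coords_in, seed = 42)
print(out)
```

Because whole rows are permuted, each `x`/`y` pair stays together; only its assignment to an ID changes.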
@@ -0,0 +1,6 @@ (new file)

```yaml
channels:
  - conda-forge
  - defaults
dependencies:
  - r-base==4.3.1
  - r-optparse=1.7.3
```
@@ -0,0 +1,55 @@ (new file)

```r
#!/usr/bin/env Rscript

# Author_and_contribution: Niklas Mueller-Boetticher; created template
# Author_and_contribution: Kim Vucinic; modified template and created script

suppressPackageStartupMessages(library(optparse))

# Arguments
option_list <- list(
  make_option(
    c("-l", "--labels"),
    type = "character", default = NULL,
    help = "Labels from domain clustering. Path to labels (as tsv)."
  ),
  make_option(
    c("--seed"),
    type = "integer", default = NULL,
    help = "Seed to use for random operations."
  ),
  make_option(
    c("-o", "--out_file"),
    type = "character", default = NULL,
    help = "Output file."
  )
)

# Description
description <- "Shuffling labels..."

opt_parser <- OptionParser(
  usage = description,
  option_list = option_list
)
opt <- parse_args(opt_parser)

# optparse does not enforce required options, so check them explicitly
if (is.null(opt$labels) || is.null(opt$out_file)) {
  print_help(opt_parser)
  stop("Both --labels and --out_file must be given.")
}

# Use these filepaths as input
label_file <- opt$labels

# Seed
seed <- opt$seed
set.seed(seed)

## Your code goes here
df <- read.delim(label_file, sep = "\t", row.names = 1)
if (!("label" %in% colnames(df))) {
  stop("Label column not present in the file. Check your file.")
}

# Randomize labels
df_randomized <- data.frame(label = sample(df$label))
```
Review thread on this line:

- We might have an additional column in this data frame that splits the label into high and low confidence. Should that be shuffled too?
- I think you mean "if we shuffle the …
- Yeah, good point. In that case the code still needs to be adjusted to keep the additional columns untouched. Make sure only the labels are shuffled and the rownames still match all the other existing columns.
- Sounds good. I'll add the changes :)
```r
rownames(df_randomized) <- rownames(df)

## Write output
write.table(df_randomized, opt$out_file, sep = "\t", col.names = NA, quote = FALSE)
```
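Following the review thread on the `data.frame(label = ...)` line, one way to shuffle only the labels is to assign the permuted values back into the existing column, so every other column and the row names stay aligned. A minimal sketch, not the PR's final code: the helper `shuffle_labels`, the extra column name `confidence`, and the example data are hypothetical. It indexes with `sample.int()` rather than calling `sample()` on the labels directly, because `sample(x)` on a single numeric value (for example one integer cluster ID) samples from `1:x` instead of returning `x`.

```r
# Hypothetical helper (not the PR's final code): permute only the `label`
# column; all other columns and the row names are left untouched.
shuffle_labels <- function(df, seed = NULL) {
  if (!("label" %in% colnames(df))) {
    stop("Label column not present in the data frame.")
  }
  set.seed(seed)
  # Index by a random permutation of row positions; safe for any label type
  df$label <- df$label[sample.int(nrow(df))]
  df
}

# Example input: `confidence` stands in for the high/low-confidence column
label_df <- data.frame(
  label = c("A", "A", "B", "B", "C"),
  confidence = c("high", "low", "high", "low", "high"),
  row.names = paste0("spot_", 1:5)
)

shuffled <- shuffle_labels(label_df, seed = 1)
print(shuffled)
```

Assigning into `df$label` keeps the data frame's other columns in place, whereas building a new `data.frame(label = ...)` drops them.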
@@ -0,0 +1,6 @@ (new file)

```yaml
channels:
  - conda-forge
  - defaults
dependencies:
  - r-base==4.3.1
  - r-optparse=1.7.3
```

Review comment on `- defaults`: I would remove defaults. It shouldn't be needed here.