Comparing changes

base repository: mottensmann/GCalignR
base: v1.0.4
head repository: mottensmann/GCalignR
compare: master
  • 16 commits
  • 24 files changed
  • 2 contributors

Commits on Jun 19, 2022

  1. update to 1.0.5

    mottensmann committed Jun 19, 2022
    6161b3b
  2. 3bc8d0c
  3. 03c51d3
  4. Update to 1.0.5

    mottensmann committed Jun 19, 2022
    0f348b9

Commits on Jun 20, 2023

  1. small change of README

    mottensmann committed Jun 20, 2023
    9840a64
  2. typo

    mottensmann committed Jun 20, 2023
    bd78c5f

Commits on Jan 22, 2024

  1. removed unused argument

    mottensmann committed Jan 22, 2024
    763e6f9
  2. update to new bibentry()

    mottensmann committed Jan 22, 2024
    2489f14
  3. release 1.0.6

    mottensmann committed Jan 22, 2024
    4d5b267

Commits on Jan 24, 2024

  1. tagged v1.6.0

    mottensmann committed Jan 24, 2024
    1525110

Commits on May 17, 2024

  1. a6e906b

Commits on Jul 3, 2024

  1. Merge pull request #28 from jarioksa/adonis-fix

    vegan::adonis is deprecated, will be defunct: use vegan::adonis2
    mottensmann authored Jul 3, 2024
    020f024
  2. Merge pull request #28 from jarioksa/adonis-fix

    vegan::adonis is deprecated, will be defunct: use vegan::adonis2
    mottensmann committed Jul 3, 2024
    b3b46e1
  3. release 1.0.7

    mottensmann committed Jul 3, 2024
    0dded4a
  4. ed56504

Commits on Sep 27, 2024

  1. 1.0.7 on CRAN

    mottensmann committed Sep 27, 2024
    397649f
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -12,3 +12,4 @@ NEWS.html
internal
vignettes/test_GCalignR*
README.html
+^CRAN-SUBMISSION$
3 changes: 3 additions & 0 deletions CRAN-SUBMISSION
@@ -0,0 +1,3 @@
+Version: 1.0.7
+Date: 2024-07-03 17:46:40 UTC
+SHA: ed565043a1a73846929ba038570c4ea7290166b4
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: GCalignR
Title: Simple Peak Alignment for Gas-Chromatography Data
-Version: 1.0.4
-Date: 2022-02-09
+Version: 1.0.7.1
+Date: 2024-09-29
Encoding: UTF-8
Authors@R: c(
person("Meinolf", "Ottensmann", email = "meinolf.ottensmann@web.de", role = c("aut","cre"),
@@ -28,11 +28,12 @@ Imports:
stringr,
utils,
pbapply,
+methods,
tibble
License: GPL (>= 2) | file LICENSE
Language: en-GB
LazyData: true
-RoxygenNote: 7.1.2
+RoxygenNote: 7.3.2
Suggests:
knitr,
pander,
169 changes: 97 additions & 72 deletions NEWS.md
@@ -1,110 +1,135 @@

# GCalignR 1.0.7

------------------------------------------------------------------------

- replacing defunct vegan::adonis by vegan::adonis2

# GCalignR 1.0.6

------------------------------------------------------------------------

- Removing unused argument `gc_peak_df` from `align_peaks`

# GCalignR 1.0.5

------------------------------------------------------------------------

- Bugfix in `choose_optimal_reference` that always selected the first
sample as a reference. Thanks to Heberto del Rio who pointed this out
on <https://github.com/mottensmann/GCalignR/issues/27>

# GCalignR 1.0.3.9

- Small bug fixed that caused problems when plotting x-axis labels in
`gc_heatmap`. An error occurred in case of two peaks that were
identical when rounded to decimal places.
- Added a test for detecting inconsistently ordered retention times
within samples. Retention times are expected in increasing order,
starting with the lowest number. If this assumption is violated,
retention times are reordered as indicated by a warning.
- **Speedboost** when setting `max_diff_peak2mean = 0`: In this special
case there is no need to use a time-consuming iterative approach but
peaks can be sorted simply based on absolute values. This is
implemented in two steps. (1) Across all samples, unique retention
times are extracted, sorted in increasing temporal order and written
to a template data frame. (2) For each sample, peaks are matched to
the corresponding row of the template data frame.
- Small bug fixed that caused problems when plotting x-axis labels in
`gc_heatmap`.
- Added a test for detecting inconsistently ordered retention times
within samples. Retention times are expected in increasing order,
starting with the lowest number. If this assumption is violated,
retention times are reordered and a warning is shown.

# GCalignR 1.0.3

-----

- Added `fill = TRUE` as a parameter in `utils::read.table` when
reading data from text within internal functions. *Loading GC data
with utils::read.table failed in cases of missing values in a column
(i.e. empty). This is the correct behaviour as missing data should
always be coded explicitly by ‘NA’*
- Tibbles are now coerced to data frames
- Added a new boolean parameter `remove_empty` for the main function
`align_chromatograms`. If samples are empty (i.e.. no peak) this
parameter allows to remove those cases from the dataset to avoid
problems in post-hoc analyses. By default `FALSE`, i.e.. all but the
blank samples are kept.
- Added a new boolean parameter `permute` for the functions
`align_chromatograms` and `align_peaks`. This allows to change the
default behaviour of random permutation of samples during the
alignment and might be useful if exact replication is needed.
------------------------------------------------------------------------

- Added `fill = TRUE` as a parameter in `utils::read.table` when reading
data from text within internal functions. *Loading GC data with
utils::read.table failed in cases of missing values in a column
(i.e. empty). This is the correct behaviour as missing data should
always be coded explicitly by ‘NA’*
- Tibbles are now coerced to data frames
- Added a new boolean parameter `remove_empty` for the main function
`align_chromatograms`. If samples are empty (i.e.. no peak) this
parameter allows to remove those cases from the dataset to avoid
problems in post-hoc analyses. By default `FALSE`, i.e.. all but the
blank samples are kept.
- Added a new boolean parameter `permute` for the functions
`align_chromatograms` and `align_peaks`. This allows to change the
default behaviour of random permutation of samples during the
alignment and might be useful if exact replication is needed.

# GCalignR 1.0.2

-----
------------------------------------------------------------------------

- The accompanying manuscript is published
<https://doi.org/10.1371/journal.pone.0198311> and the citation has
been added
- The function *beta* `read_empower2` allows to import HPLC data that
has been generated using the EMPOWER 2 software
- The accompanying manuscript is published
<https://doi.org/10.1371/journal.pone.0198311> and the citation has
been added
- The function *beta* `read_empower2` allows to import HPLC data that
has been generated using the EMPOWER 2 software

# GCalignR 1.0.1

-----
------------------------------------------------------------------------

**Bugfixes**

- A bugfix was applied for handling multiple blanks correctly.
- Progressbars are removed in non-interactive R sessions
- A bugfix was applied for handling multiple blanks correctly.
- Progressbars are removed in non-interactive R sessions

-----
------------------------------------------------------------------------

# GCalignR 1.0.0

**New functions implemented**

- `choose_optimal_reference` offers an automatism to pick suitable
references.
- `draw_chromatograms` allows to represent a peak list in form of
chromatogram.
- `remove_blanks`allows to get rid of peaks that represent
contamination after aligning a dataset
- `remove_singletons` allows to remove single peaks from the dataset
after aligning
- `merge_redundant_rows` allows to merge rows that were not recognised
as redundant during the alignment by increasing the threshold value
for the evaluation of similarity
- `choose_optimal_reference` offers an automatism to pick suitable
references.
- `draw_chromatograms` allows to represent a peak list in form of
chromatogram.
- `remove_blanks`allows to get rid of peaks that represent contamination
after aligning a dataset
- `remove_singletons` allows to remove single peaks from the dataset
after aligning
- `merge_redundant_rows` allows to merge rows that were not recognised
as redundant during the alignment by increasing the threshold value
for the evaluation of similarity

**Algorithm**

- Using `pbapply`, we implemented progress bars to inform the user
about the progress and the estimated running time of intermediate
steps in the alignment of peak lists.
- By implementing more efficient code, we were able to speed up the
processing, especially picking references is faster by an order of
magnitude.
- Retention times are not rounded to two decimals anymore.
Calculations still capture a precision of two decimals for
computational reasons.
- Within the aligned results, retention times correspond to the input
values. Linear adjustments are only used internally and are
documented within the Logfile accessible in the output.
- Reference samples that are used for the coarse alignment of
retention times can be picked using a novel algorithm that
determines the average similarity across the dataset.
- Using `pbapply`, we implemented progress bars to inform the user about
the progress and the estimated running time of intermediate steps in
the alignment of peak lists.
- By implementing more efficient code, we were able to speed up the
processing, especially picking references is faster by an order of
magnitude.
- Retention times are not rounded to two decimals anymore. Calculations
still capture a precision of two decimals for computational reasons.
- Within the aligned results, retention times correspond to the input
values. Linear adjustments are only used internally and are documented
within the Logfile accessible in the output.
- Reference samples that are used for the coarse alignment of retention
times can be picked using a novel algorithm that determines the
average similarity across the dataset.

**warning messages**

- Warnings addressing formatting issues are now more explicit and
partly rephrased to avoid ambiguity.
- Warnings addressing formatting issues are now more explicit and partly
rephrased to avoid ambiguity.

**Plots**

- Added horizontal axis to barplots summarising peak numbers in
`plot.GCalign`.
- Changed to more prominent colours in binary heatmaps with
`gc_heatmap`.
- The function `draw_chromatograms` was added as another visualisation
tool.
- Added horizontal axis to barplots summarising peak numbers in
`plot.GCalign`.
- Changed to more prominent colours in binary heatmaps with
`gc_heatmap`.
- The function `draw_chromatograms` was added as another visualisation
tool.

**Vignettes**

- We included a second vignette that explains the algorithm and the
supported data in detail.
- We included a second vignette that explains the algorithm and the
supported data in detail.

**Documentation**

- Helpfiles were rewritten to enhance clarity.
- Helpfiles were rewritten to enhance clarity.

-----
------------------------------------------------------------------------
13 changes: 13 additions & 0 deletions NEWS.rmd
@@ -3,6 +3,19 @@ output: github_document
html_preview: false
---

+# GCalignR 1.0.7
+___
+* replacing defunct vegan::adonis by vegan::adonis2
+
+# GCalignR 1.0.6
+___
+* Removing unused argument `gc_peak_df` from `align_peaks`
+
+# GCalignR 1.0.5
+___
+
+* Bugfix in `choose_optimal_reference` that always selected the first sample as a reference. Thanks to Heberto del Rio who pointed this out on https://github.com/mottensmann/GCalignR/issues/27
+
# GCalignR 1.0.3.9

* **Speedboost** when setting `max_diff_peak2mean = 0`: In this special case there is no need to use a time-consuming iterative approach but peaks can be sorted simply based on absolute values. This is implemented in two steps. (1) Across all samples, unique retention times are extracted, sorted in increasing temporal order and written to a template data frame. (2) For each sample, peaks are matched to the corresponding row of the template data frame.
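The two-step shortcut described in the 1.0.3.9 entry can be sketched roughly as follows. This is an illustration only, not the package's implementation: the helper `align_exact`, the column names `rt` and `area`, and the choice to fill rows without a peak with zeros are all assumptions made for this example.

```r
# Hypothetical helper (not part of GCalignR): exact-match alignment,
# as it could work when no retention time tolerance is allowed.
align_exact <- function(peak_list) {
  # (1) Collect the unique retention times of all samples in increasing order.
  all_rt <- unlist(lapply(peak_list, function(df) df[["rt"]]))
  template_rt <- sort(unique(all_rt))

  # (2) For each sample, write every peak into the template row that has
  #     exactly the same retention time; other rows stay zero (assumption).
  lapply(peak_list, function(df) {
    out <- data.frame(rt = rep(0, length(template_rt)),
                      area = rep(0, length(template_rt)))
    rows <- match(df[["rt"]], template_rt)
    out[["rt"]][rows] <- df[["rt"]]
    out[["area"]][rows] <- df[["area"]]
    out
  })
}

# Two toy samples with invented values
samples <- list(
  A = data.frame(rt = c(1.5, 2.0), area = c(10, 20)),
  B = data.frame(rt = c(2.0, 3.25), area = c(5, 7))
)
aligned <- align_exact(samples)
```

Both samples end up with three rows, one per unique retention time, so a given row refers to the same retention time in every sample and no iterative shifting is needed.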
3 changes: 1 addition & 2 deletions R/GCalignR.R
@@ -30,7 +30,6 @@
#'@details
#' More details on the package are found in the vignettes that can be accessed via \code{browseVignettes("GCalignR")}.
#'
-#' @docType package
#' @name GCalignR
#'
-NULL
+"_PACKAGE"
4 changes: 0 additions & 4 deletions R/align_peaks.R
@@ -24,10 +24,6 @@
#'\strong{max_diff_peak2mean} around the mean retention time no shifting is done
#'and the algorithm proceeds with the following sample.
#'
-#'@param gc_peak_df data.frame containing GC-data (e.g. retention time, peak
-#' area, peak height) of one sample. Variables are stored in columns, rows
-#' represent peaks.
-#'
#'@param gc_peak_list List of data.frames. Each data.frame contains GC-data
#' (e.g. retention time, peak area, peak height) of one sample. Variables are
#' stored in columns. Rows represent distinct peaks. Retention time is a
2 changes: 1 addition & 1 deletion R/blank_substraction.R
@@ -59,7 +59,7 @@ if (is.null(input)) stop("No input was defined")
if (is.null(blanks)) stop("Define name(s) of blanks")

# read data and prepare a list
-if (class(input) == "GCalign") {
+if (inherits(input, "GCalign")) {
if (is.null(conc_col_name)) stop("Define the name of a data frame")
if (conc_col_name %in% names(input[["aligned"]])) {
input2 <- input[["aligned"]][[conc_col_name]]
9 changes: 6 additions & 3 deletions R/choose_optimal_reference.R
@@ -4,7 +4,7 @@
#' Full alignments of peak lists require the specification of a fixed reference to which all other samples are aligned to. This function provides an simple algorithm to find the most suitable sample among a dataset. The so defined reference can be used for full alignments using \code{\link{linear_transformation}}. The functions is evoked internally by \code{\link{align_chromatograms}} if no reference was specified by the user.
#'
#' @details
-#' Every sample is considered in determining the optimal reference in comparison to all other samples by estimating the similarity to all other samples. For a reference-sample pair, the deviation in retention times between all reference peaks and the always nearest peak in the sample is summed and divided by the number of reference peaks. The calculated value is a similarity score that converges to zero the more similar reference and sample are. For every potential reference, the median score of all pair-wise comparisons is used as a similarity proxy. The optimal sample is then defined by the minimum value among these scores. This functions is used internally in \code{\link{align_chromatograms}} to select a reference if non was specified by the user.
+#' Every sample is considered in determining the optimal reference in comparison to all other samples by estimating the similarity to all other samples. For a reference-sample pair, the deviation in retention times between all reference peaks and the always nearest peak in the sample is summed up and divided by the number of reference peaks. The calculated value is a similarity score that converges to zero the more similar reference and sample are. For every potential reference, the median score of all pair-wise comparisons is used as a similarity proxy. The optimal sample is then defined by the minimum value among these scores. This functions is used internally in \code{\link{align_chromatograms}} to select a reference if non was specified by the user.
#'
#' @inheritParams align_chromatograms
#'
@@ -40,11 +40,14 @@ choose_optimal_reference <- function(data = NULL, rt_col_name = NULL, sep = "\t"
## get the median scores for shared peaks
x <- df_median_sim_score(gc_peak_list = gc_peak_list,rt_col_name = rt_col_name, method = method)

-## take the best, depending on the method choose
+## take the best, depending on the method chosen
if (method == "Match") {
index <- which(x[["score"]] == max(x[["score"]]))
} else if (method == "Deviance") {
-index <- which(min(x[["score"]]/x[["n_peaks"]]) == min(x[["score"]]/x[["n_peaks"]]))
+#index <- which(min(x[["score"]]/x[["n_peaks"]]) == min(x[["score"]]/x[["n_peaks"]]))
+# Sun Jun 19 22:40:46 2022 ------------------------------
+# Bugfix thanks to hebertodelrio on GitHub!
+index <- which(x[["score"]]/x[["n_peaks"]] == min(x[["score"]]/x[["n_peaks"]]))
}

## If more than one would get the same score, take the most central run
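The effect of the one-line fix in the "Deviance" branch is easy to reproduce in isolation. The snippet below is a minimal sketch with invented numbers; the data frame `x` only imitates the `score` and `n_peaks` columns used in the hunk above and is not output of the package.

```r
# Toy stand-in for the per-sample scores (values invented for illustration)
x <- data.frame(score = c(12, 3, 8), n_peaks = c(4, 3, 4))
ratio <- x[["score"]] / x[["n_peaks"]]  # 3, 1, 2

# Old comparison: min(ratio) is a single number compared with itself,
# which is TRUE, so which() returns 1 whatever the data look like.
old_index <- which(min(ratio) == min(ratio))

# Fixed comparison: every element is compared with the minimum,
# so which() returns the position(s) of the smallest ratio.
new_index <- which(ratio == min(ratio))
```

Here `old_index` is 1 although the second sample has the smallest ratio, while `new_index` is 2. Keeping `which(ratio == min(ratio))` rather than switching to `which.min()` preserves ties, which the code that follows resolves by taking the most central run.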
8 changes: 4 additions & 4 deletions R/draw_chromatogram.R
@@ -78,16 +78,16 @@ draw_chromatogram <- function(data = NULL, rt_col_name = NULL, conc_col_name = N
out <- check_input(data = data, rt_col_name = rt_col_name, sep = sep, plot = F, message = F)
if (out == FALSE) stop("Data is not formatted correctly. See check_input for details")
} else {
-if (class(data) == "GCalign") {
+if (inherits(data, "GCalign")) {
if (!(rt_col_name %in% names(data[["aligned"]]))) stop(print(paste(rt_col_name,"is not a valid variable name. Data contains:",paste(names(data[["aligned"]]),collapse = " & "))))
-} else if (class(data) == "list") {
+} else if (inherits(data, "list")) {
out <- check_input(data = data, rt_col_name = rt_col_name, sep = sep, plot = F, message = F)
if (out == FALSE) stop("Data is malformed. See check_input for details")
}
}
if (is.character(data)) {
peak_list <- read_peak_list(data, sep, rt_col_name)
-} else if (class(data) == "GCalign") {
+} else if (inherits(data, "GCalign")) {
step <- match.arg(step, choices = c("aligned","input","shifted"))
if (step == "input") {
peak_list <- data[["input_list"]]
@@ -103,7 +103,7 @@ peak_list <- read_peak_list(data, sep, rt_col_name)
return(x)
})
}
-} else if (class(data) == "list") {
+} else if (inherits(data, "list")) {
peak_list <- lapply(data, FUN = function(x) {
if (any(is.na(rowSums(x)))) {
p <- as.vector(which(is.na(rowSums(x))))
2 changes: 1 addition & 1 deletion R/merge_redundant_rows.R
@@ -33,7 +33,7 @@
#' @export
#'
merge_redundant_rows <- function(data, min_diff_peak2peak = NULL) {
-if (class(data) != "GCalign") stop("Only data of type GCalign is supported")
+if (!methods::is(data, "GCalign")) stop("Only data of type GCalign is supported")
if (is.null(min_diff_peak2peak)) stop("Specify an numeric threshold value in minutes")
gc_peak_list_aligned <- data[["aligned_list"]]

2 changes: 1 addition & 1 deletion R/norm_peaks.R
@@ -34,7 +34,7 @@ out <- match.arg(out)
## some checks
if (is.null(conc_col_name)) {stop("List containing peak concentration is not specified. Define conc_col_name")}

-if (class(data) == "GCalign") {
+if (inherits(data, "GCalign")) {
which <- "aligned"
conc_list <- data[[which]][[conc_col_name]]
} else if (is.list(data)) {
2 changes: 1 addition & 1 deletion R/remove_blanks.R
@@ -24,7 +24,7 @@
#' @export
#'
remove_blanks <- function(data, blanks) {
-if (class(data) == "GCalign") {
+if (inherits(data, "GCalign")) {
rt_col_name <- data[["Logfile"]][["Call"]][["rt_col_name"]]
data <- data[["aligned_list"]]
} else if (is.list(data)) {
2 changes: 1 addition & 1 deletion R/remove_singletons.R
@@ -19,7 +19,7 @@
#' @export
#'
remove_singletons <- function(data) {
-if (class(data) == "GCalign") {
+if (inherits(data, "GCalign")) {
rt_col_name <- data[["Logfile"]][["Call"]][["rt_col_name"]]
data <- data[["aligned_list"]]
} else if (is.list(data)) {
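Several hunks above swap `class(x) == "GCalign"` for `inherits(x, "GCalign")`. A likely motivation, which the commit messages do not state, is that `class()` can return more than one string, so the comparison produces a logical vector rather than a single value. The sketch below shows the difference on a toy object; the class name `GCalign_extended` is made up for the example.

```r
# Toy object carrying two classes; "GCalign_extended" is a hypothetical name.
obj <- structure(list(), class = c("GCalign_extended", "GCalign"))

# Comparing class() with a string gives one logical per class: FALSE TRUE.
# Using such a length-2 condition in if() is an error since R 4.2.0.
cmp <- class(obj) == "GCalign"

# inherits() checks the whole class vector and returns a single logical.
ok <- inherits(obj, "GCalign")
```

`inherits()` therefore keeps working if an object gains an additional class, whereas the `==` comparison only behaves as intended when the class attribute has exactly one element.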
Binary file added README-unnamed-chunk-7-1.png