CRAN release 0.1.15

jlmelville released this 26 Jun 15:08
· 116 commits to master since this release

New features

  • New function: optimize_graph_layout. Use this to produce optimized output coordinates that
    reflect an input similarity graph (such as that produced by the similarity_graph function).
    Running similarity_graph followed by optimize_graph_layout is the same as running umap, so the
    purpose of these functions is to allow for more flexibility and decoupling between generating the
    nearest neighbor graph and optimizing the low-dimensional approximation to it. Based on a request
    by user Chengwei94 (#98).
  • New functions: simplicial_set_union and simplicial_set_intersect. These allow for the
    combination of different fuzzy graph representations of a dataset into a single fuzzy graph using
    the UMAP simplicial set operations. Based on a request in the Python UMAP issues tracker by user
    Dhar xion.
  • New parameter for umap_transform: ret_extra. This works like the equivalent parameter for
    umap: pass a character vector naming the extra information you would like returned in addition
    to the embedding. When it is set, a list is returned, with an embedding member containing the
    optimized coordinates. Supported values are "fgraph", "nn", "sigma" and "localr". Based on a
    request by user PedroMilanezAlmeida (#104).
  • New parameter for umap, tumap and umap_transform: seed. This will do the equivalent of
    calling set.seed internally, and hence will help with reproducibility. The chosen seed is
    exported if ret_model = TRUE and umap_transform will use that seed if present, so you only
    need to specify it in umap_transform if you want to change the seed. The default behavior remains
    to not modify the random number state. Based on a request by
    SuhasSrinivasan (#110).
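
To give a sense of what combining two fuzzy graphs means, here is a minimal sketch in Python. It is an illustration only, not uwot's implementation. It assumes each graph is a dictionary mapping an edge (i, j) to a membership strength between 0 and 1, that the union is the probabilistic sum a + b - ab, and that the intersection is the product ab with a missing edge treated as strength 0. The names fuzzy_union and fuzzy_intersect are hypothetical, and the real simplicial_set_union and simplicial_set_intersect may apply further steps (such as re-normalizing the result) that are left out here.

```python
def fuzzy_union(g1, g2):
    """Combine edge memberships with the probabilistic sum a + b - a*b."""
    out = {}
    for edge in set(g1) | set(g2):
        a = g1.get(edge, 0.0)  # assumption: a missing edge has strength 0
        b = g2.get(edge, 0.0)
        out[edge] = a + b - a * b
    return out


def fuzzy_intersect(g1, g2):
    """Combine edge memberships with the product a*b.

    Only edges present in both graphs survive, because a missing edge is
    treated as strength 0 (a simplifying assumption of this sketch).
    """
    out = {}
    for edge in set(g1) & set(g2):
        out[edge] = g1[edge] * g2[edge]
    return out


# Two toy fuzzy graphs over the same three points.
graph_a = {(0, 1): 0.5, (1, 2): 1.0}
graph_b = {(0, 1): 0.5, (0, 2): 0.25}

union = fuzzy_union(graph_a, graph_b)
intersection = fuzzy_intersect(graph_a, graph_b)
```

In this toy case the union keeps every edge seen in either graph and strengthens the shared edge (0, 1), while the intersection keeps only the shared edge and weakens it.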

Bug fixes and minor improvements

  • A new setting for init_sdev: set init_sdev = "range" and initial coordinates will be
    range-scaled so each column takes values between 0 and 10. This pre-processing was added to the Python
    UMAP package at some point after uwot began development and so should probably always be used
    with the default init = "spectral" setting. However, it is not set by default to maintain
    backwards compatibility with older versions of uwot.
  • ret_extra = c("sigma") is now supported by lvish. The Gaussian bandwidths are returned in a
    sigma vector. In addition, a vector of intrinsic dimensionalities estimated for each point using
    an analytical expression of the finite difference method given by Lee and co-workers is
    returned in the dint vector.
  • The min_dist and spread parameters are now returned in the model when umap is run with
    ret_model = TRUE. This is just for documentation purposes: these values are not used directly by
    the model in umap_transform. If the parameters a and b are set directly when invoking umap,
    then both min_dist and spread will be set to NULL in the returned model. This feature was
    added in response to a question from kjiang18 (#95).
  • Some new checks for NA values in input data have been added. Also, a warning will be emitted if
    n_components seems to have been set too high.
  • If n_components was greater than n_neighbors then umap_transform would crash the R session.
    Thank you to ChVav for reporting this (#102).
  • Using umap_transform with a model where dens_scale was set could cause a segmentation fault,
    destroying the session. Even if it didn't, it could give an entirely artifactual "ring"
    structure. Thank you FemkeSmit for reporting this and providing assistance in diagnosing the
    underlying cause (#103).
  • If you set binary_edge_weights = TRUE, this setting was not exported when ret_model = TRUE,
    and was therefore not respected by umap_transform. This has now been fixed, but you will need to
    regenerate any models that used binary edge weights.
  • The rdoc for the init param said that if there were multiple disconnected components, a
    spectral initialization would attempt to merge multiple sub-graphs. Not true: actually, spectral
    initialization is abandoned in favor of PCA. The documentation has been updated to reflect the true
    state of affairs. No idea what I was thinking of there.
  • load_model and save_model didn't work on Windows 7 due to how the version of tar there
    handles drive letters. Thank you mytarmail for the report (#109).
  • Warn if the initial coordinates have a very large scale (a standard deviation > 10.0), because
    this can lead to small gradients and poor optimization. Thank you SuhasSrinivasan for the
    report (#110).
  • A change to accommodate a forthcoming version of RcppAnnoy. Thank you Dirk Eddelbuettel for
    the PR (#111).
  • A test was failing on Arm architectures. Problem has been "solved" by removing the test, but it
    was testing a floating point value resulting from a failure due to numerical issues, so it's a bit
    of a corner case. Thank you Lucas Kanashiro for reporting (#100).
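
The init_sdev = "range" setting and the large-scale warning described above both concern the spread of the initial coordinates. The following Python sketch illustrates the arithmetic only and is not uwot's code. It assumes the coordinates are a list of rows, that range scaling is the linear map 10 * (x - min) / (max - min) applied to each column, and that the scale check compares each column's sample standard deviation against 10. The notes above do not say whether uwot checks each column separately or all coordinates together, so the per-column check is an assumption, and the function names are hypothetical.

```python
import statistics


def has_large_scale(coords, threshold=10.0):
    """Return True if any column's sample standard deviation exceeds threshold."""
    n_cols = len(coords[0])
    return any(
        statistics.stdev([row[c] for row in coords]) > threshold
        for c in range(n_cols)
    )


def range_scale(coords, upper=10.0):
    """Rescale each column linearly so its values span 0 to upper.

    Assumes no column is constant; a constant column would divide by zero.
    """
    n_cols = len(coords[0])
    lows = [min(row[c] for row in coords) for c in range(n_cols)]
    highs = [max(row[c] for row in coords) for c in range(n_cols)]
    return [
        [upper * (row[c] - lows[c]) / (highs[c] - lows[c]) for c in range(n_cols)]
        for row in coords
    ]


# Toy initialization: the first column is far too spread out.
init = [[-200.0, 0.0], [0.0, 5.0], [200.0, 10.0]]
scaled = range_scale(init)
```

Here the first column of init has a standard deviation of 200, so it would trigger the warning, whereas after range scaling both columns run from 0 to 10 and have a standard deviation of 5.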