The SyntheticDatasets.jl package is a library with functions for generating synthetic artificial datasets.
The package can be installed with the Julia package manager.
From the Julia REPL, type ]
to enter the Pkg REPL mode and run:
pkg> add SyntheticDatasets
Or, equivalently, via the Pkg
API:
julia> import Pkg; Pkg.add("SyntheticDatasets")
A set of pluto notebooks and codes demonstrating the project's current functionality is available in the examples folder.
Here are a few examples to show the Package capabilities:
using StatsPlots, SyntheticDatasets
blobs = SyntheticDatasets.make_blobs( n_samples = 1000,
n_features = 2,
centers = [-1 1; -0.5 0.5],
cluster_std = 0.25,
center_box = (-2.0, 2.0),
shuffle = true,
random_state = nothing);
@df blobs scatter(:feature_1, :feature_2, group = :label, title = "Blobs")
gauss = SyntheticDatasets.make_gaussian_quantiles( mean = [10,1],
cov = 2.0,
n_samples = 1000,
n_features = 2,
n_classes = 3,
shuffle = true,
random_state = 2);
@df gauss scatter(:feature_1, :feature_2, group = :label, title = "Gaussian Quantiles")
spirals = SyntheticDatasets.make_twospirals(n_samples = 2000,
start_degrees = 90,
total_degrees = 570,
noise =0.1);
@df spirals scatter(:feature_1, :feature_2, group = :label, title = "Two Spirals")
kernel = SyntheticDatasets.make_halfkernel( n_samples = 1000,
minx = -20,
r1 = 20,
r2 = 35,
noise = 3.0,
ratio = 0.6);
@df kernel scatter(:feature_1, :feature_2, group = :label, title = "Half Kernel")
The SyntheticDatasets.jl is a library with functions for generating synthetic artificial datasets. The package has some functions are interfaces to the dataset generator of the ScikitLearn.
List of package datasets:
Dataset | Title | Reference |
---|---|---|
make_blobs | Generate isotropic Gaussian blobs for clustering. | link |
make_moons | Make two interleaving half circles | link |
make_s_curve | Generate an S curve dataset. | link |
make_regression | Generate a random regression problem. | link |
make_classification | Generate a random n-class classification problem. | link |
make_friedman1 | Generate the “Friedman #1” regression problem. | link |
make_friedman2 | Generate the “Friedman #2” regression problem. | link |
make_friedman3 | Generate the “Friedman #3” regression problem. | link |
make_circles | Make a large circle containing a smaller circle in 2d | link |
make_regression | Generate a random regression problem. | link |
make_classification | Generate a random n-class classification problem. | link |
make_low_rank_matrix | Generate a mostly low rank matrix with bell-shaped singular values. | link |
make_swiss_roll | Generate a swiss roll dataset. | link |
make_hastie_10_2 | Generates data for binary classification used in Hastie et al. | link |
make_gaussian_quantiles | Generate isotropic Gaussian and label samples by quantile. | link |
Disclaimer: SyntheticDatasets.jl borrows code and documentation from scikit-learn in the dataset module, but it is not an official part of that project. It is licensed under MIT.
Dataset | Title | Reference |
---|---|---|
make_twospirals | Generate two spirals dataset. | link |
make_halfkernel | Generate two half kernel dataset. | link |
make_outlier | Generate outlier dataset. | link |