Rename to StreamSampling (#55)
Tortar committed Apr 15, 2024
1 parent ce9d802 commit 22f4614
Showing 9 changed files with 66 additions and 109 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -61,8 +61,8 @@ jobs:
- run: |
julia --project=./docs -e '
using Documenter: doctest
using IteratorSampling
doctest(IteratorSampling)'
using StreamSampling
doctest(StreamSampling)'
- run: julia --project=./docs ./docs/make.jl
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
4 changes: 2 additions & 2 deletions Project.toml
@@ -1,6 +1,6 @@
name = "IteratorSampling"
name = "StreamSampling"
uuid = "ef79a3d2-ae9f-5cd2-ab61-e13847810a6e"
version = "0.2.11"
version = "0.3.0"

[deps]
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
16 changes: 8 additions & 8 deletions README.md
@@ -1,13 +1,13 @@
# IteratorSampling.jl
# StreamSampling.jl

[![CI](https://github.com/JuliaDynamics/IteratorSampling.jl/workflows/CI/badge.svg)](https://github.com/JuliaDynamics/IteratorSampling.jl/actions?query=workflow%3ACI)
[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliadynamics.github.io/IteratorSampling.jl/stable/)
[![codecov](https://codecov.io/gh/JuliaDynamics/IteratorSampling.jl/graph/badge.svg?token=F8W0MC53Z0)](https://codecov.io/gh/JuliaDynamics/IteratorSampling.jl)
[![CI](https://github.com/JuliaDynamics/StreamSampling.jl/workflows/CI/badge.svg)](https://github.com/JuliaDynamics/StreamSampling.jl/actions?query=workflow%3ACI)
[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://juliadynamics.github.io/StreamSampling.jl/stable/)
[![codecov](https://codecov.io/gh/JuliaDynamics/StreamSampling.jl/graph/badge.svg?token=F8W0MC53Z0)](https://codecov.io/gh/JuliaDynamics/StreamSampling.jl)
[![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)


This package allows sampling from any iterable in a single pass through the data,
even if the number of items in the collection is unknown.
This package allows sampling from any stream in a single pass through the data,
even if the number of items is unknown.

If the iterable is lazy, the memory required grows with the size of the
sample, instead of the whole population, which can be useful for sampling from big
@@ -18,7 +18,7 @@ is also much faster in some common cases, as highlighted below:


```julia
julia> using IteratorSampling
julia> using StreamSampling

julia> using BenchmarkTools, Random, StatsBase

@@ -53,4 +53,4 @@ julia> @btime sample($rng, collect($iter), Weights($wv.($iter)), 10^4; replace=f
317.230 ms (43 allocations: 370.19 MiB)
```

More information can be found in the [documentation](https://juliadynamics.github.io/IteratorSampling.jl/stable/).
More information can be found in the [documentation](https://juliadynamics.github.io/StreamSampling.jl/stable/).
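
The single-pass claim in the README above can be illustrated with a minimal sketch of Vitter's Algorithm R, one of the algorithms this package adapts. This is not the package's implementation: `algorithm_r_sketch` is a hypothetical name used only here, and the code assumes nothing beyond the `Random` standard library.

```julia
using Random

# Hypothetical helper, not part of StreamSampling: a plain version of
# Algorithm R (Vitter, 1985) for sampling n items without replacement.
function algorithm_r_sketch(rng::AbstractRNG, iter, n::Integer)
    reservoir = Vector{eltype(iter)}()
    sizehint!(reservoir, n)
    for (i, x) in enumerate(iter)
        if i <= n
            # The first n items fill the reservoir directly.
            push!(reservoir, x)
        else
            # Item i replaces a random slot with probability n / i.
            j = rand(rng, 1:i)
            j <= n && (reservoir[j] = x)
        end
    end
    return reservoir
end

# The generator is consumed lazily, so only the reservoir is kept in memory.
squares = algorithm_r_sketch(MersenneTwister(42), (x^2 for x in 1:10^5), 5)
```

Memory use depends only on `n`, which is why the length of the stream does not need to be known in advance. If the stream holds fewer than `n` items, the sketch simply returns all of them.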
2 changes: 1 addition & 1 deletion docs/Project.toml
@@ -1,6 +1,6 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
IteratorSampling = "ef79a3d2-ae9f-5cd2-ab61-e13847810a6e"
StreamSampling = "ef79a3d2-ae9f-5cd2-ab61-e13847810a6e"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[compat]
8 changes: 4 additions & 4 deletions docs/make.jl
@@ -1,10 +1,10 @@
using Documenter
using IteratorSampling
using StreamSampling

println("Documentation Build")
makedocs(
modules = [IteratorSampling],
sitename = "IteratorSampling.jl",
modules = [StreamSampling],
sitename = "StreamSampling.jl",
pages = [
"API" => "index.md",
],
@@ -14,7 +14,7 @@ makedocs(
CI = get(ENV, "CI", nothing) == "true" || get(ENV, "GITHUB_TOKEN", nothing) !== nothing
if CI
deploydocs(
repo = "github.com/JuliaDynamics/IteratorSampling.jl.git",
repo = "github.com/JuliaDynamics/StreamSampling.jl.git",
target = "build",
push_preview = true,
devbranch = "main",
15 changes: 8 additions & 7 deletions docs/src/index.md
@@ -6,13 +6,14 @@
itsample
```

## Specific algorithms
# Implemented Algorithms

```@docs
reservoir_sample
reservoir_sample_without_replacement
reservoir_sample_with_replacement
weighted_reservoir_sample_without_replacement
weighted_reservoir_sample_with_replacement
sortedindices_sample
StreamSampling.AlgL
StreamSampling.AlgR
StreamSampling.AlgRSWRSKIP
StreamSampling.AlgAExpJ
StreamSampling.AlgARes
StreamSampling.AlgWRSWRSKIP
StreamSampling.sortedindices_sample
```
118 changes: 37 additions & 81 deletions src/IteratorSampling.jl → src/StreamSampling.jl
@@ -1,4 +1,4 @@
module IteratorSampling
module StreamSampling

using DataStructures
using Distributions
@@ -24,11 +24,39 @@ abstract type AbstractWrReservoirSampleMulti <: AbstractReservoirSampleMulti end
abstract type AbstractOrdWrReservoirSampleMulti <: AbstractWrReservoirSampleMulti end

abstract type ReservoirAlgorithm end

"""
Adapted from algorithm L described in "Random sampling with a reservoir, J. S. Vitter, 1985".
"""
struct AlgL <: ReservoirAlgorithm end

"""
Adapted from algorithm R described in "Random sampling with a reservoir, J. S. Vitter, 1985".
"""
struct AlgR <: ReservoirAlgorithm end

"""
Adapted from algorithm RSWR_SKIP described in "Reservoir-based Random Sampling with Replacement from
Data Stream, B. Park et al., 2008".
"""
struct AlgRSWRSKIP <: ReservoirAlgorithm end

"""
Adapted from algorithm A-Res described in "Weighted random sampling with a reservoir,
P. S. Efraimidis et al., 2006".
"""
struct AlgARes <: ReservoirAlgorithm end

"""
Adapted from algorithm A-ExpJ described in "Weighted random sampling with a reservoir,
P. S. Efraimidis et al., 2006".
"""
struct AlgAExpJ <: ReservoirAlgorithm end

"""
Adapted from algorithm WRSWR_SKIP described in "A Skip-based Algorithm for Weighted Reservoir
Sampling with Replacement, A. Meligrana, 2024".
"""
struct AlgWRSWRSKIP <: ReservoirAlgorithm end

const algL = AlgL()
@@ -48,22 +76,23 @@ include("WeightedSamplingSingle.jl")
include("WeightedSamplingMulti.jl")

"""
itsample([rng], iter, [wv]; kwargs...)
itsample([rng], iter, method = algL)
itsample([rng], iter, wv, method = algAExpJ)
Return a random element of the iterator, optionally specifying a `rng`
(which defaults to `Random.default_rng()`) and a `wv` function.
If the iterator is empty, it returns `nothing`.
-----
itsample([rng], iter, [wv], n::Int; replace = false, ordered = false, kwargs...)
itsample([rng], iter, n::Int, method = algL; ordered = false)
itsample([rng], iter, wv, n::Int, method = algAExpJ; ordered = false)
Return a vector of `n` random elements of the iterator,
optionally specifying a `rng` (which defaults to `Random.default_rng()`).
`replace` dictates whether sampling is performed with replacement.
`ordered` dictates whether an ordered sample (also called a sequential
sample, i.e. a sample where items appear in the same order as in `iter`).
optionally specifying a `rng` (which defaults to `Random.default_rng()`)
and a `method`. `ordered` dictates whether an ordered sample (also called a sequential
sample, i.e. a sample where items appear in the same order as in `iter`) must be
collected.
If the iterator has fewer than `n` elements, in the case of sampling without
replacement, it returns a vector of those elements.
@@ -72,77 +101,6 @@ function itsample end

export itsample

"""
reservoir_sample(rng, iter, [wv]; method = :alg_L)
reservoir_sample(rng, iter, [wv], n; replace = false, ordered = false, kwargs...)
Reservoir sampling algorithm with and without replacement.
The optional `kwargs` are passed to more specific methods called internally by the
function, which can either be
- [`reservoir_sample_without_replacement`](@ref)
- [`reservoir_sample_with_replacement`](@ref)
- [`weighted_reservoir_sample_without_replacement`](@ref)
- [`weighted_reservoir_sample_with_replacement`](@ref)
depending on the kind of sampling performed.
"""
function reservoir_sample end

export reservoir_sample

"""
reservoir_sample_without_replacement(rng, iter, n; ordered = false, method = :alg_L)
Reservoir sampling algorithm without replacement. The `method` keyword can be either `:alg_L` or
`:alg_R`.
Adapted from algorithms R and L described in "Random sampling with a reservoir, J. S. Vitter, 1985".
"""
function reservoir_sample_without_replacement end

export reservoir_sample_without_replacement

"""
reservoir_sample_with_replacement(rng, iter, n; ordered = false)
Reservoir sampling algorithm with replacement.
Adapted from algorithm RSWR_SKIP described in "Reservoir-based Random Sampling with Replacement from
Data Stream, B. Park et al., 2008".
"""
function reservoir_sample_with_replacement end

export reservoir_sample_with_replacement

"""
weighted_reservoir_sample_without_replacement(rng, iter, wv, n; ordered = false, method = :alg_AExpJ)
Weighted reservoir sampling algorithm without replacement. The `method` keyword can be
either `:alg_ARes` or `:alg_AExpJ`. `wv` should be a function which accepts an element
of the iterator and returns a `Float64`.
Adapted from algorithms A-Res and A-ExpJ described in "Weighted random sampling with a reservoir,
P. S. Efraimidis et al., 2006".
"""
function weighted_reservoir_sample_without_replacement end

export weighted_reservoir_sample_without_replacement

"""
weighted_reservoir_sample_with_replacement(rng, iter, wv, n; ordered = false)
Weighted reservoir sampling algorithm with replacement. `wv` should be a function
which accepts an element of the iterator and returns a `Float64`.
Adapted from algorithm WRSWR_SKIP described in "A Skip-based Algorithm for Weighted Reservoir
Sampling with Replacement, A. Meligrana, 2024".
"""
function weighted_reservoir_sample_with_replacement end

export weighted_reservoir_sample_with_replacement

"""
sortedindices_sample(rng, iter)
sortedindices_sample(rng, iter, n; replace = false, ordered = false)
@@ -153,6 +111,4 @@ before starting the sampling.
"""
function sortedindices_sample end

export sortedindices_sample

end
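
For readers unfamiliar with the weighted algorithms named in the docstrings above, the following is a minimal sketch of the A-Res idea from Efraimidis et al.: each item draws a key `u^(1/w)` with `u` uniform in `[0, 1)`, and the `n` items with the largest keys form the sample. It is an illustration under stated assumptions, not the code in this commit: `a_res_sketch` is a hypothetical name, and the linear scan with `findmin` stands in for the heap a real implementation would more likely use.

```julia
using Random

# Hypothetical helper, not part of StreamSampling: weighted sampling without
# replacement in the style of A-Res. `wv` maps an item to a non-negative weight.
function a_res_sketch(rng::AbstractRNG, iter, wv, n::Integer)
    items = Vector{eltype(iter)}()
    ks = Float64[]                      # one random key per stored item
    n <= 0 && return items
    for x in iter
        w = wv(x)
        w > 0 || continue               # zero-weight items can never be drawn
        k = rand(rng)^(1 / w)           # larger weights tend to give larger keys
        if length(items) < n
            push!(items, x)
            push!(ks, k)
        else
            kmin, imin = findmin(ks)    # O(n) scan, kept simple for clarity
            if k > kmin
                items[imin] = x
                ks[imin] = k
            end
        end
    end
    return items
end

# Weights proportional to the value, so large numbers are favoured.
picked = a_res_sketch(MersenneTwister(7), 1:100, x -> Float64(x), 3)
```

Because every item gets exactly one key and is stored at most once, the result never contains the same stream position twice.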
2 changes: 1 addition & 1 deletion src/WeightedSamplingMulti.jl
@@ -131,7 +131,7 @@ function transform(rng, reservoir, ::WORSample)
end

function weighted_reservoir_sample_with_replacement(rng, iter, wv, n, is::Union{WRSample, OrdWRSample})
iter_type = IteratorSampling.calculate_eltype(iter)
iter_type = calculate_eltype(iter)
it = iterate(iter)
isnothing(it) && return iter_type[]
reservoir = Vector{iter_type}(undef, n)
6 changes: 3 additions & 3 deletions test/runtests.jl
@@ -1,16 +1,16 @@

using IteratorSampling
using StreamSampling

using Distributions
using HypothesisTests
using Random
using StableRNGs
using Test

@testset "IteratorSampling.jl Tests" begin
@testset "StreamSampling.jl Tests" begin
include("package_sanity_tests.jl")
include("unweighted_sampling_single_tests.jl")
include("unweighted_sampling_multi_tests.jl")
include("weighted_sampling_single_tests.jl")
include("weighted_sampling_multi_tests.jl")
end
end
