Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
Tortar authored Apr 18, 2024
1 parent a39c9e0 commit 8e6f084
Showing 1 changed file with 5 additions and 43 deletions.
48 changes: 5 additions & 43 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,52 +5,14 @@
[![codecov](https://codecov.io/gh/JuliaDynamics/StreamSampling.jl/graph/badge.svg?token=F8W0MC53Z0)](https://codecov.io/gh/JuliaDynamics/StreamSampling.jl)
[![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)


The scope of this package is providing general methods to sample from any stream in a single pass through the data, even when the number of items contained in the stream is unknown.
The scope of this package is to provide general methods to sample from any stream in a single pass through the data, even when
the number of items contained in the stream is unknown.

This has some advantages over other sampling procedures:

- If the iterable is lazy, the memory required grows in relation to the size of the sample, instead of the all population.
- The sample collected is a random sample of the portion of the stream seen thus far at any point of the sampling process.

Moreover, it turns out that sampling with the techniques implemented in this library
is also much faster in some common cases, as highlighted below:


```julia
julia> using StreamSampling

julia> using BenchmarkTools, Random, StatsBase

julia> rng = Xoshiro(42);

julia> iter = Iterators.filter(x -> x != 10, 1:10^7);

julia> wv(el) = 1.0

julia> @btime itsample($rng, $iter, 10^4, algRSWRSKIP);
11.744 ms (5 allocations: 156.39 KiB)

julia> @btime sample($rng, collect($iter), 10^4; replace=true);
131.933 ms (20 allocations: 146.91 MiB)

julia> @btime itsample($rng, $iter, 10^4, algL);
10.260 ms (3 allocations: 78.22 KiB)

julia> @btime sample($rng, collect($iter), 10^4; replace=false);
132.069 ms (27 allocations: 147.05 MiB)

julia> @btime itsample($rng, $iter, $wv, 10^4, algWRSWRSKIP);
32.278 ms (18 allocations: 547.34 KiB)

julia> @btime sample($rng, collect($iter), Weights($wv.($iter)), 10^4; replace=true);
348.220 ms (49 allocations: 675.21 MiB)

julia> @btime itsample($rng, $iter, $wv, 10^4, algAExpJ);
39.965 ms (11 allocations: 234.78 KiB)

julia> @btime sample($rng, collect($iter), Weights($wv.($iter)), 10^4; replace=false);
306.039 ms (43 allocations: 370.19 MiB)
```

- In some cases, sampling with the techniques implemented in this library can bring considerable performance gains, since
the population of items doesn't need to be previously stored in memory.

More information can be found in the [documentation](https://juliadynamics.github.io/StreamSampling.jl/dev/).

0 comments on commit 8e6f084

Please sign in to comment.