Skip to content

Commit

Permalink
Move first page of docs in ReadMe
Browse files Browse the repository at this point in the history
  • Loading branch information
Tortar authored Apr 23, 2024
1 parent 0acb185 commit efb166d
Showing 1 changed file with 81 additions and 1 deletion.
82 changes: 81 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,4 +15,84 @@ This has some advantages over other sampling procedures:
- In some cases, sampling with the techniques implemented in this library can bring considerable performance gains, since
the population of items doesn't need to be previously stored in memory.

More information can be found in the [documentation](https://juliadynamics.github.io/StreamSampling.jl/stable/).
## Brief overview of the functionalities

The [`itsample`](@ref) function allows to consume all the stream at once and return the sample collected:

```julia
julia> using StreamSampling

julia> st = 1:100;

julia> itsample(st, 5)
5-element Vector{Int64}:
9
15
52
96
91
```
In some cases, one needs to control the updates the [`ReservoirSample`](@ref) will be subject to. In this case
you can simply use the [`update!`](@ref) function to fit new values in the reservoir:

```julia
julia> using StreamSampling

julia> rs = ReservoirSample(Int, 5);

julia> for x in 1:100
update!(rs, x)
end

julia> value(rs)
5-element Vector{Int64}:
7
9
20
49
74
```

Consult the [API page](https://juliadynamics.github.io/StreamSampling.jl/stable/api/) for more information on these and other functionalities.

## Benchmark

As stated in the first section, using these sampling techniques can bring down considerably the memory usage of the program,
but there are cases where they are also more time efficient, as demostrated below with a comparison with the
equivalent methods of `StatsBase.sample`:

```julia
julia> using StreamSampling

julia> using BenchmarkTools, Random, StatsBase

julia> rng = Xoshiro(42);

julia> iter = Iterators.filter(x -> x != 10, 1:10^7);

julia> wv(el) = 1.0

julia> @btime itsample($rng, $iter, 10^4, algRSWRSKIP);
11.744 ms (5 allocations: 156.39 KiB)

julia> @btime sample($rng, collect($iter), 10^4; replace=true);
131.933 ms (20 allocations: 146.91 MiB)

julia> @btime itsample($rng, $iter, 10^4, algL);
10.260 ms (3 allocations: 78.22 KiB)

julia> @btime sample($rng, collect($iter), 10^4; replace=false);
132.069 ms (27 allocations: 147.05 MiB)

julia> @btime itsample($rng, $iter, $wv, 10^4, algWRSWRSKIP);
32.278 ms (18 allocations: 547.34 KiB)

julia> @btime sample($rng, collect($iter), Weights($wv.($iter)), 10^4; replace=true);
348.220 ms (49 allocations: 675.21 MiB)

julia> @btime itsample($rng, $iter, $wv, 10^4, algAExpJ);
39.965 ms (11 allocations: 234.78 KiB)

julia> @btime sample($rng, collect($iter), Weights($wv.($iter)), 10^4; replace=false);
306.039 ms (43 allocations: 370.19 MiB)
```

0 comments on commit efb166d

Please sign in to comment.