This repository has been archived by the owner on Feb 3, 2020. It is now read-only.

Commit

Merge branch 'feature/heap' into develop
KristofferC committed Feb 19, 2015
2 parents b92af81 + 084f5de commit 06afa15
Showing 10 changed files with 181 additions and 166 deletions.
1 change: 0 additions & 1 deletion .travis.yml
@@ -7,7 +7,6 @@ notifications:
script:
- julia -e 'Pkg.init(); Pkg.clone(pwd())'
- julia -e 'Pkg.add("FactCheck");'
- julia -e 'Pkg.add("ArrayViews");'
- julia -e 'Pkg.test("KDTrees", coverage=true)'

after_success:
22 changes: 10 additions & 12 deletions README.md
@@ -4,15 +4,13 @@ Kd trees for Julia.

[![Build Status](https://travis-ci.org/KristofferC/KDTrees.jl.svg?branch=master)](https://travis-ci.org/KristofferC/KDTrees.jl) [![Coverage Status](https://coveralls.io/repos/KristofferC/KDTrees.jl/badge.svg)](https://coveralls.io/r/KristofferC/KDTrees.jl)

Currently supports KNN-search and finding all points inside a hyper sphere centered at a given point. Currently only
uses Euclidean distance.
Currently supports KNN-search and finding all points inside a hyper sphere centered at a given point.

Some care has been taken with regards to performance. For example the tree is not implemented as nodes pointing to other nodes but instead as a collection of densely packed arrays. This should give better cache locality. The negative aspect of this storage method is that the tree is immutable and new data can not be entered into the tree after it has been created.
Care has been taken with regards to performance. The tree is for example not naively implemented as nodes pointing to other nodes but instead as a collection of densely packed arrays. This gives better cache locality. This
however means that the tree is immutable and new points can not be entered into the tree after it has been created.
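To illustrate the densely packed layout described above, here is a minimal sketch of a binary tree stored in flat arrays instead of linked nodes. It assumes a complete binary tree in which node `i` has children `2i` and `2i + 1`; this is one common pointer-free layout and not necessarily the exact one KDTrees uses. `ArrayTree`, `left_child`, `right_child`, `n_internal` and `find_leaf` are hypothetical names.

```julia
# Hypothetical sketch: a complete binary tree stored in flat arrays.
# Node i (1-based) has children 2i and 2i + 1 (and parent i ÷ 2),
# so no per-node pointers or separately allocated node objects are needed.
struct ArrayTree
    split_dims::Vector{Int}      # splitting dimension of each internal node
    split_vals::Vector{Float64}  # splitting value of each internal node
end

left_child(i::Int) = 2i
right_child(i::Int) = 2i + 1

# Internal nodes occupy indices 1:n_internal; larger indices denote leaves.
n_internal(t::ArrayTree) = length(t.split_vals)

# Descend from the root to a leaf, choosing a side at each internal node
# by comparing the point's coordinate in that node's splitting dimension.
function find_leaf(t::ArrayTree, point::AbstractVector{<:Real})
    i = 1
    while i <= n_internal(t)
        d = t.split_dims[i]
        i = point[d] <= t.split_vals[i] ? left_child(i) : right_child(i)
    end
    return i
end

# Root splits on x at 0.5; both of its children split on y at 0.5.
t = ArrayTree([1, 2, 2], [0.5, 0.5, 0.5])
find_leaf(t, [0.2, 0.9])  # returns 5: left at the root, then right
```

Since children are located by index arithmetic, the whole structure lives in a few contiguous vectors, which is what enables the cache-friendly access the README mentions. The same property explains the immutability trade-off: adding points would mean rebuilding those arrays.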

There are some benchmarks for the creation of the tree and the different searches in the benchmark folder.

Since this is a new project there are still some obvious improvements which are listed in the TODO list.

## Author
Kristoffer Carlsson (@KristofferC)

@@ -30,7 +28,8 @@ The `data` argument for the tree should be a matrix of floats of dimension `(n_d

### Points inside hyper sphere

Finds all points inside an hyper sphere centered at a given point. This is done with the exported function `query_ball_point(tree, point, radius)`. Returns the sorted indices of these points.
The exported `query_ball_point(tree, point, radius)` finds all points inside a hyper sphere centered at a given point with the given radius. The function
returns a sorted list of the indices of the points in the sphere.
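When testing `query_ball_point`, a brute-force function with the same contract can serve as a reference. The sketch below assumes the data layout used elsewhere in this commit (one point per column, as in `rand(dim, n_points)`), Euclidean distance, and that points lying exactly on the sphere's surface count as inside; `brute_ball_point` is a hypothetical helper, not part of KDTrees.

```julia
# Hypothetical brute-force reference for query_ball_point.
# Assumptions: data is (n_dim, n_points) with one point per column,
# Euclidean distance, and points exactly on the surface count as inside.
function brute_ball_point(data::AbstractMatrix{<:Real},
                          point::AbstractVector{<:Real}, radius::Real)
    r2 = radius^2                 # compare squared distances, avoiding sqrt
    idxs = Int[]
    for j in 1:size(data, 2)
        d2 = 0.0
        for i in 1:size(data, 1)
            d2 += (data[i, j] - point[i])^2
        end
        d2 <= r2 && push!(idxs, j)
    end
    return idxs                   # sorted, since columns are visited in order
end

data = [0.0 1.0 3.0;
        0.0 0.0 0.0]
brute_ball_point(data, [0.0, 0.0], 1.5)  # returns [1, 2]
```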

```julia
using KDTrees
@@ -53,10 +52,7 @@ gives the indices:

### K-Nearest-Neighbours

Finds the *k* nearest neighbours to a given point. This is done with the exported function `k_nearest_neighbour(tree, point, k)`. Returns a tuple of two lists with the indices and the distances
from the given point, respectively. These are sorted in the order of smallest to largest distance.

The current implementation is a bit slower than it has to be for large *k*.
The exported function `k_nearest_neighbour(tree, point, k)` finds the *k* nearest neighbours to a given point. The function returns a tuple of two lists with the indices and the distances from the given point, respectively. These are sorted in the order of smallest to largest distance.
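Similarly, a brute-force reference for `k_nearest_neighbour` that returns the same kind of tuple (indices, distances), sorted from nearest to farthest, might look like this. It assumes Euclidean distance and one point per column; `brute_knn` is a hypothetical name. It uses Base's `partialsortperm`, so only the `k` smallest distances are fully ordered.

```julia
# Hypothetical brute-force reference for k_nearest_neighbour.
# Assumes one point per column and Euclidean distance; returns
# (indices, distances) ordered from nearest to farthest.
function brute_knn(data::AbstractMatrix{<:Real},
                   point::AbstractVector{<:Real}, k::Integer)
    n = size(data, 2)
    1 <= k <= n || throw(ArgumentError("need 1 <= k <= $n, got k = $k"))
    # Euclidean distance from `point` to every column of `data`
    dists = [sqrt(sum(abs2, view(data, :, j) .- point)) for j in 1:n]
    # Indices of the k smallest distances, in increasing order of distance
    order = collect(partialsortperm(dists, 1:k))
    return order, dists[order]
end

data = [0.0 1.0 3.0 0.5;
        0.0 0.0 0.0 0.0]
brute_knn(data, [0.9, 0.0], 2)  # indices [2, 4], distances ≈ [0.1, 0.4]
```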

```julia
using KDTrees
@@ -70,11 +66,13 @@ gives both the indices and distances:

## Benchmarks

The benchmarks have been made on a computer with a 4-core Intel i5-2500K @ 3.3 GHz with Julia v0.4.0-dev+3034.

Clicking on a plot takes you to the Plotly site for the plot where the exact data can be seen.

### KNN benchmark

[![bench_knn](https://plot.ly/~kcarlsson89/284.png)](https://plot.ly/~kcarlsson89/284/)
[![bench_knn](https://plot.ly/~kcarlsson89/346.png)](https://plot.ly/~kcarlsson89/346/)

### Build time benchmark

@@ -83,7 +81,7 @@ Clicking on a plot takes you to the Plotly site for the plot where the exact dat
## TODOs
* Add proper benchmarks, compare with other implementations. Update: Partly done
* Add other measures than Euclidean distance.
* Use a bounded priority queue for storing the K best points in KNN instead of a linear array (should only matter for large K). Julia's built-in PQ is slower than a normal array.
* Use a heap for storing the K best points in KNN instead of a linear array (should only matter for large K).
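The heap idea in the TODO list can be sketched as a fixed-size binary max-heap: the worst of the current `k` candidates sits at the root, so a new candidate is compared against it in O(1) and, if better, replaces it in O(log k) rather than the O(k) a linear array can need. This is a hypothetical illustration (`BoundedMaxHeap`, `heap_swap!`, `offer!`, `worst` and `sorted_results` are made-up names), not the code from the `feature/heap` branch.

```julia
# Hypothetical sketch: keep the k best (smallest-distance) candidates in a
# fixed-size binary max-heap stored in flat arrays. The current worst
# candidate is always at index 1, so it can be checked in O(1) and
# replaced in O(log k) instead of scanning a linear array.
mutable struct BoundedMaxHeap
    dists::Vector{Float64}
    idxs::Vector{Int}
    len::Int                     # number of slots filled so far
end

BoundedMaxHeap(k::Integer) = BoundedMaxHeap(fill(Inf, k), zeros(Int, k), 0)

function heap_swap!(h::BoundedMaxHeap, a::Int, b::Int)
    h.dists[a], h.dists[b] = h.dists[b], h.dists[a]
    h.idxs[a], h.idxs[b] = h.idxs[b], h.idxs[a]
    return h
end

# Offer a candidate point; it is kept only while it is among the k best.
function offer!(h::BoundedMaxHeap, dist::Float64, idx::Int)
    k = length(h.dists)
    if h.len < k
        # Not full yet: append at the end and sift up.
        h.len += 1
        i = h.len
        h.dists[i], h.idxs[i] = dist, idx
        while i > 1 && h.dists[i ÷ 2] < h.dists[i]
            heap_swap!(h, i, i ÷ 2)
            i ÷= 2
        end
    elseif dist < h.dists[1]
        # Full: overwrite the worst candidate at the root and sift down.
        h.dists[1], h.idxs[1] = dist, idx
        i = 1
        while true
            l, r, largest = 2i, 2i + 1, i
            l <= k && h.dists[l] > h.dists[largest] && (largest = l)
            r <= k && h.dists[r] > h.dists[largest] && (largest = r)
            largest == i && break
            heap_swap!(h, i, largest)
            i = largest
        end
    end
    return h
end

# Distance a new candidate must beat (Inf until k candidates are stored).
worst(h::BoundedMaxHeap) = h.len < length(h.dists) ? Inf : h.dists[1]

# Results ordered from nearest to farthest.
function sorted_results(h::BoundedMaxHeap)
    p = sortperm(h.dists[1:h.len])
    return h.idxs[1:h.len][p], h.dists[1:h.len][p]
end

h = BoundedMaxHeap(3)
for (j, d) in enumerate([5.0, 1.0, 4.0, 2.0, 3.0, 0.5])
    offer!(h, d, j)
end
sorted_results(h)  # ([6, 2, 4], [0.5, 1.0, 2.0])
```

In a tree search, `worst(h)` is the natural pruning bound: a subtree whose closest possible distance exceeds it cannot improve the result.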

### Contribution

2 changes: 1 addition & 1 deletion REQUIRE
@@ -1,2 +1,2 @@
ArrayViews
julia 0.3
Compat
27 changes: 12 additions & 15 deletions benchmark/bench_build_tree.jl
@@ -1,21 +1,18 @@
using KDTrees
using StatsBase
using Plotly

function run_bench_build_tree(dim, knn, exps, rounds)
println("Running build tree benchmark for 10^(", exps, ") points in ", dim, " dimensions.")
n_points = [10^i for i in exps]

times = zeros(length(n_points), rounds)

# Compile it
tree = KDTree(randn(2,2))
k_nearest_neighbour(tree, zeros(2), 1)

timer = 0.0
for (i, n_point) in enumerate(n_points)
for (j , round) in enumerate(1:rounds)
n_iters = 5
println("Round ", j, " out of ", rounds, " for ", dim, "x", n_point, "...")
println("Round ", j, " out of ", rounds, " for ", dim, "x", n_point, "...\r")
while true
timer = time_ns()
for k in 1:n_iters
@@ -27,34 +24,34 @@ function run_bench_build_tree(dim, knn, exps, rounds)
n_iters *= 3
continue
end
break # Break this round
break # Ends this round
end
times[i, j] = timer / n_iters
end
println("\n")
end
println("Done!")
println("\nDone!")
return mean_and_std(times, 2)
end
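The timing loop in `run_bench_build_tree` (and in `bench_knn.jl`) triples the number of iterations until a batch takes at least one second, then reports the average time per call. A standalone version of that pattern might look like the following; `time_per_call` and its keyword arguments are hypothetical, and, as in the benchmark's "Compile it" step, `f` should be called once beforehand so compilation time is not measured.

```julia
# Hypothetical helper mirroring the benchmark's adaptive timing loop:
# run `f` n_iters times; if the batch took less than `min_time` seconds,
# triple n_iters and try again. Returns the average seconds per call.
function time_per_call(f; n_iters::Int = 5, min_time::Float64 = 1.0)
    while true
        t0 = time_ns()
        for _ in 1:n_iters
            f()
        end
        elapsed = (time_ns() - t0) / 1e9   # nanoseconds -> seconds
        elapsed >= min_time && return elapsed / n_iters
        n_iters *= 3
    end
end

g = () -> sum(rand(1000))
g()                                  # warm up so compilation is not timed
time_per_call(g; min_time = 0.1)     # average seconds per call of g
```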

dim = 3
knn = 5
exps = 1:0.5:6
exps = 3:0.5:6
rounds = 3

times, stds = run_bench_build_tree(dim, knn, exps, rounds)
times = vec(times)
stds = vec(stds)
# Plotting
####################


stderr = stds / sqrt(length(stds))
ymins = times - 1.96*stderr
ymaxs = times + 1.96*stderr
sizes = [10.0^i::Float64 for i in exps]
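The error bars combine a per-size mean and standard deviation across rounds with a 1.96 multiplier, i.e. an approximate 95% confidence interval under a normal approximation. For that interval, the conventional standard error divides each standard deviation by the square root of the number of rounds the mean was taken over; note that the script above divides by `sqrt(length(stds))`, which after `vec` is the number of problem sizes. The sketch below uses the `Statistics` standard library instead of StatsBase's `mean_and_std`; `mean_ci95` is a hypothetical name. With only a few rounds, a Student t multiplier would give wider intervals than 1.96.

```julia
using Statistics

# Hypothetical sketch: `times` has one row per problem size and one column
# per round, as in the benchmark. Returns the per-row mean and the bounds of
# an approximate 95% confidence interval (normal approximation, z = 1.96).
function mean_ci95(times::AbstractMatrix{<:Real})
    n_rounds = size(times, 2)
    m = vec(mean(times; dims = 2))
    s = vec(std(times; dims = 2))      # sample standard deviation across rounds
    se = s ./ sqrt(n_rounds)           # standard error of each mean
    return m, m .- 1.96 .* se, m .+ 1.96 .* se
end

times = [1.0 2.0 3.0;
         10.0 10.0 10.0]
m, lower, upper = mean_ci95(times)    # m == [2.0, 10.0]; second row has zero width
```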


####################################################################
# Plotting
#=
using Plotly
####################################################################
data = [
[
"x" => sizes,
@@ -88,10 +85,10 @@ Plotly.signin("kcarlsson89", "lolololoololololo")


response = Plotly.plot(data, ["layout" => layout,
"filename" => "bench_build",
"filename" => "bench_build_x",
"fileopt" => "overwrite"])
plot_url = response["url"]
=#



#=
30 changes: 13 additions & 17 deletions benchmark/bench_knn.jl
@@ -7,41 +7,38 @@ function run_bench_knn_points(dim, knn, exps, rounds)

times = zeros(length(n_points), rounds)

# Compile it
tree = KDTree(randn(2,2))
k_nearest_neighbour(tree, zeros(2), 1)

timer = 0.0
for (i, n_point) in enumerate(n_points)
for (j , round) in enumerate(1:rounds)
n_iters = 100
println("Round ", j, " out of ", rounds, " for ", dim, "x", n_point, "...")
data = rand(dim, int(n_point))
tree = KDTree(data, 15)
print("Round ", j, " out of ", rounds, " for ", dim, "x", int(n_point), "...\r")
data = float32(rand(dim, int(n_point)))
tree = KDTree(data, 5)
while true
timer = time_ns()
for k in 1:n_iters
p = rand(dim)
p = float32(rand(dim))
k_nearest_neighbour(tree, p, knn)
end
timer = (time_ns() - float(timer)) / 10^9 # To seconds
if timer < 1.0
n_iters *= 3
continue
end
break # Break this round
break # Ends this round
end
times[i, j] = timer / n_iters
end
print("\n")
end
println("Done!")
println("\nDone!")
return mean_and_std(1./times, 2)
end

data = Dict[]
for knn in [1, 5, 10]
dim = 3
exps = 1:0.5:6
exps = 3:0.5:6
rounds = 5

speeds, stds = run_bench_knn_points(dim, knn, exps, rounds)
@@ -58,7 +55,7 @@ for knn in [1, 5, 10]
"y" => speeds,
"type" => "scatter",
"mode" => "lines+markers",
"name" => string("k = ", knn),
"name" => string("k2 = ", knn),
"error_y" => [
"type" => "data",
"array" => stderr*1.96*2,
@@ -68,9 +65,9 @@ for knn in [1, 5, 10]
push!(data, trace)
end


####################################################################
# Plotting
####################
####################################################################

layout = [
"title" => "KNN search speed (dim = 3)",
@@ -86,11 +83,10 @@ layout = [
"autorange" => true
]

using Plotly
Plotly.signin("kcarlsson89", "lololololololol")
Plotly.signin("kcarlsson89", "lolololololol")

response = Plotly.plot(data, ["layout" => layout,
"filename" => "plotly-log-axes",
"filename" => "bench_linux_rly_inline_04",
"fileopt" => "overwrite"])
plot_url = response["url"]

6 changes: 6 additions & 0 deletions benchmark/bench_query_ball.jl
@@ -32,6 +32,12 @@ run_bench_query_ball()


#=
2015-02-14
[1.512347500000003e-6 9.946277500000007e-6 5.1572081499999986e-5
3.7438639999999997e-6 2.3865769500000025e-5 0.00023704194749999996
7.439227999999991e-6 5.898463300000001e-5 0.0006188465289999998
1.2508987000000021e-5 0.0001077092400000001 0.001196649754000001]
2015-02-06: (removed old inaccurate results)
[6.202539999999983e-6 3.595097900000001e-5 0.00021109113900000007
1.3630627000000005e-5 7.9238053e-5 0.0006688952190000003
1 change: 0 additions & 1 deletion src/KDTrees.jl
@@ -2,7 +2,6 @@ module KDTrees

import Base.show

using ArrayViews
using Compat

export KDTree