68 commits
48c22f2
added delta estimation heuristic, tested on iris
jaroslavh Feb 11, 2019
a1492da
added classifyPoints method for iris and cleaned up blobs2d
jaroslavh Feb 11, 2019
2fa9c2e
bugfix: not using similarity_measure passed to finding delta function
jaroslavh Feb 12, 2019
d9eaa75
added new datasets, improved generation notebook
jaroslavh Feb 12, 2019
0562c3c
added testbed
jaroslavh Feb 12, 2019
825e900
bugfix: delta_medoids_full not updating cluster keys
jaroslavh Feb 14, 2019
9be61f5
added new datasets and better plotting of results
jaroslavh Feb 17, 2019
be9e24b
updated old files
jaroslavh Feb 17, 2019
7b1dd24
added random_selection algorithm
jaroslavh Feb 19, 2019
c7e6c41
moved greedy algorithm to bc_utils, improved random
jaroslavh Mar 1, 2019
3e1447b
added mfeat datasets to testbed
jaroslavh Mar 6, 2019
777b754
added task of thesis
jaroslavh Mar 6, 2019
f8ff7d3
Introduction first draft
jaroslavh Mar 10, 2019
b26c58a
drafted chapters with questions
jaroslavh Mar 10, 2019
e3a0509
docs added to greedy and random algorithms
jaroslavh Mar 10, 2019
a7742ac
removed obsoleted files, added sections
jaroslavh Mar 15, 2019
b5f5cb2
separation into chapters
jaroslavh Mar 29, 2019
b05fc2d
added first chapter to text
jaroslavh Apr 5, 2019
382a200
testbed simplified, methods moved to butils
jaroslavh Apr 5, 2019
4d2a781
text update
jaroslavh Apr 25, 2019
272f51b
added skeleton of testbed section
jaroslavh Apr 25, 2019
6209a6b
VH test, checking whether I can edit
Apr 25, 2019
67e859c
Edits on pages 9 and 11, marked in blue
Apr 25, 2019
33c5d97
Red color for JH's comments
Apr 25, 2019
d2565d0
comments from VH added to text, few code improvements
jaroslavh Apr 26, 2019
57bbec0
fixed communication mistakes with VH
jaroslavh Apr 26, 2019
4fb4717
final chapters division, 0.3 MK comments added
jaroslavh May 6, 2019
73f69c8
added first experiments to text
jaroslavh May 7, 2019
27e512f
added results to Experiment 2
jaroslavh May 8, 2019
05a96ef
Added part of the third experiment.
jaroslavh May 8, 2019
73d706a
added picture of datasets
jaroslavh May 9, 2019
fa74218
added labels to axes in datasets.png
jaroslavh May 9, 2019
2aced64
updated datasets chapter beginning
jaroslavh May 9, 2019
790720e
merged chapters 2 and 3
jaroslavh May 9, 2019
1f37a72
added structure to introduction
jaroslavh May 9, 2019
96c149d
added citation, numbered equations
jaroslavh May 9, 2019
424892e
fixed few todo notes and added refs
jaroslavh May 9, 2019
b0803bf
VH in intro
May 9, 2019
76fda61
Intro finished.
May 9, 2019
267c356
One more comma in the sentence was wrong. It was my mistake.
May 9, 2019
88a6524
Merge branch 'sp2' of https://github.com/jaroslavh/bc into sp2
jaroslavh May 9, 2019
5d254d6
Goals chapter
May 10, 2019
47a3a73
Merge branch 'sp2' of https://github.com/jaroslavh/bc into sp2
jaroslavh May 10, 2019
f35dcb0
getting the repo up to date
jaroslavh May 10, 2019
404e754
added centroids explanation
jaroslavh May 10, 2019
54ed98a
added medoids img and definition
jaroslavh May 10, 2019
05563d9
updated experiments pictures - not final probably though
jaroslavh May 10, 2019
d2faa73
added experiment 4 template and few notes to it
jaroslavh May 10, 2019
c042c22
experiments updated
jaroslavh May 10, 2019
de41cc6
added kmeans pseudocode
jaroslavh May 10, 2019
fffbd51
final version w\o abstract, conclusion and time complexity
jaroslavh May 11, 2019
5b06a30
VH added dummy figure to replace a missing figure exp5.png
May 12, 2019
2805540
DummyFigure, second time
May 12, 2019
522f751
minor modifications
jaroslavh May 12, 2019
3ec6289
Merge branch 'sp2' of https://github.com/jaroslavh/bc into sp2
jaroslavh May 12, 2019
ae1f7b3
added missing image
jaroslavh May 12, 2019
6f8d71f
Introduction and Goals updated
jaroslavh May 12, 2019
d1eea01
after MK comments
jaroslavh May 12, 2019
bf13550
merged pictures and results for exp1 and exp2
jaroslavh May 12, 2019
3cbb601
VH comment incorporated
jaroslavh May 12, 2019
5122dd6
added fit experiment
jaroslavh May 12, 2019
f545f6d
added conclusion of whole thesis
jaroslavh May 12, 2019
ca44956
conclusion updated, abstract, most of todo done
jaroslavh May 13, 2019
ace7973
almost done, last experiment missing
jaroslavh May 14, 2019
38d37ad
updated repository and final touches to the tex file
jaroslavh May 14, 2019
eb2639d
changes in experiment, last minute, i hope i did not break it :( :(
jaroslavh May 14, 2019
eecb35d
modification of second experiment
jaroslavh May 15, 2019
37c2e6b
final version of text
jaroslavh May 15, 2019
89 changes: 33 additions & 56 deletions data-generation/datagen.py
@@ -1,66 +1,43 @@
+import json
 from random import randint
-import csv
-import sys

-#-------------------------------------------------------------------------
-# Generates data in 3d space 0 to 1000 cube.
-# Outputs a csv file in format cluster number, x, y, z
-# Parameters:
-# cluster_size integer - number of points to generate in each cluster
-# cluster_number integer - number of clusters in space, capped at 8
-# file_name string - name of file to save the cluster
-# noise boolean - True - generate only clusters, False - generate some more random points
-def generate_3d_data(file_name, cluster_number, cluster_size, noise):
-
-    if (cluster_number) > 8:
-        return False
-
-    with open(file_name, 'w', newline='') as csv_file:
-        data_writer = csv.writer(csv_file, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
-        for cluster in range(0, cluster_number):
-            x = 100 + 500 * ((cluster >> 2) % 2)
-            y = 100 + 500 * ((cluster >> 1) % 2)
-            z = 100 + 500 * ((cluster >> 0) % 2)
-            for item in range(1, cluster_size + 1):
-                data_writer.writerow([cluster, randint(0, 450) + x, randint(0, 450)
-                    + y, randint(0, 450) + z])
-    if noise == True:
-        generate_random_points(cluster_size, file_name)
-    return True

-#-------------------------------------------------------------------------
-# Generates random points in 0 to 1000 space and appends them to csv file
-# Format 100, x, y, z where 100 stands for no cluster but scattered random points
-# Parameters:
-# number_of_points integer - how many points to generate
-# file_name string - to what file to append
-def generate_random_points(number_of_points, file_name):
-    with open(file_name, 'a', newline='') as csv_file:
-        data_writer = csv.writer(csv_file, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
-        for i in range(0, number_of_points + 1):
-            data_writer.writerow([100, randint(0, 1000), randint(0, 1000), randint(0, 1000)])
+# filling table = for size = 4
+# XX | a0 | a1 | a2 | a3
+# ------------------------
+# a0 | XX |    |    |
+# ------------------------
+# a1 |fill| XX |    |
+# ------------------------
+# a2 |fill|fill| XX |
+# ------------------------
+# a3 |fill|fill|fill| XX
+# ------------------------
+def generate_similarity_tables(size):
+
+    ret_dict = {}
+    id_num = 0
+    while id_num != size: #generate identificators
+        ret_dict.update({"a" + str(id_num) : []})
+        id_num += 1
+
+    for row in ret_dict:
+        for column in ret_dict:
+            if(row == column):
+                continue
+            app = randint(0,1000)
+            if (app > 800):
+                ret_dict[row].append({column:app})
+                ret_dict[column].append({row:app})
+    return ret_dict

 #-------------------------------------------------------------------------

-print("Data generation started:")
+print("Starting to generate data")

-if len(sys.argv) != 5:
-    print("Usage: python3 datagen.py <output filename> <number of clusters> <points per cluster> <generate noise>")
-    exit(1)
+data = generate_similarity_tables(1000)

-out_file = sys.argv[1]
-number_of_clusters = int(sys.argv[2])
-points_per_cluster = int(sys.argv[3])
-if (sys.argv[4] == 'T'):
-    gen_noise = True
-elif (sys.argv[4] == 'F'):
-    gen_noise = False
-else:
-    print("Please use T (true) or F (false) to generate specify whether you want to generate noise.")
-    exit(1)

-if generate_3d_data(out_file, number_of_clusters, points_per_cluster, gen_noise):
-    print("Data generated.")
-else:
-    print("Data generation failed.")
+output_file = open("data.json", 'w')

+output_file.write(json.dumps(data))
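Review note on `generate_similarity_tables`: the comment table in the diff fills only the cells below the diagonal, but the nested loop appears to visit every ordered pair, so each pair of identifiers is drawn twice and can end up with zero, one, or two entries carrying different values. The sketch below is one possible reading of the intent, not the author's implementation: it draws once per unordered pair and mirrors the value. The names `generate_similarity_table`, `write_table`, `threshold`, `max_value`, and `seed` are hypothetical and introduced only for illustration; the `"a" + str(i)` identifiers, the 0 to 1000 range, and the `> 800` cutoff are taken from the diff.

```python
import json
import random


def generate_similarity_table(size, threshold=800, max_value=1000, seed=None):
    """Return {id: [{other_id: value}, ...]} with one draw per unordered pair."""
    rng = random.Random(seed)  # seed is an assumption, added for reproducibility
    ids = ["a" + str(i) for i in range(size)]
    table = {name: [] for name in ids}
    # Visit only the cells below the diagonal (row index > column index),
    # matching the "fill" cells of the comment table in the diff.
    for r in range(1, size):
        for c in range(r):
            value = rng.randint(0, max_value)
            if value > threshold:
                # Store the same value in both directions so the table is symmetric.
                table[ids[r]].append({ids[c]: value})
                table[ids[c]].append({ids[r]: value})
    return table


def write_table(table, path):
    """Write the table as JSON; the with-block closes the file even on error."""
    with open(path, "w") as output_file:
        json.dump(table, output_file)
```

Calling `write_table(generate_similarity_table(1000), "data.json")` would produce a file of the same shape as the script in the diff, assuming that shape is what downstream code expects.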
2 changes: 1 addition & 1 deletion datasets/overlap.csv
@@ -2997,4 +2997,4 @@
5,607,375,477
5,489,532,480
5,763,350,645
5,357,510,698
5,357,510,698
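Review note: the removed and added last lines look identical, which usually points to a change in the trailing newline only; that is an inference from the rendered diff, not something the page states. A reader built on the standard `csv` module is unaffected either way. The sketch below assumes the rows follow the `cluster number, x, y, z` layout described in the old `datagen.py` comments; `read_clusters` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from collections import defaultdict


def read_clusters(csv_file):
    """Group (x, y, z) points by the cluster label in the first column."""
    clusters = defaultdict(list)
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines, e.g. an empty line at the end of the file
            continue
        label, x, y, z = (int(field) for field in row)
        clusters[label].append((x, y, z))
    return dict(clusters)


# The four rows shown in the hunk, here without a final newline.
sample = io.StringIO("5,607,375,477\n5,489,532,480\n5,763,350,645\n5,357,510,698")
points = read_clusters(sample)
```

For a real file, the `csv` documentation recommends opening it with `open(path, newline='')` before passing it to the reader.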