Write code to run Analogy task over data sets #16

Open
sehsanm opened this issue Dec 3, 2018 · 3 comments

@sehsanm (Owner) commented Dec 3, 2018

There must be options to configure the following:

  • Cosine distance
  • Euclidean distance
  • The distance defined in "Omer Levy and Yoav Goldberg. Linguistic regularities in sparse and explicit word representations. In CoNLL, 2014"
  • Support for batch operation
  • A threshold to accept a word if it appears in the top-N results
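
As a point of reference, here is a minimal sketch of how these distance options and the top-N acceptance threshold could be computed, assuming the word vectors are kept in a NumPy matrix. The 3CosMul branch is my reading of the method in the Levy and Goldberg (2014) paper cited above, and every name here (analogy_scores, in_top_n, etc.) is hypothetical rather than part of the project:

import numpy as np

def cosine_similarity(matrix, v):
	# matrix: (vocab_size, dim) word vectors, v: (dim,) query vector
	norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(v)
	return matrix.dot(v) / np.maximum(norms, 1e-8)

def analogy_scores(matrix, a, b, c, method='cosine'):
	# score every vocabulary word as a candidate d for the analogy a:b :: c:d
	if method == 'cosine':
		return cosine_similarity(matrix, c + b - a)           # 3CosAdd
	if method == 'euclidean':
		return -np.linalg.norm(matrix - (c + b - a), axis=1)  # smaller distance = higher score
	if method == '3cosmul':
		# multiplicative combination of similarities, cosines shifted to [0, 1]
		eps = 1e-3
		sa = (cosine_similarity(matrix, a) + 1) / 2
		sb = (cosine_similarity(matrix, b) + 1) / 2
		sc = (cosine_similarity(matrix, c) + 1) / 2
		return sb * sc / (sa + eps)
	raise ValueError(method)

def in_top_n(scores, expected_index, n=10):
	# accept the question if the expected word is among the N best-scoring words
	return expected_index in np.argsort(-scores)[:n]

Batch operation then amounts to stacking several query vectors and applying the same matrix products to all of them at once.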
@sehsanm sehsanm added this to the Assignment milestone Dec 3, 2018
@abb4s abb4s self-assigned this Dec 6, 2018
@abb4s (Collaborator) commented Dec 9, 2018

Hi, I created a sample format for the test script:
analogy_test.py.zip
But I need a sample corpus, its corresponding semantic vector model, and an analogy dataset. If you define and create the project architecture, it would be clearer. I assumed that we have a datasets package for loading corpora and datasets, and a models package for loading models.

@sehsanm (Owner, Author) commented Dec 9, 2018

Hi,
A few points here: I'm assuming what you have sent is pseudo-code for what needs to be done. What you have to do is create a package containing the base methods for the analogy task.

From my perspective, the final output is that you load all the analogy datasets (maybe multiple), run them, and finally create a CSV file containing the results as well as a per-category score for each dataset.
So you are not dependent on a corpus; you are dependent on a model loaded in memory (see #15).

So the pseudo-code will be something like:

import models
import datasets
from collections import defaultdict

# load the analogy datasets (possibly more than one)
analog_datasets = datasets.loadAnalogyDataset('/data/analogy')
# load the in-memory model (see #15)
model = models.loadmodel('/data/models/model_khafan.bin')

threshold = 10  # top-N cut-off for accepting a result (N to be decided)

for dataset in analog_datasets:
	totals = defaultdict(int)    # questions seen per category
	corrects = defaultdict(int)  # questions answered correctly per category
	for row in dataset:
		r1 = model.getVec(row.a)
		r2 = model.getVec(row.b)
		r3 = model.getVec(row.c)
		words = model.getKNear(r3 + r2 - r1, threshold, 'Cosine_Distance')
		totals[row.category] += 1
		if row.d in words:
			corrects[row.category] += 1
	write_result_to_file(dataset, totals, corrects)
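
For context, a minimal sketch of how the getVec and getKNear calls used above might be implemented, assuming the model keeps a word list and a NumPy matrix of vectors. This is only an assumption about the model interface discussed in #15, not its actual API:

import numpy as np

class Model:
	def __init__(self, words, vectors):
		self.words = words        # list of vocabulary words
		self.vectors = vectors    # (vocab_size, dim) NumPy array
		self.index = {w: i for i, w in enumerate(words)}

	def getVec(self, word):
		return self.vectors[self.index[word]]

	def getKNear(self, query, n, distance='Cosine_Distance'):
		# return the n vocabulary words closest to the query vector
		if distance == 'Cosine_Distance':
			norms = np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(query)
			scores = self.vectors.dot(query) / np.maximum(norms, 1e-8)
		else:  # Euclidean distance, negated so that larger is better
			scores = -np.linalg.norm(self.vectors - query, axis=1)
		best = np.argsort(-scores)[:n]
		return [self.words[i] for i in best]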

So what you have to do:

  • Write the code to load one or more analogy datasets
  • Consider the categories in the analogy data
  • Write the code to evaluate (with the option of multiple distances: cosine, Euclidean, max, ...)
  • Write a helper method to store the results as CSV in a file for further processing (see the sketch after this list)
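
A minimal sketch of that CSV helper, assuming totals and corrects are per-category dicts as in the pseudo-code above and the first argument is a dataset name or identifier; the column layout is only a suggestion:

import csv

def write_result_to_file(dataset_name, totals, corrects, path='analogy_results.csv'):
	# one row per category plus an overall row; accuracy = correct / total
	with open(path, 'w', newline='', encoding='utf-8') as f:
		writer = csv.writer(f)
		writer.writerow(['dataset', 'category', 'total', 'correct', 'accuracy'])
		for category in sorted(totals):
			total = totals[category]
			correct = corrects.get(category, 0)
			writer.writerow([dataset_name, category, total, correct,
			                 correct / total if total else 0.0])
		all_total = sum(totals.values())
		all_correct = sum(corrects.values())
		writer.writerow([dataset_name, 'ALL', all_total, all_correct,
		                 all_correct / all_total if all_total else 0.0])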

@abb4s (Collaborator) commented Dec 10, 2018

Hi, thank you for the instructions. I tried to implement the requirements, but I can't test it completely because we don't have a model yet. The result file is attached:
scripts.zip
