Write code to run Analogy task over data sets #16

Open
sehsanm opened this issue Dec 3, 2018 · 3 comments

@sehsanm (Owner) commented Dec 3, 2018

There must be options to configure the following:

  • Cosine distance
  • Euclidean distance
  • The distance defined in "Omer Levy and Yoav Goldberg. Linguistic regularities in sparse and explicit word representations. In CoNLL, 2014"
  • Support for batch operation
  • A threshold to accept a word if it appears in the top-N results
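
As a point of reference, here is a minimal sketch of how these distance options and the top-N acceptance threshold could be computed, assuming the word vectors are kept in a NumPy matrix. The 3CosMul branch is my reading of the method in the Levy and Goldberg (2014) paper cited above, and every name here (analogy_scores, in_top_n, etc.) is hypothetical rather than part of the project:

import numpy as np

def cosine_similarity(matrix, v):
	# matrix: (vocab_size, dim) word vectors, v: (dim,) query vector
	norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(v)
	return matrix.dot(v) / np.maximum(norms, 1e-8)

def analogy_scores(matrix, a, b, c, method='cosine'):
	# score every vocabulary word as a candidate d for the analogy a:b :: c:d
	if method == 'cosine':
		return cosine_similarity(matrix, c + b - a)           # 3CosAdd
	if method == 'euclidean':
		return -np.linalg.norm(matrix - (c + b - a), axis=1)  # smaller distance = higher score
	if method == '3cosmul':
		# multiplicative combination of similarities, cosines shifted to [0, 1]
		eps = 1e-3
		sa = (cosine_similarity(matrix, a) + 1) / 2
		sb = (cosine_similarity(matrix, b) + 1) / 2
		sc = (cosine_similarity(matrix, c) + 1) / 2
		return sb * sc / (sa + eps)
	raise ValueError(method)

def in_top_n(scores, expected_index, n=10):
	# accept the question if the expected word is among the N best-scoring words
	return expected_index in np.argsort(-scores)[:n]

Batch operation then amounts to stacking several query vectors and applying the same matrix products to all of them at once.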
@sehsanm sehsanm added this to the Assignment milestone Dec 3, 2018
@abb4s abb4s self-assigned this Dec 6, 2018
@abb4s (Collaborator) commented Dec 9, 2018

Hi, I created a sample format for the test script:
analogy_test.py.zip
But I need a sample corpus, its corresponding semantic vector model, and an analogy dataset. If you define and create the project architecture, it would be clearer. I assumed that we have a datasets package for loading corpora and datasets, and a models package for loading models.

@sehsanm (Owner, Author) commented Dec 9, 2018

Hi,
A few points here: I'm assuming what you have sent is pseudo-code for what needs to be done. What you have to do is create a package containing the base methods for the analogy task.

From my perspective, the final output is that you load all the analogy datasets (maybe multiple), run them, and finally create a CSV file containing the results as well as a per-category score for each dataset.
So you are not dependent on a corpus; you are dependent on a model loaded in memory (see #15).

So the pseudo-code will be something like:

import models
import datasets
from collections import defaultdict

# load the analogy datasets (possibly more than one)
analog_datasets = datasets.loadAnalogyDataset('/data/analogy')
# load the in-memory model (see #15)
model = models.loadmodel('/data/models/model_khafan.bin')

threshold = 10  # top-N cut-off for accepting a result (N to be decided)

for dataset in analog_datasets:
	totals = defaultdict(int)    # questions seen per category
	corrects = defaultdict(int)  # questions answered correctly per category
	for row in dataset:
		r1 = model.getVec(row.a)
		r2 = model.getVec(row.b)
		r3 = model.getVec(row.c)
		words = model.getKNear(r3 + r2 - r1, threshold, 'Cosine_Distance')
		totals[row.category] += 1
		if row.d in words:
			corrects[row.category] += 1
	write_result_to_file(dataset, totals, corrects)
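
For context, a minimal sketch of how the getVec and getKNear calls used above might be implemented, assuming the model keeps a word list and a NumPy matrix of vectors. This is only an assumption about the model interface discussed in #15, not its actual API:

import numpy as np

class Model:
	def __init__(self, words, vectors):
		self.words = words        # list of vocabulary words
		self.vectors = vectors    # (vocab_size, dim) NumPy array
		self.index = {w: i for i, w in enumerate(words)}

	def getVec(self, word):
		return self.vectors[self.index[word]]

	def getKNear(self, query, n, distance='Cosine_Distance'):
		# return the n vocabulary words closest to the query vector
		if distance == 'Cosine_Distance':
			norms = np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(query)
			scores = self.vectors.dot(query) / np.maximum(norms, 1e-8)
		else:  # Euclidean distance, negated so that larger is better
			scores = -np.linalg.norm(self.vectors - query, axis=1)
		best = np.argsort(-scores)[:n]
		return [self.words[i] for i in best]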

So what you have to do:

  • Write the code to load one or more analogy datasets
  • Consider the categories in the analogy data
  • Write the code to evaluate (with the option of multiple distances: cosine, Euclidean, max, ...)
  • Write a helper method to store the results as CSV in a file for further processing (see the sketch after this list)
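
A minimal sketch of that CSV helper, assuming totals and corrects are per-category dicts as in the pseudo-code above and the first argument is a dataset name or identifier; the column layout is only a suggestion:

import csv

def write_result_to_file(dataset_name, totals, corrects, path='analogy_results.csv'):
	# one row per category plus an overall row; accuracy = correct / total
	with open(path, 'w', newline='', encoding='utf-8') as f:
		writer = csv.writer(f)
		writer.writerow(['dataset', 'category', 'total', 'correct', 'accuracy'])
		for category in sorted(totals):
			total = totals[category]
			correct = corrects.get(category, 0)
			writer.writerow([dataset_name, category, total, correct,
			                 correct / total if total else 0.0])
		all_total = sum(totals.values())
		all_correct = sum(corrects.values())
		writer.writerow([dataset_name, 'ALL', all_total, all_correct,
		                 all_correct / all_total if all_total else 0.0])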

@abb4s (Collaborator) commented Dec 10, 2018

Hi, thank you for the instructions. I tried to implement the requirements, but I can't test it completely because we don't have a model yet. The result file is attached:
scripts.zip
