KeyError: 'united_states' #3

Open
hanayashiki opened this issue Apr 21, 2019 · 3 comments

@hanayashiki
Hello, I would like to test HiExpan on the wiki corpus. After running featureExtraction, I ran

~/HiExpan/src/HiExpan-new$ python3.6 main.py -data wiki

to test.
But after loading those files in wiki/intermediate, I got:

=== Finish loading data ...... ===
=== Start loading seed supervision ...... ===
Traceback (most recent call last):
  File "main.py", line 120, in <module>
    newNode = TreeNode(parent=rootNode, level=0, eid=ename2eid[children], ename=children,
KeyError: 'united_states'

It seems that united_states is not included in those entities. What could possibly be wrong?
Thank you.

@hanayashiki
Author

After I edited seedLoader.py from

    if corpusName == "wiki":
        userInput = [
            ["ROOT", -1, ["united_states", "china", "canada"]],
            ["united_states", 0, ["california", "illinois", "florida"]],
            ["china", 0, ["shandong", "zhejiang", "sichuan"]],
        ]

to

    if corpusName == "wiki":
        userInput = [
            ["ROOT", -1, ["United States", "China", "Canada"]],
            ["United States", 0, ["California", "Illinois", "Florida"]],
            ["China", 0, ["Shandong", "Zhejiang", "Sichuan"]],
        ]

With this change it seems to work. It seems the phrases are not connected by "_" as described in your paper.

@mickeysjm
Owner

Thanks for pointing this out. The seed entities need to appear in the generated entity2id.txt file. I think the phrases are connected with "_" during the embedding learning and corpus preprocessing stage but then converted back. Glad to hear you have started running the expansion code. Thanks.
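
A quick way to catch this before the tree is built is to check every seed name against the loaded entity mapping. The following is only a minimal sketch, assuming the mapping is a plain dict like the `ename2eid` seen in the traceback and that the seeds follow the `userInput` layout from seedLoader.py; the helper name `find_missing_seeds` and the sample mapping are hypothetical, not part of HiExpan.

```python
def find_missing_seeds(user_input, ename2eid):
    """Return seed names that are absent from ename2eid, in first-seen order.

    user_input follows the [parent, level, [children...]] layout shown in
    seedLoader.py; the "ROOT" placeholder is skipped because it is not an entity.
    """
    missing = []
    seen = set()
    for parent, _level, children in user_input:
        for name in [parent] + list(children):
            if name == "ROOT" or name in seen:
                continue
            seen.add(name)
            if name not in ename2eid:
                missing.append(name)
    return missing


# Hypothetical mapping for illustration; the real one comes from entity2id.txt.
ename2eid = {"United States": 0, "China": 1, "Canada": 2, "California": 3}
user_input = [
    ["ROOT", -1, ["united_states", "China", "Canada"]],
    ["united_states", 0, ["California"]],
]

missing = find_missing_seeds(user_input, ename2eid)
if missing:
    print("Seeds not found in the entity mapping:", missing)
```

Running a check like this right after loading the data would report the offending names in one pass instead of stopping at the first KeyError.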

@hanayashiki
Author

I was using the preprocessed corpus downloaded from the links you provided. Maybe the sample inputs in seedLoader.py should be changed to be compatible with it.
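
As an alternative to editing the seeds by hand, the lookup could tolerate both naming conventions. This is a rough sketch under the assumption that entity names differ from the seeds only in case and in "_" versus spaces; `normalize`, `build_lookup`, and `resolve_seed` are hypothetical helpers, and the sample mapping is made up for illustration.

```python
def normalize(name):
    """Lowercase the name and treat underscores and repeated spaces as one space."""
    return " ".join(name.replace("_", " ").lower().split())


def build_lookup(ename2eid):
    """Map each normalized name to the exact entity name used in ename2eid."""
    lookup = {}
    for ename in ename2eid:
        # If two entity names normalize to the same key, the first one wins.
        lookup.setdefault(normalize(ename), ename)
    return lookup


def resolve_seed(name, lookup):
    """Return the entity name matching the seed, or None if nothing matches."""
    return lookup.get(normalize(name))


# Hypothetical mapping for illustration only.
ename2eid = {"United States": 0, "China": 1, "Canada": 2}
lookup = build_lookup(ename2eid)

resolved = resolve_seed("united_states", lookup)
unresolved = resolve_seed("atlantis", lookup)
```

Here `resolved` is the exact key `"United States"`, which can then be used to index `ename2eid`, while `unresolved` is `None`. The trade-off is that distinct entities whose names collide after normalization would be conflated, so an exact match is still the safer default.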
