KeyError: 'united_states' #3

hanayashiki · 2019-04-21T07:28:35Z

Hello, I would like to test HiExpan on the wiki corpus. After running featureExtraction, I ran

~/HiExpan/src/HiExpan-new$ python3.6 main.py -data wiki

to test it.
However, after the files in wiki/intermediate were loaded, I got:

=== Finish loading data ...... ===
=== Start loading seed supervision ...... ===
Traceback (most recent call last):
  File "main.py", line 120, in <module>
    newNode = TreeNode(parent=rootNode, level=0, eid=ename2eid[children], ename=children,
KeyError: 'united_states'

It seems that united_states is not among the loaded entities. What could be wrong?
Thank you.

hanayashiki · 2019-04-21T07:38:27Z

After I edited seedLoader.py from

    if corpusName == "wiki":
        userInput = [
            ["ROOT", -1, ["united_states", "china", "canada"]],
            ["united_states", 0, ["california", "illinois", "florida"]],
            ["china", 0, ["shandong", "zhejiang", "sichuan"]],
        ]

to

    if corpusName == "wiki":
        userInput = [
            ["ROOT", -1, ["United States", "China", "Canada"]],
            ["United States", 0, ["California", "Illinois", "Florida"]],
            ["China", 0, ["Shandong", "Zhejiang", "Sichuan"]],
        ]

it seems to be working. So the phrases in the entity list are apparently not connected by "_" the way they are in your paper.
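As a rough illustration of the mismatch, here is a minimal sketch that maps an underscore-style seed to the spelling actually present in the entity list. Both `normalize` and `resolve_seed` are hypothetical helpers, not part of HiExpan, and the set of known names is made up for the example.

```python
def normalize(name):
    # Assumption: names differ only by "_" vs. " " and by letter case.
    return name.replace("_", " ").lower()


def resolve_seed(seed, known_names):
    """Return the known entity name matching `seed`, or None if there is none."""
    index = {normalize(n): n for n in known_names}
    return index.get(normalize(seed))


# Hypothetical entity names, standing in for whatever the corpus provides.
known = {"United States", "China", "Canada"}
print(resolve_seed("united_states", known))  # United States
print(resolve_seed("atlantis", known))       # None
```

If two entity names normalize to the same string, this sketch silently keeps one of them, so it is only suitable for a quick check.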

mickeysjm · 2019-04-21T07:49:41Z

Thanks for pointing this out. The seed entities need to appear in the generated entity2id.txt file. I think the phrases are connected with "_" during the embedding learning and corpus preprocessing stage but then converted back. Glad to hear you have started running the expansion code. Thanks.
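A minimal sketch of checking the seeds before running main.py, so that a mismatch is reported up front instead of surfacing as a KeyError. It assumes each line of entity2id.txt has the form `<name>\t<id>`, which is not confirmed in this thread; `load_entity_names` and `missing_seeds` are hypothetical helpers, and the in-memory sample stands in for the real file.

```python
import io


def load_entity_names(lines):
    """Collect entity names from lines assumed to look like '<name>\\t<id>'."""
    names = set()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            names.add(line.split("\t")[0])
    return names


def missing_seeds(user_input, known_names):
    """Return seed names (non-ROOT parents and children) absent from known_names."""
    missing = []
    for parent, _level, children in user_input:
        candidates = ([] if parent == "ROOT" else [parent]) + list(children)
        for name in candidates:
            if name not in known_names and name not in missing:
                missing.append(name)
    return missing


# Stand-in for entity2id.txt; in practice, pass an open file object instead.
sample = io.StringIO("United States\t0\nChina\t1\nCanada\t2\n")
known = load_entity_names(sample)

seeds = [["ROOT", -1, ["united_states", "China", "Canada"]]]
print(missing_seeds(seeds, known))  # ['united_states']
```

An empty list means every seed in userInput can be looked up in the entity mapping.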

hanayashiki · 2019-04-21T10:00:09Z

> Thanks for pointing this out. The seed entities need to appear in the generated entity2id.txt file. I think the phrases are connected with "_" during the embedding learning and corpus preprocessing stage but then converted back. Glad to hear you have started running the expansion code. Thanks.

I was using the preprocessed corpus downloaded from the links you provided. Maybe the sample inputs in seedLoader.py should be changed to be compatible with it.
