Trying to understand cparents.txt in Constituency parsing #23

Open
inigo-jauregi opened this issue Mar 6, 2018 · 6 comments

Comments

@inigo-jauregi

I have downloaded the SICK data and obtained the dependency and constituency parses by running fetch_and_preprocess.sh.

I am now trying to understand what information the generated cparents.txt file contains.
Here is an example:

a.txt -> Two dogs are fighting
a.cparents.txt -> 5 5 7 7 6 0 6

If I am not mistaken, I should be able to build the parse tree from cparents.txt. Is that right? And what would the tree for this example look like?

Thanks in advance for any help.

@inigo-jauregi
Author

Another example:

a.txt -> Two dogs are playing by a tree
a.cparents.txt -> 8 8 10 11 12 13 13 9 0 9 10 11 12

@yeladlouni

yeladlouni commented Jan 3, 2019

For N tokens, you obtain a binary tree with 2N-1 nodes.
To use constituency parsing, you have to implement BinaryTreeLSTM yourself, since it has not been ported from the original Lua code.
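
As a quick sanity check on the 2N-1 count, here is a minimal Python sketch. The function name `expected_node_count` is made up for illustration, and the count assumes a fully binarized tree with no unary nodes, which matches the two examples posted above.

```python
def expected_node_count(num_tokens):
    """Nodes in a full binary tree: N leaves plus N - 1 internal nodes."""
    return 2 * num_tokens - 1


# The two sentence / cparents pairs quoted earlier in this thread.
examples = [
    ("Two dogs are fighting", "5 5 7 7 6 0 6"),
    ("Two dogs are playing by a tree", "8 8 10 11 12 13 13 9 0 9 10 11 12"),
]

for sentence, cparents in examples:
    n_tokens = len(sentence.split())
    n_nodes = len(cparents.split())
    # Each line of cparents.txt should have one entry per tree node.
    print(n_tokens, n_nodes, n_nodes == expected_node_count(n_tokens))
```

If the last column prints False for some line, that sentence's tree is probably not fully binary and the leaf/internal split has to be worked out differently.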

@yalunar

yalunar commented Feb 24, 2019

First, number the tokens of the sentence in a.txt from 0 to length-1. These numbers are the indices of the leaf nodes.
Then subtract 1 from every number in a.cparents.txt. Each number is now the index of that node's parent.
Take the following example:
a.txt -> Two dogs are fighting | Two -> 0, dogs -> 1, are -> 2, fighting -> 3
a.cparents.txt -> 5 5 7 7 6 0 6 | after subtracting 1 -> 4 4 6 6 5 -1 5

Write the shifted cparents above the node indices. Every number is a node index, and a parent of -1 marks the root node:
4 4 6 6 5 -1 5 | parent node index
0 1 2 3 4 5 6 | child node index
Each entry in the first row is the parent of the entry below it. For example, 4 is the parent of 0 and 1, 6 is the parent of 2 and 3, and 5 is the parent of 4 and 6, so 5 is the root.
The leaf nodes are 0, 1, 2, 3.
Now you can build the tree. Hope this helps.
@ijauregiCMCRC
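
The steps above can be sketched in Python roughly as follows. This is only an illustration, not code from the repository: the function name `cparents_to_brackets` is hypothetical, and it assumes that the first len(tokens) entries are the leaves and that listing children in increasing index order gives the left-to-right order, which holds for both examples in this thread.

```python
def cparents_to_brackets(tokens, parents):
    """Render a 1-based parent-pointer array (0 = root) as a bracketed tree."""
    children = {i: [] for i in range(len(parents))}
    root = None
    for child, parent in enumerate(parents):
        if parent == 0:
            root = child                         # the node whose parent is 0 is the root
        else:
            children[parent - 1].append(child)   # subtract 1 to get a 0-based index

    def render(node):
        # Assumption: indices below len(tokens) are leaf nodes (the words).
        if node < len(tokens):
            return tokens[node]
        return "(" + " ".join(render(c) for c in children[node]) + ")"

    return render(root)


print(cparents_to_brackets("Two dogs are fighting".split(),
                           [5, 5, 7, 7, 6, 0, 6]))
# ((Two dogs) (are fighting))

print(cparents_to_brackets("Two dogs are playing by a tree".split(),
                           [8, 8, 10, 11, 12, 13, 13, 9, 0, 9, 10, 11, 12]))
# ((Two dogs) (are (playing (by (a tree)))))
```

The second call uses the longer example from the original question, so both trees asked about in this issue can be read off the output.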

@venusafroid

Thank you very much!
I had been confused for a long time. Your answer is really helpful!

@venusafroid

By the way, what is the meaning of the lines in dparents.txt?

@yalunar

yalunar commented Aug 7, 2019

@venusafroid
Hi! Each line in dparents.txt encodes the dependencies between the words of one sentence. For example, for the sentence "Two dogs are wrestling and hugging." the numbers in dparents.txt are the parent node indices of each word:
2 4 4 0 4 4
The i-th number is the index of the parent of the i-th word, counting words from 1. 0 marks the root node.
So the parent of "Two" is "dogs", and the parent of each of "dogs", "are", "and", and "hugging" is "wrestling", which is the root.
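
To make the indexing concrete, here is a small Python sketch that pairs each word with its head. The function name `dependency_heads` and the "ROOT" label are made up for illustration, and the sketch assumes the final period has no entry of its own, since the line above has six numbers for six words.

```python
def dependency_heads(tokens, parents):
    """Pair each word with its head word; parent indices are 1-based, 0 = root."""
    pairs = []
    for word, parent in zip(tokens, parents):
        head = "ROOT" if parent == 0 else tokens[parent - 1]
        pairs.append((word, head))
    return pairs


tokens = "Two dogs are wrestling and hugging".split()
parents = [2, 4, 4, 0, 4, 4]

for word, head in dependency_heads(tokens, parents):
    print(f"{word} <- {head}")
```

This prints one "word <- head" line per token, with "wrestling <- ROOT" for the root word.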
