update conversion #218

Closed
wants to merge 1 commit into from

Conversation

manonreau
Contributor

@manonreau manonreau commented Oct 18, 2022

Reference Issues/PRs

Fixes #217

What does this implement/fix? Explain your changes

During the networkx-to-pyg conversion, the edge features are now returned as a list of lists (one list per edge) instead of a list of strings.
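
To make the target shape concrete, here is a minimal sketch in plain Python. It is not Graphein's conversion code: the edge list, the node identifiers, and the helper `collect_kinds` are hypothetical, and it assumes each edge stores its `kind` attribute as a set of interaction names. The point is only that each edge ends up with its own list of kinds, so an edge with several interaction types keeps all of them.

```python
# Hypothetical edge list: (source, target, attributes).
# Assumption: "kind" is a set, because one residue pair can be
# connected by several interaction types at once.
edges = [
    ("A:CYS:1", "A:HIS:2", {"kind": {"peptide_bond"}}),
    ("A:CYS:1", "A:CYS:7", {"kind": {"disulfide", "distance_threshold"}}),
]


def collect_kinds(edges):
    """Return one sorted list of kind labels per edge, in edge order."""
    return [sorted(attrs["kind"]) for _, _, attrs in edges]


kinds = collect_kinds(edges)
print(kinds)
```

Sorting each set gives a deterministic order, which makes the result easier to compare in a unit test.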

What testing did you do to verify the changes in this PR?

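from graphein.ml.conversion import GraphFormatConvertor

def graph2pkl(g, fname):
    """
    Convert a Graphein networkx graph to a PyTorch Geometric object

    Args:
        g (object): networkx graph built with Graphein
        fname (str): output file name (not used in this snippet)

    Returns:
        object: PyTorch Geometric graph
    """

    # Graphein attributes to keep during the conversion
    d = ["config",
        "coords",
        "edge_index",
        "element_symbol",
        "kind",
        "node_id",
        "node_type",
        "residue_name",
        "residue_number"]

    # Convert networkx graph to pytorch geometric object
    format_convertor = GraphFormatConvertor('nx', 'pyg',
                                            verbose=None,
                                            columns=d)
    return format_convertor(g)

# G is a networkx graph previously built with Graphein
g = graph2pkl(G, 'test')
print(g)
print(g.kind)  # with this PR: one list of edge kinds per edge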

Pull Request Checklist

  • Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./graphein/tests/* directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
  • Ran python -m py.test tests/ and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
  • Checked for style issues by running black . and isort .

@sonarqubecloud

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating B)

Coverage: no coverage information
Duplication: 0.0%

@a-r-j
Owner

a-r-j commented Oct 18, 2022

Thanks for the PR @manonreau!! I'll check this out tomorrow.

Do you think you'd be able to add an appropriate unit test?

@a-r-j
Owner

a-r-j commented Oct 23, 2022

Hi @manonreau could you provide the code for g = format_node_edge_features(g) so I can write a test & get this merged in? Thanks!!

@a-r-j
Owner

a-r-j commented Oct 23, 2022

Changes added in #220

@a-r-j a-r-j closed this Oct 23, 2022
@manonreau
Contributor Author

> Hi @manonreau could you provide the code for g = format_node_edge_features(g) so I can write a test & get this merged in? Thanks!!

Hi @a-r-j, thank you very much for considering my PRs. I just removed the call g = format_node_edge_features(g), since it was only a function to add node-level descriptors. It does not change the structure of the graph object.

You should be able to write a test now.

@a-r-j
Owner

a-r-j commented Oct 24, 2022

@manonreau I see. Would you be willing to share it anyway? It could be useful :)

And thanks for the contributions!!

@manonreau
Contributor Author

Sure, here it is:

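import numpy as np

def onehot(idx, size):
    """One-hot (or multi-hot) encoder

    Args:
        idx (int or list of int): position(s) to set to 1
        size (int): length of the encoded vector
    """
    encoding = np.zeros(size, dtype=np.float32)
    # Set the entries at the given index (or indices) to 1
    encoding[idx] = 1
    return encoding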

def format_node_edge_features(g):
    """Format the node and edge features computed with Graphein

    Args:
        g (object): PyTorch Geometric graph converted from a Graphein graph

    Returns:
        object: updated graph
    """

    # Integer index of each residue name and edge type
    residue_names = {'CYS': 0, 'HIS': 1, 'ASN': 2, 'GLN': 3, 'SER': 4, 'THR': 5, 'TYR': 6, 'TRP': 7,
                     'ALA': 8, 'PHE': 9, 'GLY': 10, 'ILE': 11, 'VAL': 12, 'MET': 13, 'PRO': 14, 'LEU': 15,
                     'GLU': 16, 'ASP': 17, 'LYS': 18, 'ARG': 19}

    edge_type_encoding = {
        'peptide_bond': 0, 'aromatic': 1, 'disulfide': 2, 'ionic': 3,
        'aromatic_sulphur': 4, 'cation_pi': 5, 'distance_threshold': 6, 'hbond': 7}

    # Convert node information
    resname_onehot = []
    for res in g.residue_name:
        # One-hot encoding of the residue name
        resname_onehot.append(onehot(residue_names[res], len(residue_names)))

    g["residue"] = resname_onehot

    # Convert edge information (g.kind holds one list of edge types per edge)
    edge_onehot = []
    for kinds in g.kind:
        # Multi-hot encoding: an edge can have several types
        edge_onehot.append(onehot([edge_type_encoding[x] for x in kinds], len(edge_type_encoding)))

    g["edge_attr"] = edge_onehot

    return g

I later noticed that one-hot encoding is already provided by Graphein :)
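
Because an edge can carry several types, the edge encoding above is effectively multi-hot rather than strictly one-hot. The following is a dependency-free sketch of that idea, not Graphein's built-in encoder: the helper name `multi_hot` is hypothetical, and the vocabulary order is taken from the `edge_type_encoding` mapping in the snippet above.

```python
# Edge types in the same order as the indices used in the snippet above.
EDGE_TYPES = [
    "peptide_bond", "aromatic", "disulfide", "ionic",
    "aromatic_sulphur", "cation_pi", "distance_threshold", "hbond",
]


def multi_hot(labels, vocabulary):
    """Return a list with 1.0 at the position of every label, 0.0 elsewhere."""
    index = {name: i for i, name in enumerate(vocabulary)}
    vector = [0.0] * len(vocabulary)
    for label in labels:
        # Raises KeyError on an unknown label instead of silently ignoring it
        vector[index[label]] = 1.0
    return vector


single = multi_hot(["disulfide"], EDGE_TYPES)
double = multi_hot(["peptide_bond", "hbond"], EDGE_TYPES)
print(single)
print(double)
```

This only works if each edge's kinds arrive as a list (or other iterable) of separate labels, which is the shape the conversion in this PR is meant to produce.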

Successfully merging this pull request may close these issues.

Networkx to pyg conversion loses track or edge features
2 participants