update conversion #218

Closed
manonreau wants to merge 1 commit into a-r-j:master from manonreau:conversion

@manonreau
Contributor

@manonreau manonreau commented Oct 18, 2022

Reference Issues/PRs

Fixes #217

What does this implement/fix? Explain your changes

During the conversion from a networkx object to a pyg object, the edge features are now returned as a list of lists (one list per edge) instead of a list of strings.
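To illustrate the intended output shape, here is a minimal sketch in plain Python. It does not call Graphein or PyTorch Geometric: the `edges` list and the `collect_edge_kinds` helper are hypothetical names, and the sketch assumes each edge carries a `kind` attribute holding a set of bond-type strings.

```python
# Hypothetical stand-in for the edges of a networkx graph: (u, v, attributes).
edges = [
    ("A:CYS:1", "A:HIS:2", {"kind": {"peptide_bond"}}),
    ("A:HIS:2", "A:ASN:3", {"kind": {"peptide_bond", "hbond"}}),
]


def collect_edge_kinds(edges):
    """Return one sorted list of kinds per edge, in edge order."""
    return [sorted(attrs["kind"]) for _, _, attrs in edges]


kinds = collect_edge_kinds(edges)
print(kinds)
# [['peptide_bond'], ['hbond', 'peptide_bond']]
```

Keeping one inner list per edge means the i-th entry still lines up with the i-th edge, even when an edge has several kinds.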

What testing did you do to verify the changes in this PR?

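from graphein.ml.conversion import GraphFormatConvertor

def graph2pkl(g, fname):
    """
    Convert a Graphein networkx graph to a PyTorch Geometric object

    Args:
        g (object): graph
        fname (str): output file name (not used in this snippet)

    Returns:
        object: converted graph
    """

    # Graphein attributes to keep during the conversion
    columns = ["config",
               "coords",
               "edge_index",
               "element_symbol",
               "kind",
               "node_id",
               "node_type",
               "residue_name",
               "residue_number"]

    # Convert networkx graph to pytorch geometric object
    format_convertor = GraphFormatConvertor('nx', 'pyg',
                                            verbose=None,
                                            columns=columns)
    return format_convertor(g)

# G is a networkx graph previously built with Graphein
g = graph2pkl(G, 'test')
print(g)
# Edge features: expected to be a list of lists, one per edge
print(g.kind)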

Pull Request Checklist

  • Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./graphein/tests/* directories (if applicable)
  • Modify documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
  • Ran python -m py.test tests/ and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
  • Checked for style issues by running black . and isort .

@sonarqubecloud

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating B)

Coverage: no information
Duplication: 0.0%

@a-r-j
Owner

a-r-j commented Oct 18, 2022

Thanks for the PR @manonreau!! I'll check this out tomorrow.

Do you think you'd be able to add an appropriate unit test?

@a-r-j
Owner

a-r-j commented Oct 23, 2022

Hi @manonreau could you provide the code for g = format_node_edge_features(g) so I can write a test & get this merged in? Thanks!!

@a-r-j
Owner

a-r-j commented Oct 23, 2022

Changes added in #220

@a-r-j a-r-j closed this Oct 23, 2022
@manonreau
Contributor Author

> Hi @manonreau could you provide the code for g = format_node_edge_features(g) so I can write a test & get this merged in? Thanks!!

Hi @a-r-j, thank you very much for considering my PRs. I just removed the g = format_node_edge_features(g) call, since it was only a function to add node-level descriptors. It does not change anything in the structure of the graph object.

You should be able to write a test now.

@a-r-j
Owner

a-r-j commented Oct 24, 2022

@manonreau I see. Would you be willing to share it anyway? It could be useful :)

And thanks for the contributions!!

@manonreau
Contributor Author

Sure, here it is:

import numpy as np
import torch

def onehot(idx, size):
    """One hot encoder (multi-hot if idx is a list of indices)
    """
    encoding = torch.zeros(size)
    # Set the position(s) given by idx to 1
    encoding[idx] = 1
    return np.array(encoding)

def format_node_edge_features(g):
    """Format the node and edge features computed with Graphein

    Args:
        g (object): graph

    Returns:
        object: updated graph
    """

    # Index of each residue name in the one-hot vector
    residue_names = {'CYS': 0, 'HIS': 1, 'ASN': 2, 'GLN': 3, 'SER': 4, 'THR': 5, 'TYR': 6, 'TRP': 7,
                     'ALA': 8, 'PHE': 9, 'GLY': 10, 'ILE': 11, 'VAL': 12, 'MET': 13, 'PRO': 14, 'LEU': 15,
                     'GLU': 16, 'ASP': 17, 'LYS': 18, 'ARG': 19}

    # Index of each edge type in the one-hot vector
    edge_type_encoding = {
        'peptide_bond': 0, 'aromatic': 1, 'disulfide': 2, 'ionic': 3,
        'aromatic_sulphur': 4, 'cation_pi': 5, 'distance_threshold': 6, 'hbond': 7}

    # One hot encoding of the residue name of every node
    g["residue"] = [onehot(residue_names[res], len(residue_names))
                    for res in g.residue_name]

    # Encoding of the edge types of every edge (an edge can have several kinds)
    g["edge_attr"] = [onehot([edge_type_encoding[x] for x in kinds], len(edge_type_encoding))
                      for kinds in g.kind]

    return g
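As a worked example of what the edge encoding above produces, here is a minimal sketch in plain Python, without torch or numpy. The `multi_hot` helper is a hypothetical name used only for illustration. It assumes each edge has a list of kinds, so an edge with two kinds gets two ones in its vector.

```python
edge_type_encoding = {
    "peptide_bond": 0, "aromatic": 1, "disulfide": 2, "ionic": 3,
    "aromatic_sulphur": 4, "cation_pi": 5, "distance_threshold": 6, "hbond": 7,
}


def multi_hot(kinds, encoding):
    """Return a 0/1 vector with a 1 at the index of every kind present."""
    vector = [0] * len(encoding)
    for kind in kinds:
        vector[encoding[kind]] = 1
    return vector


print(multi_hot(["peptide_bond"], edge_type_encoding))
# [1, 0, 0, 0, 0, 0, 0, 0]
print(multi_hot(["peptide_bond", "hbond"], edge_type_encoding))
# [1, 0, 0, 0, 0, 0, 0, 1]
```

This only works if the edge kinds arrive as one list per edge, which is the shape this PR aims to preserve.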

I later noticed that one-hot encoding is already provided by Graphein :)

Development

Successfully merging this pull request may close these issues.

Networkx to pyg conversion loses track or edge features